Statements (43)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:Neural_Network_Component |
| gptkbp:addressedTo | Efficient Attention Variants, Sparse Attention |
| gptkbp:appliesTo | gptkb:Computer_Vision, gptkb:Natural_Language_Processing, gptkb:Speech_Recognition, Time Series Analysis |
| gptkbp:complexity | O(n^2) with respect to sequence length |
| gptkbp:component | Multi-Head Attention |
| gptkbp:computes | Attention scores |
| gptkbp:coreOperation | Weighted sum of values |
| gptkbp:enables | Parallel computation, Bidirectional context modeling, Contextual representation, Long-range dependency modeling, Self-contextualization of tokens |
| gptkbp:influenced | gptkb:actor, gptkb:Reformer, gptkb:Longformer, gptkb:Vision_Transformer, Linformer |
| gptkbp:input | Sequence of vectors |
| gptkbp:introduced | gptkb:Ashish_Vaswani |
| gptkbp:introducedIn | gptkb:Attention_Is_All_You_Need, 2017 |
| gptkbp:limitation | Quadratic memory usage, Scalability to long sequences |
| gptkbp:output | Sequence of vectors |
| gptkbp:purpose | Capture dependencies between input tokens |
| gptkbp:relatedTo | Cross-Attention, Scaled Dot-Product Attention |
| gptkbp:replacedBy | Recurrent Neural Networks in NLP |
| gptkbp:requires | Positional Encoding |
| gptkbp:usedIn | gptkb:BERT, gptkb:GPT, Transformer Model |
| gptkbp:uses | Query, Key, Value |
| gptkbp:variant | Attention Mechanism |
| gptkbp:bfsParent | gptkb:Large_Language_Models |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Self-Attention Mechanism |
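
The statements above about Query/Key/Value, attention scores, the weighted sum of values, and O(n^2) complexity can be tied together in a short sketch. The NumPy code below is a minimal illustration, not part of the KB entry: the dimensions, weight matrices, and function names are assumptions chosen for demonstration.

```python
# Minimal sketch of (single-head, unmasked) scaled dot-product self-attention.
# All shapes and weights here are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (n, d_model) sequence of vectors -> (n, d_v) sequence of contextualized vectors."""
    Q = X @ W_q                      # queries
    K = X @ W_k                      # keys
    V = X @ W_v                      # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n, n) attention scores: O(n^2) in sequence length
    weights = softmax(scores, axis=-1)
    return weights @ V               # weighted sum of values

# Toy usage: a sequence of 4 token vectors with d_model = 8 (assumed sizes).
rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): a sequence of vectors in, a sequence of vectors out
```

Because every token attends to every other token, the score matrix is n x n, which is the quadratic memory limitation listed above. The operation itself is permutation-invariant, which is why the entry lists Positional Encoding as a requirement: position information must be added to the input vectors before attention is applied.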