Statements (43)
Predicate | Object |
---|---|
gptkbp:instanceOf | Neural Network Component |
gptkbp:addressedTo | Efficient Attention Variants, Sparse Attention |
gptkbp:appliesTo | gptkb:Computer_Vision, gptkb:Natural_Language_Processing, gptkb:Speech_Recognition, Time Series Analysis |
gptkbp:complexity | O(n^2) with respect to sequence length |
gptkbp:component | Multi-Head Attention |
gptkbp:computes | Attention scores |
gptkbp:coreOperation | Weighted sum of values (see the sketches after this table) |
gptkbp:enables | Parallel computation, Bidirectional context modeling, Contextual representation, Long-range dependency modeling, Self-contextualization of tokens |
https://www.w3.org/2000/01/rdf-schema#label | Self-Attention Mechanism |
gptkbp:influenced | gptkb:Longformer, gptkb:Vision_Transformer, actor, Reformer, Linformer |
gptkbp:input | Sequence of vectors |
gptkbp:introduced | gptkb:Ashish_Vaswani |
gptkbp:introducedIn | gptkb:Attention_Is_All_You_Need, 2017 |
gptkbp:limitation | Quadratic memory usage, Scalability to long sequences |
gptkbp:output | Sequence of vectors |
gptkbp:purpose | Capture dependencies between input tokens |
gptkbp:relatedTo | Cross-Attention, Scaled Dot-Product Attention |
gptkbp:replacedBy | Recurrent Neural Networks in NLP |
gptkbp:requires | Positional Encoding (second sketch below) |
gptkbp:usedIn | gptkb:BERT, gptkb:GPT, Transformer Model |
gptkbp:uses | Value, Key, Query |
gptkbp:variant | Attention Mechanism |
gptkbp:bfsParent | gptkb:Large_Language_Models |
gptkbp:bfsLayer | 5 |
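The `gptkbp:computes`, `gptkbp:coreOperation`, `gptkbp:uses`, and `gptkbp:complexity` statements above describe scaled dot-product self-attention as introduced in gptkb:Attention_Is_All_You_Need. A minimal NumPy sketch follows; the function and parameter names (`self_attention`, `w_q`, `w_k`, `w_v`) are illustrative, not drawn from the knowledge base:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of vectors.

    x:             (n, d_model) input  -> gptkbp:input "Sequence of vectors"
    w_q, w_k, w_v: (d_model, d_k) learned projections -> gptkbp:uses Query/Key/Value
    Returns an (n, d_k) array           -> gptkbp:output "Sequence of vectors"
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # (n, n) score matrix: every token attends to every other token,
    # which is the source of the O(n^2) cost under gptkbp:complexity
    # and the quadratic memory usage under gptkbp:limitation.
    scores = q @ k.T / np.sqrt(k.shape[-1])   # gptkbp:computes "Attention scores"
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                        # gptkbp:coreOperation "Weighted sum of values"

# Toy usage: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```

Because the score matrix is computed for all token pairs at once, the whole sequence is processed in a single matrix product rather than step by step, which is the "Parallel computation" advantage listed under `gptkbp:enables`.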
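The `gptkbp:requires` statement reflects that the weighted sum above is order-invariant, so position must be injected into the inputs separately. A minimal sketch of the sinusoidal positional encoding used in gptkb:Attention_Is_All_You_Need, assuming an even `d_model` (the function name is illustrative):

```python
import numpy as np

def sinusoidal_positional_encoding(n, d_model):
    """Fixed sinusoidal encodings added to token embeddings before attention."""
    pos = np.arange(n)[:, None]                  # (n, 1) token positions
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2) frequency index
    angles = pos / (10000 ** (2 * i / d_model))  # geometric frequency schedule
    pe = np.zeros((n, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                 # odd dimensions: cosine
    return pe

# x_with_pos = x + sinusoidal_positional_encoding(n, d)  # then feed to self_attention
```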