Self-Attention Mechanism

GPTKB entity

Statements (43)
Predicate Object
gptkbp:instanceOf Neural Network Component
gptkbp:addressedBy Efficient Attention Variants
Sparse Attention
gptkbp:appliesTo gptkb:Computer_Vision
gptkb:Natural_Language_Processing
gptkb:Speech_Recognition
Time Series Analysis
gptkbp:complexity O(n^2) with respect to sequence length
gptkbp:componentOf Multi-Head Attention
gptkbp:computes Attention scores
gptkbp:coreOperation Weighted sum of values
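The core operation (a softmax-weighted sum of value vectors, scored by query-key dot products) can be illustrated with a minimal NumPy sketch. The names self_attention, Wq, Wk, and Wv are illustrative stand-ins for learned parameters, not an API from any particular library:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q = X @ Wq                       # queries, shape (n, d_k)
    K = X @ Wk                       # keys,    shape (n, d_k)
    V = X @ Wv                       # values,  shape (n, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n, n) pairwise attention scores
    # row-wise softmax: how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of values, shape (n, d_v)

# toy usage: 4 tokens, model width 8
rng = np.random.default_rng(0)
n, d_model = 4, 8
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one contextualized vector per token
```

Because the full (n, n) score matrix comes out of a single matrix multiplication, every token pair is processed at once, which is what makes the parallel computation and long-range dependency modeling listed under gptkbp:enables below possible.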
gptkbp:enables Parallel computation
Bidirectional context modeling
Contextual representation
Long-range dependency modeling
Self-contextualization of tokens
https://www.w3.org/2000/01/rdf-schema#label Self-Attention Mechanism
gptkbp:influenced gptkb:Longformer
gptkb:Vision_Transformer
Performer
Reformer
Linformer
gptkbp:input Sequence of vectors
gptkbp:introducedBy gptkb:Ashish_Vaswani
gptkbp:introducedIn gptkb:Attention_Is_All_You_Need
2017
gptkbp:limitation Quadratic memory usage
Scalability to long sequences
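The quadratic limitation is easy to make concrete: the attention score matrix alone holds n^2 entries. A worked example at sequence length n = 4096 with 32-bit floats:

```latex
n^2 \times 4\,\mathrm{B} = 4096^2 \times 4\,\mathrm{B} = 67\,108\,864\,\mathrm{B} = 64\,\mathrm{MiB}
\quad \text{per attention head, per layer (activations only)}
```

This per-head, per-layer cost is what the sparse and efficient attention variants cited above (e.g., Longformer, Reformer, Linformer) aim to reduce.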
gptkbp:output Sequence of vectors
gptkbp:purpose Capture dependencies between input tokens
gptkbp:relatedTo Cross-Attention
Scaled Dot-Product Attention
gptkbp:replaced Recurrent Neural Networks in NLP
gptkbp:requires Positional Encoding
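Positional encoding is required because self-attention is permutation-equivariant: shuffling the input rows merely shuffles the output rows, so token order must be injected explicitly. A minimal sketch of the fixed sinusoidal encoding defined in Attention Is All You Need (the function name is illustrative):

```python
import numpy as np

def sinusoidal_positional_encoding(n, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(...)."""
    pos = np.arange(n)[:, None]           # (n, 1) token positions
    i = np.arange(d_model // 2)[None, :]  # (1, d_model/2) dimension pairs
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n, d_model))
    pe[:, 0::2] = np.sin(angles)          # even dimensions
    pe[:, 1::2] = np.cos(angles)          # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(n=4, d_model=8)
# in a Transformer, pe is added elementwise to the token embeddings
# before the first attention layer
```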
gptkbp:usedIn gptkb:BERT
gptkb:GPT
Transformer Model
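BERT and GPT apply the same mechanism with different masking: BERT attends bidirectionally over the whole sequence, while GPT masks out future positions for autoregressive generation. A small sketch of the two mask patterns, added to the score matrix before the softmax:

```python
import numpy as np

n = 5
# BERT-style (bidirectional): no positions are masked.
bidirectional_mask = np.zeros((n, n))

# GPT-style (causal): token i may only attend to tokens j <= i.
# Masked entries are -inf so the softmax assigns them zero weight.
causal_mask = np.triu(np.full((n, n), -np.inf), k=1)

# applied as: softmax(Q @ K.T / np.sqrt(d_k) + mask) @ V
```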
gptkbp:uses Query
Key
Value
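Query, key, and value are not separate inputs; each is a learned linear projection of the same sequence, and the Multi-Head Attention noted above runs several such projections in parallel. A minimal sketch building on self_attention from the earlier example; the random matrices stand in for learned weights:

```python
import numpy as np

def multi_head_self_attention(X, heads):
    """Run `heads` independent self-attention heads on subspaces of X
    and concatenate their outputs (a sketch, not a trained model)."""
    rng = np.random.default_rng(1)
    n, d_model = X.shape
    d_head = d_model // heads             # each head works in a smaller subspace
    outputs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        outputs.append(self_attention(X, Wq, Wk, Wv))  # from the sketch above
    Wo = rng.standard_normal((d_model, d_model))       # output projection
    return np.concatenate(outputs, axis=-1) @ Wo       # (n, d_model)
```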
gptkbp:variantOf Attention Mechanism
gptkbp:bfsParent gptkb:Large_Language_Models
gptkbp:bfsLayer 5