Attention Heads

GPTKB entity

Statements (23)
Predicate Object
gptkbp:instanceOf Neural Network Component
gptkbp:aggregatesFrom Context Vectors
gptkbp:allows Model to focus on different positions
gptkbp:canBe Cross-Attention Heads
gptkbp:canBe Self-Attention Heads
gptkbp:enables Multi-head Attention
gptkbp:enables Parallelization in Attention Computation
gptkbp:function Attend to different parts of input sequence
https://www.w3.org/2000/01/rdf-schema#label Attention Heads
gptkbp:improves Model Expressiveness
gptkbp:introducedIn Vaswani et al. 2017
gptkbp:hyperparameter Number of Heads
gptkbp:output Concatenated Attention Results
gptkbp:parameter Query
gptkbp:parameter Key
gptkbp:parameter Value
gptkbp:usedIn gptkb:BERT
gptkbp:usedIn gptkb:GPT
gptkbp:usedIn gptkb:Vision_Transformer
gptkbp:usedIn Transformer Model
gptkbp:visualizes Attention Maps
gptkbp:bfsParent gptkb:Transformer_Circuits
gptkbp:bfsLayer 8
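The statements above describe the mechanics of multi-head attention: each head has Query, Key, and Value parameters, heads attend over the input sequence in parallel, and their per-head results are concatenated into the output. A minimal NumPy sketch of that computation follows; all shapes, weight names, and the function signature are illustrative assumptions, not part of the GPTKB entity.

```python
# Minimal multi-head self-attention sketch (NumPy).
# Assumptions: single sequence (no batch), d_model divisible by num_heads,
# weight matrices w_q/w_k/w_v/w_o are hypothetical names for this example.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project input into per-head Query/Key/Value parameters.
    q = (x @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, computed for all heads in parallel;
    # `attn` holds the attention maps that are often visualized.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = attn @ v                              # (num_heads, seq_len, d_head)
    # Concatenate per-head results back to (seq_len, d_model), then project.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 5, 2
w = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_self_attention(rng.standard_normal((seq_len, d_model)), *w, num_heads)
print(out.shape)  # (5, 8)
```

The head count (`num_heads` here) is the hyperparameter the entity refers to; this is the mechanism introduced in Vaswani et al. 2017 and used in BERT, GPT, and the Vision Transformer.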