Transformer models

GPTKB entity

Statements (196)
Predicate Object
gptkbp:instanceOf machine learning model architecture
gptkbp:advantage parallelization
long-range dependency modeling
gptkbp:architecture deep learning
gptkbp:attentionMechanism self-attention
multi-head attention
gptkbp:component residual connections
feed-forward neural network
layer normalization
multi-head attention
positional encoding
self-attention mechanism
gptkbp:developedBy gptkb:Vaswani_et_al.
https://www.w3.org/2000/01/rdf-schema#label Transformer models
gptkbp:input sequence data
gptkbp:inspiredBy gptkb:Dolly
gptkb:BART
gptkb:Phoenix
gptkb:bird
gptkb:Electra
gptkb:CodeLlama
gptkb:WizardLM
gptkb:Yi-1.5B
gptkb:Yi-100B
gptkb:Yi-10B
gptkb:Yi-12B
gptkb:Yi-13B
gptkb:Yi-14B
gptkb:Yi-15B
gptkb:Yi-16B
gptkb:Yi-17B
gptkb:Yi-18B
gptkb:Yi-19B
gptkb:Yi-20B
gptkb:Yi-21B
gptkb:Yi-22B
gptkb:Yi-23B
gptkb:Yi-24B
gptkb:Yi-25B
gptkb:Yi-26B
gptkb:Yi-27B
gptkb:Yi-28B
gptkb:Yi-29B
gptkb:Yi-2B
gptkb:Yi-30B
gptkb:Yi-31B
gptkb:Yi-32B
gptkb:Yi-33B
gptkb:Yi-34B
gptkb:Yi-35B
gptkb:Yi-36B
gptkb:Yi-37B
gptkb:Yi-38B
gptkb:Yi-39B
gptkb:Yi-3B
gptkb:Yi-40B
gptkb:Yi-41B
gptkb:Yi-42B
gptkb:Yi-43B
gptkb:Yi-44B
gptkb:Yi-45B
gptkb:Yi-46B
gptkb:Yi-47B
gptkb:Yi-48B
gptkb:Yi-49B
gptkb:Yi-4B
gptkb:Yi-50B
gptkb:Yi-51B
gptkb:Yi-52B
gptkb:Yi-53B
gptkb:Yi-54B
gptkb:Yi-55B
gptkb:Yi-56B
gptkb:Yi-57B
gptkb:Yi-58B
gptkb:Yi-59B
gptkb:Yi-5B
gptkb:Yi-60B
gptkb:Yi-61B
gptkb:Yi-62B
gptkb:Yi-63B
gptkb:Yi-64B
gptkb:Yi-65B
gptkb:Yi-66B
gptkb:Yi-67B
gptkb:Yi-68B
gptkb:Yi-69B
gptkb:Yi-6B
gptkb:Yi-70B
gptkb:Yi-71B
gptkb:Yi-72B
gptkb:Yi-73B
gptkb:Yi-74B
gptkb:Yi-75B
gptkb:Yi-76B
gptkb:Yi-77B
gptkb:Yi-78B
gptkb:Yi-79B
gptkb:Yi-7B
gptkb:Yi-80B
gptkb:Yi-81B
gptkb:Yi-82B
gptkb:Yi-83B
gptkb:Yi-84B
gptkb:Yi-85B
gptkb:Yi-86B
gptkb:Yi-87B
gptkb:Yi-88B
gptkb:Yi-89B
gptkb:Yi-8B
gptkb:Yi-90B
gptkb:Yi-91B
gptkb:Yi-92B
gptkb:Yi-93B
gptkb:Yi-94B
gptkb:Yi-95B
gptkb:Yi-96B
gptkb:Yi-97B
gptkb:Yi-98B
gptkb:Yi-99B
gptkb:Yi-9B
gptkb:Pegasus
gptkb:T5
gptkb:Yi
gptkb:mBERT
gptkb:Gemini
gptkb:Switch_Transformer
gptkb:XLM
gptkb:XLM-R
gptkb:ChatGPT
gptkb:GPT-2
gptkb:GPT-3
gptkb:GPT-4
gptkb:StarCoder
gptkb:Claude
gptkb:LLaMA
gptkb:Mistral
gptkb:Vicuna
gptkb:ERNIE
gptkb:LaMDA
gptkb:PaLM
gptkb:BERT
gptkb:BLOOM
gptkb:MPT
gptkb:OPT
gptkb:RWKV
gptkb:StableLM
gptkb:Phi
gptkb:ALBERT
gptkb:Baichuan
gptkb:BigBird
gptkb:DeBERTa
gptkb:DeepSeek
gptkb:DistilBERT
gptkb:GPT
gptkb:InternLM
gptkb:Longformer
gptkb:Mixtral
gptkb:OpenLLaMA
gptkb:Qwen
gptkb:RoBERTa
gptkb:Transformer-XL
gptkb:Vision_Transformer
gptkb:XLNet
gptkb:Zephyr
gptkb:CodeGen
Alpaca
Reformer
gptkbp:introduced gptkb:Attention_Is_All_You_Need
gptkbp:introducedIn 2017
gptkbp:limitation quadratic memory complexity in sequence length
difficulty scaling to long sequences
gptkbp:openSource gptkb:MarianNMT
gptkb:OpenNMT
gptkb:T5X
gptkb:TensorFlow
gptkb:DeepSpeed
gptkb:Fairseq
gptkb:Megatron-LM
gptkb:JAX
gptkb:PyTorch
gptkb:Hugging_Face_Transformers
gptkbp:replacedBy gptkb:GRUs
gptkb:LSTMs
gptkb:RNNs
gptkbp:usedFor machine translation
natural language processing
image processing
question answering
text generation
text classification
gptkbp:variant encoder-decoder
decoder-only
encoder-only
gptkbp:bfsParent gptkb:Speech_Recognition
gptkbp:bfsLayer 6
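
The statements above list self-attention as a component and quadratic memory complexity as a limitation. As a minimal sketch of how the two relate, the following pure-Python example computes scaled dot-product self-attention over a short sequence. It is an illustration under stated assumptions, not the reference implementation: the function names `softmax` and `self_attention` are hypothetical, and the learned query, key, and value projections are replaced by the identity, so each token vector acts as its own query, key, and value.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections.

    x: list of n token vectors, each of length d.
    Returns (outputs, weights); weights is an n x n matrix.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    # Assumption: no learned W_q, W_k, W_v; queries, keys and values are x itself.
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        for q in x
    ]
    # Each row becomes a probability distribution over all positions.
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
print(len(weights), len(weights[0]))  # prints "3 3"
```

The `weights` matrix holds one entry per pair of positions, so a sequence of n tokens needs n × n scores. That pairwise table is where the quadratic memory cost comes from. Every row can also be computed independently of the others, which is what makes the computation easy to parallelize.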