Transformer models

URI: https://gptkb.org/entity/Transformer_models

GPTKB entity

Statements (196)

Predicate	Object
gptkbp:instanceOf	gptkb:machine_learning_model_architecture
gptkbp:advantage	parallelization, long-range dependency modeling
gptkbp:architecture	deep learning
gptkbp:attentionMechanism	self-attention, multi-head attention
gptkbp:component	residual connections, feed-forward neural network, layer normalization, multi-head attention, positional encoding, self-attention mechanism
gptkbp:developedBy	gptkb:Vaswani_et_al.
gptkbp:input	sequence data
gptkbp:inspiredBy	gptkb:Dolly gptkb:BART gptkb:Phoenix gptkb:bird gptkb:Electra gptkb:CodeLlama gptkb:WizardLM gptkb:Yi-1.5B gptkb:Yi-100B gptkb:Yi-10B gptkb:Yi-12B gptkb:Yi-13B gptkb:Yi-14B gptkb:Yi-15B gptkb:Yi-16B gptkb:Yi-17B gptkb:Yi-18B gptkb:Yi-19B gptkb:Yi-20B gptkb:Yi-21B gptkb:Yi-22B gptkb:Yi-23B gptkb:Yi-24B gptkb:Yi-25B gptkb:Yi-26B gptkb:Yi-27B gptkb:Yi-28B gptkb:Yi-29B gptkb:Yi-2B gptkb:Yi-30B gptkb:Yi-31B gptkb:Yi-32B gptkb:Yi-33B gptkb:Yi-34B gptkb:Yi-35B gptkb:Yi-36B gptkb:Yi-37B gptkb:Yi-38B gptkb:Yi-39B gptkb:Yi-3B gptkb:Yi-40B gptkb:Yi-41B gptkb:Yi-42B gptkb:Yi-43B gptkb:Yi-44B gptkb:Yi-45B gptkb:Yi-46B gptkb:Yi-47B gptkb:Yi-48B gptkb:Yi-49B gptkb:Yi-4B gptkb:Yi-50B gptkb:Yi-51B gptkb:Yi-52B gptkb:Yi-53B gptkb:Yi-54B gptkb:Yi-55B gptkb:Yi-56B gptkb:Yi-57B gptkb:Yi-58B gptkb:Yi-59B gptkb:Yi-5B gptkb:Yi-60B gptkb:Yi-61B gptkb:Yi-62B gptkb:Yi-63B gptkb:Yi-64B gptkb:Yi-65B gptkb:Yi-66B gptkb:Yi-67B gptkb:Yi-68B gptkb:Yi-69B gptkb:Yi-6B gptkb:Yi-70B gptkb:Yi-71B gptkb:Yi-72B gptkb:Yi-73B gptkb:Yi-74B gptkb:Yi-75B gptkb:Yi-76B gptkb:Yi-77B gptkb:Yi-78B gptkb:Yi-79B gptkb:Yi-7B gptkb:Yi-80B gptkb:Yi-81B gptkb:Yi-82B gptkb:Yi-83B gptkb:Yi-84B gptkb:Yi-85B gptkb:Yi-86B gptkb:Yi-87B gptkb:Yi-88B gptkb:Yi-89B gptkb:Yi-8B gptkb:Yi-90B gptkb:Yi-91B gptkb:Yi-92B gptkb:Yi-93B gptkb:Yi-94B gptkb:Yi-95B gptkb:Yi-96B gptkb:Yi-97B gptkb:Yi-98B gptkb:Yi-99B gptkb:Yi-9B gptkb:Pegasus gptkb:T5 gptkb:Yi gptkb:mBERT gptkb:Gemini gptkb:Switch_Transformer gptkb:XLM gptkb:XLM-R gptkb:ChatGPT gptkb:GPT-2 gptkb:GPT-3 gptkb:GPT-4 gptkb:StarCoder gptkb:Reformer gptkb:Claude gptkb:LLaMA gptkb:Mistral gptkb:Vicuna gptkb:ERNIE gptkb:LaMDA gptkb:PaLM gptkb:BERT gptkb:BLOOM gptkb:MPT gptkb:OPT gptkb:RWKV gptkb:StableLM gptkb:Phi gptkb:ALBERT gptkb:Baichuan gptkb:BigBird gptkb:DeBERTa gptkb:DeepSeek gptkb:DistilBERT gptkb:GPT gptkb:InternLM gptkb:Longformer gptkb:Mixtral gptkb:OpenLLaMA gptkb:Qwen gptkb:RoBERTa gptkb:Transformer-XL gptkb:Vision_Transformer gptkb:XLNet gptkb:Zephyr gptkb:CodeGen Alpaca
gptkbp:introduced	gptkb:Attention_Is_All_You_Need
gptkbp:introducedIn	2017
gptkbp:limitation	quadratic memory complexity in sequence length, difficulty scaling to long sequences
gptkbp:openSource	gptkb:MarianNMT gptkb:OpenNMT gptkb:T5X gptkb:TensorFlow gptkb:DeepSpeed gptkb:Fairseq gptkb:Megatron-LM gptkb:JAX gptkb:PyTorch gptkb:Hugging_Face_Transformers
gptkbp:replaces	gptkb:GRUs gptkb:LSTMs gptkb:RNNs
gptkbp:usedFor	machine translation, natural language processing, image processing, question answering, text generation, text classification
gptkbp:variant	encoder-decoder, decoder-only, encoder-only
gptkbp:bfsParent	gptkb:Speech_Recognition
gptkbp:bfsLayer	6
http://www.w3.org/2000/01/rdf-schema#label	Transformer models