Vision Transformer

URI: https://gptkb.org/entity/Vision_Transformer

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:model
gptkbp:abbreviation	gptkb:ViT
gptkbp:appliesTo	image classification
gptkbp:arXivID	2010.11929
gptkbp:basedOn	transformer architecture
gptkbp:citation	over 10,000 (as of 2024)
gptkbp:developedBy	gptkb:Google_Research
gptkbp:hasVariant	gptkb:Compact_Vision_Transformer gptkb:CvT gptkb:Data-efficient_Image_Transformer gptkb:DeiT gptkb:MobileViT gptkb:Pyramid_Vision_Transformer gptkb:Swin_Transformer gptkb:T2T-ViT gptkb:ViT-B gptkb:ViT-G gptkb:ViT-H gptkb:ViT-L gptkb:ViT-M gptkb:ViT-S gptkb:ViT-T
gptkbp:improves	convolutional neural networks (image classification accuracy, when pre-trained on large datasets)
gptkbp:input	image patches
gptkbp:inspired	subsequent vision transformer models
gptkbp:introducedBy	gptkb:Alexey_Dosovitskiy gptkb:Jakob_Uszkoreit gptkb:Alexander_Kolesnikov gptkb:Dirk_Weissenborn gptkb:Georg_Heigold gptkb:Lucas_Beyer gptkb:Matthias_Minderer gptkb:Mostafa_Dehghani gptkb:Neil_Houlsby gptkb:Sylvain_Gelly gptkb:Thomas_Unterthiner gptkb:Xiaohua_Zhai
gptkbp:introducedIn	2020
gptkbp:openSource	gptkb:TensorFlow gptkb:PyTorch
gptkbp:publishedIn	gptkb:arXiv
gptkbp:trainedOn	gptkb:JFT-300M gptkb:ImageNet
gptkbp:usedIn	medical imaging, object detection, semantic segmentation, video understanding
gptkbp:uses	self-attention mechanism
gptkbp:bfsParent	gptkb:Transformer_models gptkb:TorchVision gptkb:Hugging_Face_Transformers
gptkbp:bfsLayer	7
http://www.w3.org/2000/01/rdf-schema#label	Vision Transformer
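
Illustrative note on the "image patches" input listed above: a minimal sketch, in plain Python, of how an image might be cut into non-overlapping patches and flattened into a sequence before being passed to a transformer. The function name `image_to_patches`, the toy 4x4 image, and the patch size of 2 are assumptions made for illustration and are not taken from this entry or from any library. A real Vision Transformer would additionally apply a learned linear projection and position embeddings to each patch, which this sketch omits.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (a list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order, each as a flat list of
    patch_size * patch_size pixel values. (Hypothetical helper for illustration.)
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Read the patch row by row and flatten it into one list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A toy 4x4 single-channel "image" with pixel values 0..15 (assumed data).
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(toy_image, patch_size=2)
```

With these toy values the call yields four patches of four values each. By the same arithmetic, a 224x224 image split into 16x16 patches gives a sequence of 14 x 14 = 196 patches.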