Vision Transformer (ViT)

GPTKB entity

Statements (54)
Predicate Object
gptkbp:instanceOf gptkb:model
image classification model
gptkbp:application image classification
image segmentation
object detection
gptkbp:architecture gptkb:transformer
gptkbp:author gptkb:Alexey_Dosovitskiy
gptkbp:citation high
gptkbp:developedBy gptkb:Google_Research
gptkbp:hasVariant gptkb:DeiT
gptkb:Swin_Transformer
gptkb:ViT-B
gptkb:ViT-H
gptkb:ViT-L
https://www.w3.org/2000/01/rdf-schema#label Vision Transformer (ViT)
gptkbp:outperforms convolutional neural networks (CNNs) when pretrained on large datasets
gptkbp:influenced gptkb:MAE
gptkb:Segmenter
gptkb:CLIP
gptkb:DINO
gptkb:Swin_Transformer
gptkb:BEiT
gptkb:SAM
vision-language models
multimodal transformers
ViT-GPT2
ViT-Adapter
ViT-Adapter-B
ViT-Adapter-Giant
ViT-Adapter-H
ViT-Adapter-L
ViT-Adapter-Mega
ViT-Adapter-Super
ViT-Adapter-Ultra
ViT-Adapter-XL
ViT-Adapter-XXL
ViT-VQGAN
ViTDet
gptkbp:input images
gptkbp:inspiredBy transformer architecture
gptkbp:introducedIn 2020
gptkbp:notablePublication gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
gptkbp:openSource gptkb:TensorFlow
gptkb:PyTorch
gptkbp:patchSize 16x16 pixels
gptkbp:splitsInputInto image patches
gptkbp:trainedOn gptkb:JFT-300M
gptkb:ImageNet
gptkbp:uses self-attention mechanism
gptkbp:bfsParent gptkb:EfficientNetV2
gptkb:OpenAI_CLIP
gptkb:Lucas_Beyer
gptkb:Neil_Houlsby
gptkbp:bfsLayer 7
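
The statements above describe the core ViT pipeline: the input image is split into 16x16 patches (gptkbp:splitsInputInto, gptkbp:patchSize), which are linearly embedded and processed with a self-attention mechanism (gptkbp:uses). Below is a minimal PyTorch sketch of that pipeline. The 224x224 input, 768-dimensional embedding, and 12 attention heads match the common ViT-B/16 configuration, but the single encoder layer, the post-norm ReLU block, and the untrained weights are illustrative assumptions only, not the published model.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping 16x16 patches and embed each one."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A convolution with kernel == stride == patch size extracts each
        # patch and applies the linear projection in one step.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)  # (B, 196, 768) patch tokens

class MiniViT(nn.Module):
    """Patch embedding + [CLS] token + one self-attention encoder block."""
    def __init__(self, num_classes=1000, embed_dim=768, num_heads=12):
        super().__init__()
        self.patch_embed = PatchEmbed(embed_dim=embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.patch_embed.num_patches + 1, embed_dim))
        block = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        # One layer for brevity; the real ViT-B stacks 12 pre-norm blocks.
        self.encoder = nn.TransformerEncoder(block, num_layers=1)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        tokens = self.patch_embed(x)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)        # self-attention over patch tokens
        return self.head(tokens[:, 0])       # classify from the [CLS] token

model = MiniViT()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1000])
```

For the open-source implementations the page points to (gptkbp:openSource), pretrained weights are also available off the shelf; for example, assuming torchvision >= 0.13:

```python
from torchvision.models import vit_b_16, ViT_B_16_Weights
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)  # pretrained ViT-B/16
```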