Vision Transformers

URI: https://gptkb.org/entity/Vision_Transformers

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:model, transformer-based neural network
gptkbp:appliesTo	computer vision
gptkbp:basedOn	transformer architecture
gptkbp:hasVariant	gptkb:DeiT, gptkb:Pyramid_Vision_Transformer, gptkb:Swin_Transformer
gptkbp:improves	upon convolutional neural networks (when pre-trained on large datasets)
gptkbp:influenced	vision-language models, multimodal transformers
gptkbp:input	image patches
gptkbp:introduced	gptkb:Alexey_Dosovitskiy et al. (Google Research, Brain Team)
gptkbp:introducedIn	2020
gptkbp:limitation	computationally expensive (global self-attention cost grows quadratically with the number of patches), less effective on small datasets, high memory requirements
gptkbp:notablePublication	gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
gptkbp:openSource	gptkb:TensorFlow, gptkb:PyTorch
gptkbp:publishedIn	gptkb:arXiv (2020); ICLR 2021
gptkbp:requires	large-scale pre-training datasets (e.g., ImageNet-21k, JFT-300M)
gptkbp:trainer	gptkb:Adam_optimizer, gptkb:ImageNet, data augmentation
gptkbp:usedFor	image classification, image segmentation, object detection
gptkbp:uses	self-attention mechanism, MLP head for classification, position embeddings
gptkbp:bfsParent	gptkb:ConvNeXt:_Revisiting_ConvNets_for_Image_Recognition
gptkbp:bfsLayer	8
http://www.w3.org/2000/01/rdf-schema#label	Vision Transformers
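The input, uses, and limitation entries above (image patches, self-attention, position embeddings, a classification head, quadratic cost) can be illustrated with a minimal NumPy sketch of a ViT forward pass. This is not the reference implementation. It assumes a single encoder layer with single-head attention, untrained random weights, and a linear classification head in place of the MLP head. The function names patchify, init_params, and vit_forward are hypothetical and chosen for illustration only.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into an (N, P*P*C) sequence of flattened patches."""
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image sides must be divisible by patch_size"
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C) -> (N, P*P*C), row-major patch order
    patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalization only; the learnable scale/shift parameters are omitted for brevity
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def init_params(rng, patch_dim, num_patches, d_model, num_classes, mlp_dim=None):
    """Random (untrained) weights for a hypothetical one-layer, single-head ViT."""
    mlp_dim = mlp_dim or 4 * d_model

    def n(*shape):
        return rng.normal(0.0, 0.02, shape)

    return {
        "w_embed": n(patch_dim, d_model),          # linear patch projection
        "cls_token": n(1, d_model),                # learnable [class] token
        "pos_embed": n(num_patches + 1, d_model),  # learned 1D position embeddings
        "w_q": n(d_model, d_model), "w_k": n(d_model, d_model),
        "w_v": n(d_model, d_model), "w_o": n(d_model, d_model),
        "w_mlp1": n(d_model, mlp_dim), "w_mlp2": n(mlp_dim, d_model),
        "w_head": n(d_model, num_classes),         # linear head (stand-in for the MLP head)
    }


def vit_forward(image, params, patch_size):
    """Return class logits and the (N+1, N+1) attention matrix."""
    x = patchify(image, patch_size) @ params["w_embed"]   # (N, D) patch tokens
    x = np.concatenate([params["cls_token"], x], axis=0)  # prepend [class] -> (N+1, D)
    x = x + params["pos_embed"]                           # add position embeddings

    # Pre-norm single-head self-attention with a residual connection
    h = layer_norm(x)
    q, k, v = h @ params["w_q"], h @ params["w_k"], h @ params["w_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))        # (N+1, N+1): quadratic in tokens
    x = x + (attn @ v) @ params["w_o"]

    # Pre-norm feed-forward (MLP) sublayer with a residual connection
    x = x + gelu(layer_norm(x) @ params["w_mlp1"]) @ params["w_mlp2"]

    # Classify from the final [class] token representation
    logits = layer_norm(x[0]) @ params["w_head"]
    return logits, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((224, 224, 3))  # random stand-in for a real RGB image
    params = init_params(rng, patch_dim=16 * 16 * 3, num_patches=196,
                         d_model=64, num_classes=10)
    logits, attn = vit_forward(image, params, patch_size=16)
    print(logits.shape, attn.shape)  # (10,) (197, 197)
```

With a 224x224 image and 16x16 patches, the image becomes 14 x 14 = 196 patch tokens plus one class token, so the attention matrix is 197 x 197. With 8x8 patches, the sequence grows to 785 tokens and the attention matrix to roughly 16 times as many entries. This is the computational cost listed under limitation.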