Vision Transformers

GPTKB entity

Statements (33)
Predicate Object
gptkbp:instanceOf gptkb:model
neural network architecture
gptkbp:appliesTo computer vision
gptkbp:basedOn transformer architecture
gptkbp:hasVariant gptkb:DeiT
gptkb:Pyramid_Vision_Transformer
gptkb:Swin_Transformer
https://www.w3.org/2000/01/rdf-schema#label Vision Transformers
gptkbp:improves convolutional neural networks (when pre-trained on large datasets)
gptkbp:influenced vision-language models
multimodal transformers
gptkbp:input image patches
gptkbp:introduced gptkb:Alexey_Dosovitskiy
gptkbp:introducedIn 2020
gptkbp:limitation computationally expensive
less effective on small datasets
requires large memory
gptkbp:notablePublication gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
gptkbp:openSource gptkb:TensorFlow
gptkb:PyTorch
gptkbp:publishedIn gptkb:arXiv
gptkbp:requires large-scale datasets
gptkbp:trainer gptkb:Adam_optimizer
gptkb:ImageNet
data augmentation
gptkbp:usedFor image classification
image segmentation
object detection
gptkbp:uses self-attention mechanism
MLP head for classification
position embeddings
gptkbp:bfsParent gptkb:Self-supervised_Learning
gptkbp:bfsLayer 7
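
Illustrative sketch (not part of the GPTKB statements): the "input image patches" statement can be made concrete with a small pure-Python example that splits an image into non-overlapping 16x16 patches and flattens each one into a token vector. The function name `image_to_patches`, the 224 x 224 RGB input, and the nested-list image representation are assumptions chosen for illustration, not taken from the entry above or from any particular library.

```python
from typing import List

# Assumed representation: image[y][x] is a list of channel values.
Image = List[List[List[float]]]


def image_to_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into flattened, non-overlapping square patches."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")

    patches = []
    # Walk the patch grid row by row, left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for y in range(top, top + patch_size):
                for x in range(left, left + patch_size):
                    flat.extend(image[y][x])  # append every channel value
            patches.append(flat)
    return patches


# Hypothetical input: a 224 x 224 RGB image filled with zeros.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
patches = image_to_patches(image, patch_size=16)

num_patches = len(patches)   # (224 / 16) ** 2 = 196 tokens
patch_dim = len(patches[0])  # 16 * 16 * 3 = 768 values per token
```

In a Vision Transformer, each flattened patch would then typically be linearly projected, combined with a position embedding, and passed through self-attention layers before the MLP head; those steps are omitted here to keep the sketch short.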