gptkbp:instanceOf
|
gptkb:model
transformer-based neural network
|
gptkbp:appliesTo
|
computer vision
|
gptkbp:basedOn
|
transformer architecture
|
gptkbp:hasVariant
|
gptkb:DeiT
gptkb:Pyramid_Vision_Transformer
gptkb:Swin_Transformer
|
https://www.w3.org/2000/01/rdf-schema#label
|
Vision Transformers
|
gptkbp:improves
|
accuracy over convolutional neural networks (when pre-trained on large datasets)
|
gptkbp:influenced
|
vision-language models
multimodal transformers
|
gptkbp:input
|
image patches
|
gptkbp:introduced
|
gptkb:Alexey_Dosovitskiy
|
gptkbp:introducedIn
|
2020
|
gptkbp:limitation
|
computationally expensive (self-attention cost grows quadratically with the number of patches)
less effective on small datasets
requires large memory
|
gptkbp:notablePublication
|
gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
|
gptkbp:openSource
|
gptkb:TensorFlow
gptkb:PyTorch
|
gptkbp:publishedIn
|
gptkb:arXiv
|
gptkbp:requires
|
large-scale datasets
|
gptkbp:trainer
|
gptkb:Adam_optimizer
gptkb:ImageNet
data augmentation
|
gptkbp:usedFor
|
image classification
image segmentation
object detection
|
gptkbp:uses
|
self-attention mechanism
MLP head for classification
position embeddings
|
gptkbp:bfsParent
|
gptkb:Self-supervised_Learning
|
gptkbp:bfsLayer
|
7
|
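The `gptkbp:input` entry lists image patches, and the notable publication's title refers to 16x16 patches. As a rough illustration of that input step only, the sketch below splits a single-channel image into non-overlapping square patches and flattens each one into a vector. The function names `split_into_patches` and `num_patches`, the toy 4x4 image, and the 224x224 example size are assumptions made for illustration and are not taken from this record. Real models work on multi-channel tensors, apply a learned linear projection, and add position embeddings, all of which are omitted here.

```python
def num_patches(height, width, patch_size):
    """Count the non-overlapping square patches that tile an image exactly."""
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


def split_into_patches(image, patch_size):
    """Split a 2-D grid (list of rows) into flattened square patches.

    Patches are returned in row-major order: left to right, then top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical toy input: a 4x4 single-channel image with values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, 2)
```

With these assumed sizes, the toy image yields four patches of four values each, and `num_patches(224, 224, 16)` gives the sequence length a 224x224 image would produce with 16x16 patches. In a full model, each flattened patch would then be projected to an embedding and combined with a position embedding before the self-attention layers.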