gptkbp:instanceOf
|
Vision Transformer model
|
gptkbp:activatedBy
|
gptkb:GELU
|
gptkbp:architecture
|
gptkb:Transformer
|
gptkbp:attentionHeads
|
16
|
gptkbp:attentionMechanism
|
self-attention
|
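The self-attention entry above, combined with attentionHeads (16) and hiddenSize (1024, listed further down in this record), pins down the attention layer used in every block. A minimal sketch using PyTorch's stock multi-head attention (the 197-token sequence length assumes a 224x224 input at patch size 16, which this record does not state):

```python
import torch
import torch.nn as nn

# Multi-head self-attention as in each ViT-L block:
# 16 heads over a 1024-dim hidden state (per the entries in this record).
mha = nn.MultiheadAttention(embed_dim=1024, num_heads=16, batch_first=True)

x = torch.randn(2, 197, 1024)   # (batch, tokens incl. CLS, hidden size)
out, _ = mha(x, x, x)           # self-attention: query = key = value
print(out.shape)                # torch.Size([2, 197, 1024])
```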
gptkbp:author
|
gptkb:Alexey_Dosovitskiy
gptkb:Alexander_Kolesnikov
gptkb:Lucas_Beyer
|
gptkbp:benchmarkPerformance
|
state-of-the-art on ImageNet (2020)
|
gptkbp:citation
|
high
|
gptkbp:developedBy
|
gptkb:Google_Research
|
gptkbp:frameworkSupport
|
gptkb:TensorFlow
gptkb:PyTorch
|
gptkbp:hasCLS
|
true
|
gptkbp:hasLayerCount
|
24
|
gptkbp:hasVariant
|
gptkb:ViT-B
gptkb:ViT-G
gptkb:ViT-H
|
gptkbp:hiddenSize
|
1024
|
https://www.w3.org/2000/01/rdf-schema#label
|
ViT-L
|
gptkbp:input
|
image
|
gptkbp:inputPatchSize
|
16x16
|
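At a 16x16 patch size, a standard 224x224 input (an assumed default; the resolution is not given in this record) yields (224/16)^2 = 196 patches, or 197 tokens once the CLS token is prepended. A sketch of the patch-embedding arithmetic in plain PyTorch:

```python
import torch
import torch.nn as nn

patch_size, hidden_size, image_size = 16, 1024, 224
num_patches = (image_size // patch_size) ** 2   # (224 / 16)^2 = 196

# A conv with stride == kernel size cuts the image into non-overlapping
# patches and linearly projects each one in a single operation.
patch_embed = nn.Conv2d(3, hidden_size, kernel_size=patch_size, stride=patch_size)

x = torch.randn(1, 3, image_size, image_size)        # one RGB image
tokens = patch_embed(x).flatten(2).transpose(1, 2)   # (1, 196, 1024)
print(tokens.shape)
```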
gptkbp:introducedIn
|
2020
|
gptkbp:mlpSize
|
4096
|
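Taken together, the hyperparameters in this record (24 layers, 16 heads, hidden size 1024, MLP size 4096, i.e. an MLP ratio of 4096/1024 = 4) fully specify the ViT-L/16 encoder. A hedged sketch of how they map onto timm's `VisionTransformer` constructor (GELU and LayerNorm are its defaults, matching the activatedBy and normalization entries; the 224x224 resolution is an assumption):

```python
from timm.models.vision_transformer import VisionTransformer

vit_l = VisionTransformer(
    img_size=224,      # assumed default resolution, not stated in this record
    patch_size=16,     # inputPatchSize
    embed_dim=1024,    # hiddenSize
    depth=24,          # hasLayerCount
    num_heads=16,      # attentionHeads
    mlp_ratio=4,       # mlpSize / hiddenSize = 4096 / 1024
    num_classes=1000,  # ImageNet-1k head; adjust for other label sets
)
print(sum(p.numel() for p in vit_l.parameters()))  # ~3.0e8, consistent with ~307M
```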
gptkbp:nativeToken
|
image patch
|
gptkbp:normalization
|
gptkb:LayerNorm
|
gptkbp:notableFor
|
scaling vision transformers to large datasets
|
gptkbp:notablePublication
|
gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
|
gptkbp:openSource
|
gptkb:Hugging_Face_Transformers
timm
|
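Because the weights ship through both Hugging Face Transformers and timm, loading is a one-liner in either library. A sketch (the checkpoint id `google/vit-large-patch16-224` and the timm model name are the commonly published ones, assumed here rather than taken from this record):

```python
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Hugging Face Transformers route
processor = ViTImageProcessor.from_pretrained("google/vit-large-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-large-patch16-224")

image = Image.open("cat.jpg")                     # any RGB image
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits                   # (1, 1000) class scores
print(model.config.id2label[logits.argmax(-1).item()])

# timm route (same architecture)
import timm
vit_l = timm.create_model("vit_large_patch16_224", pretrained=True)
```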
gptkbp:output
|
class label
|
gptkbp:parameterCount
|
307M
|
gptkbp:positionEmbedding
|
learned
|
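The hasCLS and positionEmbedding entries say ViT-L prepends a learnable classification token and adds learned (not sinusoidal) position embeddings. A minimal sketch (sequence length 197 again assumes a 224x224 input at patch size 16):

```python
import torch
import torch.nn as nn

hidden_size, num_patches, batch = 1024, 196, 8
cls_token = nn.Parameter(torch.zeros(1, 1, hidden_size))
pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, hidden_size))  # learned

tokens = torch.randn(batch, num_patches, hidden_size)                # patch embeddings
tokens = torch.cat([cls_token.expand(batch, -1, -1), tokens], dim=1) # prepend CLS
tokens = tokens + pos_embed                                          # (8, 197, 1024)
```

At the output side, the final hidden state of the CLS token feeds the classification head that produces the class label listed under output.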
gptkbp:relatedTo
|
gptkb:BERT
Transformer (NLP)
|
gptkbp:trainedOn
|
gptkb:JFT-300M
ImageNet-21k
|
gptkbp:usedFor
|
image classification
|
gptkbp:usedIn
|
feature extraction
transfer learning
fine-tuning
|
gptkbp:bfsParent
|
gptkb:Vision_Transformer
|
gptkbp:bfsLayer
|
6
|