gptkbp:instanceOf
|
gptkb:model
image classification model
|
gptkbp:application
|
image classification
image segmentation
object detection
|
gptkbp:architecture
|
transformer
|
gptkbp:author
|
gptkb:Alexey_Dosovitskiy
|
gptkbp:citation
|
high (one of the most-cited computer-vision papers)
|
gptkbp:developedBy
|
gptkb:Google_Research
|
gptkbp:hasVariant
|
gptkb:DeiT
gptkb:ViT-B
gptkb:ViT-H
gptkb:ViT-L
|
https://www.w3.org/2000/01/rdf-schema#label
|
Vision Transformer (ViT)
|
gptkbp:improves
|
convolutional neural networks (CNNs) when pre-trained on large datasets
|
gptkbp:influenced
|
gptkb:MAE
gptkb:Segmenter
gptkb:CLIP
gptkb:DINO
gptkb:Swin_Transformer
gptkb:BEiT
gptkb:SAM
vision-language models
multimodal transformers
ViT-GPT2
ViT-Adapter (and its size variants, e.g. ViT-Adapter-Base, ViT-Adapter-Large)
ViT-VQGAN
ViTDet
|
gptkbp:input
|
images
|
gptkbp:inspiredBy
|
transformer architecture
|
gptkbp:introducedIn
|
2020
|
gptkbp:notablePublication
|
gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
|
gptkbp:openSource
|
gptkb:TensorFlow
gptkb:PyTorch
|
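
Open-source reference implementations exist in both frameworks listed above. As a minimal usage sketch, a pre-trained ViT-B/16 can be loaded through torchvision's PyTorch port (an assumption for illustration: torchvision >= 0.13, whose vit_b_16 and ViT_B_16_Weights names are torchvision's own, not part of the original release):

    import torch
    from torchvision.models import vit_b_16, ViT_B_16_Weights

    # Load ViT-B/16 with ImageNet-1k weights (downloaded on first use).
    model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
    model.eval()

    # Classify one dummy 224x224 RGB image; output is 1000 class logits.
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        logits = model(x)               # shape: (1, 1000)
    print(logits.argmax(dim=-1).item())
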
gptkbp:size
|
16x16 pixels (patch size)
|
gptkbp:splitsInputInto
|
image patches
|
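
The two entries above (patch size and input splitting) describe ViT's tokenization step: the image is cut into non-overlapping 16x16 patches, and each patch is flattened and linearly projected into a token embedding. A minimal PyTorch sketch, assuming the ViT-B/16 dimensions (the stride-16 convolution is a standard equivalent of per-patch flatten-plus-linear, not necessarily the original code's formulation):

    import torch
    import torch.nn as nn

    patch_size, embed_dim = 16, 768    # ViT-B/16 configuration

    # A 16x16 conv with stride 16 slices the image into non-overlapping
    # patches and applies one shared linear projection to each.
    patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

    image = torch.randn(1, 3, 224, 224)         # one RGB image
    tokens = patch_embed(image)                 # (1, 768, 14, 14)
    tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): 196 patch tokens
    print(tokens.shape)

In the full model, a learnable class token and position embeddings are then added before the transformer layers.
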
gptkbp:trainedOn
|
gptkb:JFT-300M
gptkb:ImageNet
|
gptkbp:uses
|
self-attention mechanism
|
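
To illustrate the self-attention mechanism named above: each transformer layer lets every patch token attend to every other token via scaled dot-product attention. A single-head sketch in PyTorch (hypothetical random projection matrices for illustration only; ViT itself uses multi-head attention with learned projections):

    import torch
    import torch.nn.functional as F

    def self_attention(x, w_q, w_k, w_v):
        # x: (batch, tokens, dim); w_*: (dim, dim) projection matrices.
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        return F.softmax(scores, dim=-1) @ v  # each token mixes all patches

    dim = 768
    x = torch.randn(1, 196, dim)              # 196 patch tokens (see above)
    w_q, w_k, w_v = (torch.randn(dim, dim) * dim ** -0.5 for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v)    # (1, 196, 768)
    print(out.shape)
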
gptkbp:bfsParent
|
gptkb:EfficientNetV2
gptkb:OpenAI_CLIP
gptkb:Lucas_Beyer
gptkb:Neil_Houlsby
|
gptkbp:bfsLayer
|
7
|