Vision Transformer (ViT)

GPTKB entity

Statements (54)
Predicate Object
gptkbp:instanceOf gptkb:model
image classification model
gptkbp:application image classification
image segmentation
object detection
gptkbp:architecture gptkb:transformer
gptkbp:author gptkb:Alexey_Dosovitskiy
gptkbp:citation high
gptkbp:developedBy gptkb:Google_Research
gptkbp:hasVariant gptkb:DeiT
gptkb:Swin_Transformer
gptkb:ViT-B
gptkb:ViT-H
gptkb:ViT-L
https://www.w3.org/2000/01/rdf-schema#label Vision Transformer (ViT)
gptkbp:outperforms convolutional neural networks (CNNs) when pretrained on large datasets
gptkbp:influenced gptkb:MAE
gptkb:Segmenter
gptkb:CLIP
gptkb:DINO
gptkb:Swin_Transformer
gptkb:BEiT
gptkb:SAM
vision-language models
multimodal transformers
ViT-GPT2
ViT-Adapter
ViT-Adapter-B
ViT-Adapter-Giant
ViT-Adapter-H
ViT-Adapter-L
ViT-Adapter-Mega
ViT-Adapter-Super
ViT-Adapter-Ultra
ViT-Adapter-XL
ViT-Adapter-XXL
ViT-VQGAN
ViTDet
gptkbp:input images
gptkbp:inspiredBy transformer architecture
gptkbp:introducedIn 2020
gptkbp:notablePublication gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
gptkbp:openSource gptkb:TensorFlow
gptkb:PyTorch
gptkbp:patchSize 16x16 pixels
gptkbp:splitsInputInto image patches
gptkbp:trainedOn gptkb:JFT-300M
gptkb:ImageNet
gptkbp:uses self-attention mechanism
gptkbp:bfsParent gptkb:EfficientNetV2
gptkb:OpenAI_CLIP
gptkb:Lucas_Beyer
gptkb:Neil_Houlsby
gptkbp:bfsLayer 7
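
The statements above describe the core ViT pipeline: the input image is split into 16x16 patches (gptkbp:splitsInputInto, gptkbp:patchSize), which are linearly embedded and processed with a self-attention mechanism (gptkbp:uses). Below is a minimal PyTorch sketch of that pipeline. The 224x224 input, 768-dimensional embedding, and 12 attention heads match the common ViT-B/16 configuration, but the single encoder layer, the post-norm ReLU block, and the untrained weights are illustrative assumptions only, not the published model.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping 16x16 patches and embed each one."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A convolution with kernel == stride == patch size extracts each
        # patch and applies the linear projection in one step.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)  # (B, 196, 768) patch tokens

class MiniViT(nn.Module):
    """Patch embedding + [CLS] token + one self-attention encoder block."""
    def __init__(self, num_classes=1000, embed_dim=768, num_heads=12):
        super().__init__()
        self.patch_embed = PatchEmbed(embed_dim=embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, self.patch_embed.num_patches + 1, embed_dim))
        block = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        # One layer for brevity; the real ViT-B stacks 12 pre-norm blocks.
        self.encoder = nn.TransformerEncoder(block, num_layers=1)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        tokens = self.patch_embed(x)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)        # self-attention over patch tokens
        return self.head(tokens[:, 0])       # classify from the [CLS] token

model = MiniViT()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1000])
```

For the open-source implementations the page points to (gptkbp:openSource), pretrained weights are also available off the shelf; for example, assuming torchvision >= 0.13:

```python
from torchvision.models import vit_b_16, ViT_B_16_Weights
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)  # pretrained ViT-B/16
```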