Vision Transformer (ViT): An Image is Worth 16x16 Words
URI: https://gptkb.org/entity/Vision_Transformer_(ViT):_An_Image_is_Worth_16x16_Words
GPTKB entity
Statements (32)
Predicate / Object

gptkbp:instanceOf
    gptkb:academic_journal
gptkbp:arXivID
    2010.11929
gptkbp:author
    gptkb:Alexey_Dosovitskiy
    gptkb:Jakob_Uszkoreit
    gptkb:Alexander_Kolesnikov
    gptkb:Dirk_Weissenborn
    gptkb:Georg_Heigold
    gptkb:Lucas_Beyer
    gptkb:Matthias_Minderer
    gptkb:Mostafa_Dehghani
    gptkb:Neil_Houlsby
    gptkb:Sylvain_Gelly
    gptkb:Thomas_Unterthiner
    gptkb:Xiaohua_Zhai
gptkbp:citation
    high
gptkbp:contribution
    shows transformers can outperform CNNs with sufficient data
    splits images into 16x16 patches
    treats image patches as tokens
    applies transformer architecture to image classification
gptkbp:field
    computer vision
    deep learning
    transformer models
https://www.w3.org/2000/01/rdf-schema#label
    Vision Transformer (ViT): An Image is Worth 16x16 Words
gptkbp:influenced
    subsequent vision transformer research
gptkbp:proposedBy
    gptkb:Vision_Transformer_(ViT)
gptkbp:publicationYear
    2021
gptkbp:publishedIn
    gptkb:International_Conference_on_Learning_Representations_(ICLR)
gptkbp:shortName
    gptkb:Vision_Transformer_(ViT)
gptkbp:title
    gptkb:An_Image_is_Worth_16x16_Words:_Transformers_for_Image_Recognition_at_Scale
gptkbp:url
    https://arxiv.org/abs/2010.11929
gptkbp:bfsParent
    gptkb:Google_Brain_(former)
gptkbp:bfsLayer
    7
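
The gptkbp:contribution statements above summarize the core mechanism: split an image into 16x16 patches, treat each patch as a token, and feed the token sequence to a standard Transformer encoder for classification. The sketch below illustrates that idea only; it assumes PyTorch, and the class name MiniViT and all hyperparameters (embedding width, depth, head count) are illustrative choices, not the paper's or the KB's configuration.

```python
# Minimal sketch of the ViT recipe described by the contribution statements:
# 16x16 patches -> patch tokens -> Transformer encoder -> classification head.
import torch
import torch.nn as nn


class MiniViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, in_channels=3,
                 embed_dim=192, depth=4, num_heads=3, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # "Splits images into 16x16 patches": a strided convolution cuts the
        # image into non-overlapping patches and linearly projects each one.
        self.patch_embed = nn.Conv2d(in_channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # "Treats image patches as tokens": a learnable [CLS] token and
        # position embeddings are added to the patch-token sequence.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # "Applies transformer architecture to image classification":
        # a plain Transformer encoder processes the token sequence.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.patch_embed(x)              # (B, D, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)     # (B, N, D): patches as tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])            # classify from the [CLS] token


if __name__ == "__main__":
    logits = MiniViT()(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 1000])
```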