gptkbp:instanceOf
|
Vision-and-Language Model
|
gptkbp:architecture
|
gptkb:Transformer
|
gptkbp:author
|
Bokyung Son
Ildoo Kim
Wonjae Kim
|
gptkbp:citation
|
1000+
|
gptkbp:designedFor
|
Vision-and-Language Pretraining
|
gptkbp:developedBy
|
gptkb:NAVER_AI_Lab
|
https://www.w3.org/2000/01/rdf-schema#label
|
ViLT
|
gptkbp:input
|
gptkb:image
gptkb:text
|
gptkbp:introducedIn
|
2021
|
gptkbp:language
|
English
|
gptkbp:hasTask
|
gptkb:Visual_Question_Answering
Image Captioning
Image-Text Retrieval
Visual Reasoning
|
gptkbp:notableFor
|
Efficient vision-and-language model
No convolutional or region-based visual backbone
|
gptkbp:notablePublication
|
https://arxiv.org/abs/2102.03334
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
|
gptkbp:openSource
|
Yes
|
gptkbp:pretrainingDataset
|
gptkb:COCO
gptkb:Visual_Genome
SBU Captions
Conceptual Captions
|
gptkbp:repository
|
https://github.com/dandelin/ViLT
|
gptkbp:uses
|
Multimodal Transformer Encoder
|
gptkbp:bfsParent
|
gptkb:BLIP
gptkb:Vision-Language_Pretraining_research_community
|
gptkbp:bfsLayer
|
7
|
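
The repository above (https://github.com/dandelin/ViLT) is the reference implementation, and ViLT checkpoints from the same author are also published on the Hugging Face Hub. Below is a minimal, illustrative sketch of querying the VQA-finetuned checkpoint through the Hugging Face transformers library; the checkpoint name "dandelin/vilt-b32-finetuned-vqa" is the published weight, while the image path and question are placeholder assumptions, not part of this entry.

# Minimal ViLT VQA sketch using the Hugging Face port of the dandelin/ViLT
# checkpoints. Assumes `transformers`, `torch`, and `Pillow` are installed
# and that "example.jpg" exists locally (illustrative placeholder).
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

image = Image.open("example.jpg")                # image input (placeholder path)
question = "How many people are in the photo?"   # text input (placeholder question)

# ViLT feeds raw image patches and word tokens jointly into a single
# multimodal Transformer encoder, with no convolutional or region-based
# visual backbone, which is the property noted under gptkbp:notableFor.
inputs = processor(image, question, return_tensors="pt")
logits = model(**inputs).logits
answer = model.config.id2label[logits.argmax(-1).item()]
print(answer)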