Visual Question Answering

GPTKB entity

Statements (49)
Predicate Object
gptkbp:instanceOf research area
gptkbp:abbreviation gptkb:VQA
gptkbp:application image understanding
multimodal AI
gptkbp:bench CLEVR dataset
COCO-QA dataset
GQA dataset
VQA dataset
gptkbp:challenge commonsense reasoning
language understanding
multimodal reasoning
visual grounding
gptkbp:firstMajorDataset VQA dataset (2015)
gptkbp:firstReleased 2015
https://www.w3.org/2000/01/rdf-schema#label Visual Question Answering
gptkbp:input image
natural language question
gptkbp:memiliki_tugas answering natural language questions about images
gptkbp:notableConference gptkb:CVPR
gptkb:ECCV
gptkb:ICCV
gptkb:NeurIPS
gptkb:ACL
gptkbp:notableModel gptkb:UNITER
gptkb:LXMERT
gptkb:ViLBERT
MCB (Multimodal Compact Bilinear Pooling)
VisualBERT
Bottom-Up and Top-Down Attention (Anderson et al., 2018)
gptkbp:notablePublication VQA: Visual Question Answering (Antol et al., 2015)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (Goyal et al., 2017)
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (Johnson et al., 2017)
gptkbp:openSource HuggingFace Transformers
PyTorch VQA
VQA Challenge codebase
gptkbp:organizer VQA Challenge
gptkbp:output answer
gptkbp:relatedTo computer vision
natural language processing
Visual Commonsense Reasoning
Image Captioning
Visual Dialog
gptkbp:studies automatic answering of questions about images
gptkbp:uses convolutional neural networks
deep learning
transformers
recurrent neural networks
gptkbp:bfsParent gptkb:Zero-Shot_Learning
gptkbp:bfsLayer 6
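
Note on evaluation: the statements above name the VQA dataset as a benchmark and list an image and a question as input with an answer as output. As a rough illustration of how a predicted answer is commonly scored on that dataset, the sketch below implements the simplified consensus accuracy described in Antol et al. (2015), min(number of matching human answers / 3, 1). The function name `vqa_accuracy` and the lowercase/strip normalization are assumptions made for this example; the official evaluation script applies more extensive answer normalization and averaging, so treat this as an approximation rather than the reference implementation.

```python
from collections import Counter


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA consensus accuracy: min(matching answers / 3, 1).

    `human_answers` holds the free-form answers collected from annotators
    for one (image, question) pair (ten per question in the VQA dataset).
    Normalization here is only lowercase + strip, which is an assumption
    of this sketch and much lighter than the official evaluation code.
    """
    normalized = [answer.strip().lower() for answer in human_answers]
    matches = Counter(normalized)[predicted.strip().lower()]
    # An answer counts as fully correct once at least three annotators gave it.
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    answers = ["yes"] * 2 + ["no"] * 8
    print(vqa_accuracy("yes", answers))  # partial credit: 2 of 3 required matches
    print(vqa_accuracy("no", answers))   # full credit: at least 3 matches
```

Capping the score at 1 after three matches gives partial credit for answers that only some annotators agree with, which helps with questions that have more than one plausible answer.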