Statements (49)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | research |
| gptkbp:abbreviation | gptkb:VQA |
| gptkbp:application | image understanding, multimodal AI |
| gptkbp:bench | CLEVR dataset, COCO-QA dataset, GQA dataset, VQA dataset |
| gptkbp:challenge | commonsense reasoning, language understanding, multimodal reasoning, visual grounding |
| gptkbp:firstMajorDataset | VQA dataset (2015) |
| gptkbp:firstReleased | 2015 |
| https://www.w3.org/2000/01/rdf-schema#label | Visual Question Answering |
| gptkbp:input | gptkb:illustrator, gptkb:quest |
| gptkbp:memiliki_tugas | answering natural language questions about images |
| gptkbp:notableConference | gptkb:CVPR, gptkb:ECCV, gptkb:ICCV, gptkb:NeurIPS, gptkb:ACL |
| gptkbp:notableModel | gptkb:UNITER, gptkb:LXMERT, gptkb:ViLBERT, MCB (Multimodal Compact Bilinear Pooling), VisualBERT, Bottom-Up and Top-Down Attention (Anderson et al., 2018) |
| gptkbp:notablePublication | VQA: Visual Question Answering (Antol et al., 2015), Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (Goyal et al., 2017), CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (Johnson et al., 2017) |
| gptkbp:openSource | HuggingFace Transformers, PyTorch VQA, VQA Challenge codebase |
| gptkbp:organizer | VQA Challenge |
| gptkbp:output | answer |
| gptkbp:relatedTo | computer vision, natural language processing, Visual Commonsense Reasoning, Image Captioning, Visual Dialog |
| gptkbp:studies | automatic answering of questions about images |
| gptkbp:uses | convolutional neural networks, deep learning, transformers, recurrent neural networks |
| gptkbp:bfsParent | gptkb:Zero-Shot_Learning |
| gptkbp:bfsLayer | 6 |