Statements (49)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:research |
| gptkbp:abbreviation | gptkb:VQA |
| gptkbp:application | image understanding, multimodal AI |
| gptkbp:bench | CLEVR dataset, COCO-QA dataset, GQA dataset, VQA dataset |
| gptkbp:challenge | commonsense reasoning, language understanding, multimodal reasoning, visual grounding |
| gptkbp:firstMajorDataset | VQA dataset (2015) |
| gptkbp:firstReleased | 2015 |
| gptkbp:input | gptkb:illustrator, gptkb:quest |
| gptkbp:memiliki_tugas | answering natural language questions about images |
| gptkbp:notableConference | gptkb:CVPR, gptkb:ECCV, gptkb:ICCV, gptkb:NeurIPS, gptkb:ACL |
| gptkbp:notableModel | gptkb:UNITER, gptkb:LXMERT, gptkb:ViLBERT, MCB (Multimodal Compact Bilinear Pooling), VisualBERT, Bottom-Up and Top-Down Attention (Anderson et al., 2018) |
| gptkbp:notablePublication | VQA: Visual Question Answering (Antol et al., 2015); Making the V in VQA Matter: Elevating the Role of Image Understanding in VQA (Goyal et al., 2017); CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (Johnson et al., 2017) |
| gptkbp:openSource | HuggingFace Transformers, PyTorch VQA, VQA Challenge codebase |
| gptkbp:organizer | VQA Challenge |
| gptkbp:output | answer |
| gptkbp:relatedTo | computer vision, natural language processing, Visual Commonsense Reasoning, Image Captioning, Visual Dialog |
| gptkbp:studies | automatic answering of questions about images |
| gptkbp:uses | convolutional neural networks, deep learning, transformers, recurrent neural networks |
| gptkbp:bfsParent | gptkb:Devi_Parikh |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Visual Question Answering |
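The `gptkbp:input`, `gptkbp:output`, and `gptkbp:uses` statements above describe the classic VQA recipe: encode the image and the question separately, fuse the two feature vectors, and classify over a fixed answer vocabulary. The following is a minimal toy sketch of that pipeline in pure Python; every function, dimension, and "feature" here is a hypothetical stand-in (hash-based dummies replacing real CNN/transformer encoders), not an implementation of any model named in the table.

```python
from typing import Dict, List

# Hypothetical closed answer vocabulary; real VQA systems use thousands of answers.
ANSWER_VOCAB: List[str] = ["yes", "no", "red", "two"]

def encode_image(image_id: str, dim: int = 8) -> List[float]:
    """Stand-in for a CNN/ViT image encoder: derive a dummy feature vector."""
    return [((hash((image_id, i)) % 1000) / 1000.0) for i in range(dim)]

def encode_question(question: str, dim: int = 8) -> List[float]:
    """Stand-in for an RNN/transformer question encoder (hashed bag of words)."""
    vec = [0.0] * dim
    for word in question.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def fuse(img: List[float], q: List[float]) -> List[float]:
    """Elementwise-product fusion, the simplest joint embedding used in early VQA work."""
    return [a * b for a, b in zip(img, q)]

def answer(image_id: str, question: str, weights: Dict[str, List[float]]) -> str:
    """Score each candidate answer as a dot product with the fused features."""
    joint = fuse(encode_image(image_id), encode_question(question))
    scores = {ans: sum(w * j for w, j in zip(wv, joint))
              for ans, wv in weights.items()}
    return max(scores, key=scores.get)
```

Models listed under `gptkbp:notableModel` differ mainly in the fusion step: MCB replaces the elementwise product with compact bilinear pooling, while LXMERT, ViLBERT, and UNITER fuse modalities with cross-attention transformer layers.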