Statements (23)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:model, gptkb:deep_learning_architecture |
| gptkbp:appliesTo | image-text retrieval, visual question answering, multimodal classification |
| gptkbp:enables | cross-modal reasoning, joint representation learning |
| gptkbp:field | gptkb:artificial_intelligence, computer vision, natural language processing |
| gptkbp:handles | multimodal data |
| gptkbp:input | gptkb:text, images |
| gptkbp:output | predictions, joint embeddings |
| gptkbp:relatedTo | gptkb:BERT, gptkb:LXMERT, gptkb:ViLBERT, Vision-Language Pretraining |
| gptkbp:uses | transformer architecture |
| gptkbp:bfsParent | gptkb:MMBT |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Multimodal Bitransformers |
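
The statements above can be read as (predicate, object) pairs about a single subject. The following is a minimal Python sketch of that reading, not part of the knowledge base itself. It assumes that objects sharing a line in the table are separate values (a split that matches the stated total of 23), and the names `STATEMENTS` and `group_by_predicate` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict

# (predicate, object) pairs transcribed from the table above.
# Assumption: each object listed under a predicate is its own statement.
STATEMENTS = [
    ("gptkbp:instanceOf", "gptkb:model"),
    ("gptkbp:instanceOf", "gptkb:deep_learning_architecture"),
    ("gptkbp:appliesTo", "image-text retrieval"),
    ("gptkbp:appliesTo", "visual question answering"),
    ("gptkbp:appliesTo", "multimodal classification"),
    ("gptkbp:enables", "cross-modal reasoning"),
    ("gptkbp:enables", "joint representation learning"),
    ("gptkbp:field", "gptkb:artificial_intelligence"),
    ("gptkbp:field", "computer vision"),
    ("gptkbp:field", "natural language processing"),
    ("gptkbp:handles", "multimodal data"),
    ("gptkbp:input", "gptkb:text"),
    ("gptkbp:input", "images"),
    ("gptkbp:output", "predictions"),
    ("gptkbp:output", "joint embeddings"),
    ("gptkbp:relatedTo", "gptkb:BERT"),
    ("gptkbp:relatedTo", "gptkb:LXMERT"),
    ("gptkbp:relatedTo", "gptkb:ViLBERT"),
    ("gptkbp:relatedTo", "Vision-Language Pretraining"),
    ("gptkbp:uses", "transformer architecture"),
    ("gptkbp:bfsParent", "gptkb:MMBT"),
    ("gptkbp:bfsLayer", "7"),
    ("https://www.w3.org/2000/01/rdf-schema#label", "Multimodal Bitransformers"),
]


def group_by_predicate(statements):
    """Collect objects under their predicate, preserving table order."""
    grouped = defaultdict(list)
    for predicate, obj in statements:
        grouped[predicate].append(obj)
    return dict(grouped)


grouped = group_by_predicate(STATEMENTS)
print(len(STATEMENTS), "statements across", len(grouped), "predicates")
for predicate, objects in grouped.items():
    print(predicate, "->", ", ".join(objects))
```

Grouping by predicate reproduces the row layout of the table, while the flat list keeps one entry per statement, which is the unit the heading counts.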