Statements (23)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:vision-language_model |
| gptkbp:application | image captioning, image-text retrieval, visual question answering |
| gptkbp:architecture | gptkb:transformation |
| gptkbp:arXivID | 2006.16934 |
| gptkbp:basedOn | gptkb:ERNIE |
| gptkbp:developedBy | gptkb:Baidu |
| gptkbp:improves | vision-language understanding |
| gptkbp:input | gptkb:illustrator, gptkb:text |
| gptkbp:language | gptkb:Chinese, English |
| gptkbp:notablePublication | gptkb:ERNIE-ViL:_Knowledge_Enhanced_Vision-Language_Representations_Through_Scene_Graph |
| gptkbp:relatedTo | gptkb:artificial_intelligence, deep learning, multimodal learning |
| gptkbp:releaseYear | 2021 |
| gptkbp:trainer | large-scale image-text pairs |
| gptkbp:uses | knowledge-enhanced pre-training |
| gptkbp:bfsParent | gptkb:ERNIE |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | ERNIE-ViL |