Statements (28)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:multimodal_AI_model |
| gptkbp:architecture | vision-language pre-training |
| gptkbp:availableOn | gptkb:GitHub |
| gptkbp:citation | 1000+ |
| gptkbp:developedBy | gptkb:Salesforce_Research |
| gptkbp:enables | few-shot learning |
| gptkbp:enables | zero-shot image-to-text tasks |
| gptkbp:language | gptkb:OPT |
| gptkbp:language | FlanT5 |
| gptkbp:license | gptkb:BSD-3-Clause |
| gptkbp:memiliki_tugas | image captioning |
| gptkbp:memiliki_tugas | visual question answering |
| gptkbp:memiliki_tugas | image-to-text generation |
| gptkbp:mode | gptkb:DVD |
| gptkbp:mode | gptkb:language |
| gptkbp:notablePublication | https://arxiv.org/abs/2301.12597 |
| gptkbp:notablePublication | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| gptkbp:openSource | true |
| gptkbp:releaseYear | 2023 |
| gptkbp:usedFor | AI benchmarking |
| gptkbp:usedFor | multimodal research |
| gptkbp:uses | frozen large language model |
| gptkbp:uses | frozen vision encoder |
| gptkbp:uses | querying transformer |
| gptkbp:visionEncoder | gptkb:ViT |
| gptkbp:bfsParent | gptkb:Hugging_Face_models |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Blip-2 |
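
The statements above describe the BLIP-2 architecture: a frozen ViT vision encoder and a frozen language model (OPT or FlanT5) bridged by a querying transformer (Q-Former), so inference is ordinary conditional generation. Below is a minimal sketch of zero-shot image captioning and visual question answering using the Hugging Face `transformers` integration; the `Salesforce/blip2-opt-2.7b` checkpoint name, the sample image URL, and the `Question: ... Answer:` prompt format are assumptions drawn from the library's published examples, not from the statements table itself.

```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Assumed checkpoint name; FlanT5-based variants also exist.
checkpoint = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(checkpoint)
model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)

# Sample image URL chosen for illustration; any RGB image works.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Zero-shot image captioning: pass the image alone, no text prompt.
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(caption_ids[0], skip_special_tokens=True))

# Visual question answering: condition generation on a text prompt.
prompt = "Question: how many cats are in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt")
answer_ids = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(answer_ids[0], skip_special_tokens=True))
```

Because both the vision encoder and the language model are frozen, only the Q-Former was trained during pre-training; swapping in a FlanT5-based checkpoint such as `Salesforce/blip2-flan-t5-xl` changes the language backbone without changing this calling pattern.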