Statements (28)
Predicate | Object |
---|---|
gptkbp:instanceOf | multimodal AI model |
gptkbp:architecture | vision-language pre-training |
gptkbp:availableOn | gptkb:GitHub |
gptkbp:citation | 1000+ |
gptkbp:developedBy | gptkb:Salesforce_Research |
gptkbp:enables | few-shot learning |
gptkbp:enables | zero-shot image-to-text tasks |
https://www.w3.org/2000/01/rdf-schema#label | BLIP-2 |
gptkbp:language | gptkb:OPT |
gptkbp:language | FlanT5 |
gptkbp:license | gptkb:BSD-3-Clause |
gptkbp:memiliki_tugas | image captioning |
gptkbp:memiliki_tugas | visual question answering |
gptkbp:memiliki_tugas | image-to-text generation |
gptkbp:mode | gptkb:DVD |
gptkbp:mode | gptkb:language |
gptkbp:notablePublication | https://arxiv.org/abs/2301.12597 |
gptkbp:notablePublication | BLIP-2: Bootstrapped Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
gptkbp:openSource | true |
gptkbp:releaseYear | 2023 |
gptkbp:usedFor | AI benchmarking |
gptkbp:usedFor | multimodal research |
gptkbp:uses | frozen large language model |
gptkbp:uses | frozen vision encoder |
gptkbp:uses | querying transformer |
gptkbp:visionEncoder | gptkb:ViT |
gptkbp:bfsParent | gptkb:Hugging_Face_models |
gptkbp:bfsLayer | 7 |