A multimodal vision-language model family for image understanding and visual question answering.

Model Intelligence
Benchmarkable: No
Model level: family
