PaliGemma 2
A next-generation vision-language model from Google that combines the Gemma 2 language model with the SigLIP vision encoder for image captioning, visual question answering (VQA), and image-text reasoning tasks.