PaliGemma 2

Google's second-generation PaliGemma vision-language model, combining the Gemma 2 language model with the SigLIP vision encoder for image captioning, visual question answering (VQA), and image-text reasoning.
