Aya Vision
An open-weight vision-language model developed by Cohere For AI as part of the Aya project, a global multilingual and multimodal research initiative.
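Because the weights are open, the model can be run locally. A minimal sketch, assuming the checkpoint is published on the Hugging Face Hub under an id like CohereForAI/aya-vision-8b (an assumed repo id) and loads through the generic transformers image-text-to-text classes:

```python
# Minimal sketch: loading an open-weight vision-language model from the Hugging Face Hub.
# The repo id and chat format below are assumptions, not confirmed specifics.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereForAI/aya-vision-8b"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# A multilingual prompt paired with an image, passed through the processor's chat template.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},
        {"type": "text", "text": "Describe esta imagen en una frase."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```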
GPT-4o
OpenAI’s flagship GPT-4 variant ("o" for omni), which natively supports text, vision, and audio input and output with faster responses and improved reasoning.
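A minimal sketch of mixed text-and-image input through the OpenAI Python SDK; the prompt and image URL are illustrative:

```python
# Minimal sketch: sending text plus an image URL to a multimodal chat model
# via the OpenAI Python SDK (reads OPENAI_API_KEY from the environment).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```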
Claude 3 Sonnet
The mid-sized member of Anthropic’s Claude 3 model family, balancing reasoning quality, speed, and multimodal (vision) understanding.
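A minimal sketch of image-plus-text input via the Anthropic Python SDK; the dated model string and file name are illustrative:

```python
# Minimal sketch: passing a base64-encoded image and a question to a Claude 3 model
# through the Anthropic Python SDK (reads ANTHROPIC_API_KEY from the environment).
import base64
import anthropic

client = anthropic.Anthropic()
with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-sonnet-20240229",  # illustrative model string
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "Summarize what this diagram shows."},
        ],
    }],
)
print(message.content[0].text)
```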
DeepSeek-VL
A multimodal foundation model from DeepSeek AI that integrates vision and language for tasks such as OCR, captioning, and visual reasoning.
Gemini 2.0 Flash
A fast, lightweight variant of Google’s Gemini 2.0 multimodal family, optimized for latency-sensitive tasks and available through the Gemini API.
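A minimal sketch of a latency-sensitive multimodal request through the Gemini API using the google-genai Python client; the model string and input file are illustrative:

```python
# Minimal sketch: a quick image-understanding request through the Gemini API
# using the google-genai client (reads the API key from the environment).
from google import genai
from PIL import Image

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash",  # illustrative model string
    contents=[Image.open("receipt.jpg"), "Extract the total amount from this receipt."],
)
print(response.text)
```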
Grok
xAI’s next-generation model, built on a new architecture and integrated directly into X (formerly Twitter) as part of Elon Musk’s AI assistant efforts.
PaliGemma
A next-generation vision-language model from Google that pairs a Gemma language model with a SigLIP vision encoder for image captioning, visual question answering (VQA), and image-text reasoning.
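A minimal sketch of visual question answering with Hugging Face transformers; the checkpoint id and task-prefix prompt follow the commonly published usage pattern but are assumptions here:

```python
# Minimal sketch: VQA with a Gemma + SigLIP vision-language model via transformers.
# The checkpoint id and "answer en" task prefix are assumed, illustrative details.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("street.jpg")
prompt = "answer en How many bicycles are in the picture?"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=20)
answer = processor.decode(
    generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```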