Aya Vision

An open-weights multilingual vision-language model from Cohere For AI, released as part of the Aya initiative's global multilingual and multimodal research effort.
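
A minimal captioning sketch via Hugging Face transformers, assuming the CohereForAI/aya-vision-8b checkpoint and its chat-template interface as documented on the model card (verify both before use):

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

# Assumed model id from the Aya Vision release on Hugging Face.
model_id = "CohereForAI/aya-vision-8b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

# Chat-style message mixing an image (placeholder URL) with a text prompt.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
generated = output[0][inputs["input_ids"].shape[-1]:]  # strip the echoed prompt
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```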

GPT-4o (Omni)

OpenAI’s flagship multimodal model; the “o” stands for “omni,” reflecting native support for text, vision, and audio input and output at lower latency and cost than GPT-4 Turbo.
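
A minimal sketch of a vision request through the OpenAI Python SDK; the image URL is a placeholder, and OPENAI_API_KEY is assumed to be set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single user turn can mix text and image parts.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```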

Claude 3 Sonnet

A mid-sized member of Anthropic’s Claude 3 model family, balancing reasoning quality, speed, and cost, and accepting both text and image inputs.
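
A minimal sketch using the official anthropic Python SDK, sending a base64-encoded local image (placeholder file name) alongside a text prompt; ANTHROPIC_API_KEY is assumed to be set:

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The Messages API expects images inline as base64 data.
with open("photo.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/jpeg", "data": image_data,
            }},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
)
print(message.content[0].text)
```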

DeepSeek-V3

A large open-weights Mixture-of-Experts language model from DeepSeek-AI. The base model is text-only; vision-language tasks such as OCR, captioning, and visual reasoning are served by the companion DeepSeek-VL family.
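
A minimal sketch, assuming DeepSeek's OpenAI-compatible endpoint and the "deepseek-chat" model name, which routed to DeepSeek-V3 at the time of writing:

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API; base URL and model name
# are taken from DeepSeek's public docs (verify before use).
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize the transformer architecture in two sentences."}],
)
print(response.choices[0].message.content)
```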

Gemini 2.0 Flash

A fast and lightweight variant of Google’s Gemini 2.0 multimodal model, optimized for latency-sensitive tasks via the Gemini API.
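
A minimal sketch with the google-generativeai Python SDK; the "gemini-2.0-flash" model id reflects naming at the time of writing, and the image file is a placeholder:

```python
import os
from PIL import Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Assumed model id; check the Gemini API model list for the current name.
model = genai.GenerativeModel("gemini-2.0-flash")

# A single request can mix an image and a text prompt.
response = model.generate_content(
    [Image.open("photo.jpg"), "Caption this image in one sentence."]
)
print(response.text)
```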

Grok-2

xAI’s second-generation model, with improved reasoning and image-understanding capabilities, integrated into X (formerly Twitter) as the platform’s Grok assistant.
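
A minimal sketch, assuming xAI's OpenAI-compatible endpoint and the "grok-2-latest" model alias, both taken from xAI's public API docs at the time of writing:

```python
from openai import OpenAI

# xAI's API mirrors the OpenAI chat-completions interface.
client = OpenAI(api_key="YOUR_XAI_API_KEY", base_url="https://api.x.ai/v1")

response = client.chat.completions.create(
    model="grok-2-latest",  # assumed alias; verify against xAI's model list
    messages=[{"role": "user", "content": "Explain mixture-of-experts models in two sentences."}],
)
print(response.choices[0].message.content)
```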

PaliGemma 2

A next-generation vision-language model from Google that pairs the Gemma 2 language model with the SigLIP vision encoder for image captioning, visual question answering (VQA), and image-text reasoning.
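
A minimal captioning sketch via Hugging Face transformers; the model id and the "<image>caption en" task prompt are assumptions based on the PaliGemma 2 release, and the gated weights require accepting the license on Hugging Face first:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Assumed checkpoint; PaliGemma 2 ships in several sizes and resolutions.
model_id = "google/paligemma2-3b-pt-224"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg")  # placeholder image
prompt = "<image>caption en"     # PaliGemma-style task prefix
inputs = processor(text=prompt, images=image, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
generated = output[0][inputs["input_ids"].shape[-1]:]  # strip the echoed prompt
print(processor.decode(generated, skip_special_tokens=True))
```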
