Foundation Model Pricing Snapshot

DATA UPDATED: March 12, 2026 The snapshot covers: Text and reasoning LLMs Multimodal models Image generation models Video generation models Speech generation and speech recognition models Pricing has been normalized whenever possible: Pricing Type Normalized Unit Text tokens per 1M tokens Images per image Video per second Speech per minute or 1M characters 1. Text & Reasoning Models These models power most AI agents, chatbots, copilots, and enterprise automation systems. ...

March 12, 2026 · 4 min

Gemini 2 Flash

A fast and lightweight variant of Google’s Gemini 2.0 multimodal model, optimized for latency-sensitive tasks via the Gemini API.

1 min

Gemma 7B

A 7-billion-parameter open-weight language model developed by Google, optimized for efficiency, safety, and general-purpose reasoning.

1 min

PaliGemma 2

A next-generation vision-language model by Google, combining Gemma LLM and SigLIP vision encoder for image captioning, VQA, and image-text reasoning tasks.

1 min