DATA UPDATED: March 12, 2026
The snapshot covers:
- Text and reasoning LLMs
- Multimodal models
- Image generation models
- Video generation models
- Speech generation and speech recognition models
Pricing has been normalized whenever possible:
| Pricing Type | Normalized Unit |
|---|---|
| Text tokens | per 1M tokens |
| Images | per image |
| Video | per second |
| Speech | per minute or 1M characters |
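
As a rough illustration of what this normalization involves, the sketch below converts a few common vendor units into the normalized units in the table. The function names are hypothetical and are not part of any provider's API or of the underlying dataset.

```python
# Hypothetical helpers (names are illustrative, not from any provider SDK)
# for converting vendor-quoted prices into the normalized units above.

def per_1k_to_per_1m_tokens(price_per_1k: float) -> float:
    """Convert a price quoted per 1K tokens to a price per 1M tokens."""
    return price_per_1k * 1000


def per_hour_to_per_minute(price_per_hour: float) -> float:
    """Convert a price quoted per audio hour to a price per minute."""
    return price_per_hour / 60


def per_clip_to_per_second(price_per_clip: float, clip_seconds: float) -> float:
    """Convert a flat per-clip video price to a price per generated second."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return price_per_clip / clip_seconds


# Example: a speech price quoted at $0.39 per audio hour.
print(f"${per_hour_to_per_minute(0.39):.4f} per minute")  # $0.0065 per minute
```

Ranges (such as tiered prices that depend on context length) cannot be collapsed to a single number this way, which is why some cells in the tables below are reported as a low-to-high range.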
1. Text & Reasoning Models
These models power most AI agents, chatbots, copilots, and enterprise automation systems.
| Provider | Model | Category | Context Window | Input / 1M tokens | Output / 1M tokens |
|---|---|---|---|---|---|
| OpenAI | GPT-5.4 | Reasoning | 1.05M | $2.5 | $15 |
| OpenAI | GPT-5 mini | Reasoning | 400K | $0.25 | $2 |
| Anthropic | Claude Opus 4.6 | Reasoning | — | $5 | $25 |
| Anthropic | Claude Sonnet 4.6 | Reasoning | — | $3 | $15 |
| Anthropic | Claude Haiku 4.5 | Text | — | $1 | $5 |
| Google | Gemini 2.5 Pro | Multimodal | 1M | $1.25 – $2.5 | $10 – $15 |
| Google | Gemini 2.5 Flash | Multimodal | 1M | $0.30 | $2.5 |
| xAI | Grok 4.20 Beta | Multimodal | 2M | $2 | $6 |
| xAI | Grok 4.1 Fast | Multimodal | 2M | $0.20 | $0.50 |
| DeepSeek | DeepSeek-Chat | Text | 128K | $0.28 | $0.42 |
| DeepSeek | DeepSeek-Reasoner | Reasoning | 128K | $0.28 | $0.42 |
| Alibaba | Qwen-Plus | Reasoning | 1M | $0.115 – $0.689 | $0.287 – $9.175 |
| Cohere | Command | Text | — | $1 | $2 |
| Cohere | Command R+ | Reasoning | — | $2.5 | $10 |
| MiniMax | M2.5 | Reasoning | — | $0.30 | $1.20 |
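
To read the per-1M-token columns as a per-request cost, multiply each token count by its price and divide by one million. The sketch below is a minimal example under simple assumptions: flat pricing, no caching or batch discounts, and an illustrative request of 10,000 input tokens and 2,000 output tokens. The function name `request_cost` is hypothetical.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimate the dollar cost of one request from per-1M-token prices.

    Assumes flat pricing: no caching, batch discounts, or tiered rates.
    """
    total = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return total / 1_000_000


# Illustrative request size (an assumption, not a measured workload).
IN_TOKENS, OUT_TOKENS = 10_000, 2_000

# Prices taken from the table above.
gpt_54 = request_cost(IN_TOKENS, OUT_TOKENS, 2.5, 15)
gpt_5_mini = request_cost(IN_TOKENS, OUT_TOKENS, 0.25, 2)

print(f"GPT-5.4:    ${gpt_54:.4f}")      # GPT-5.4:    $0.0550
print(f"GPT-5 mini: ${gpt_5_mini:.4f}")  # GPT-5 mini: $0.0065
```

Because output tokens are priced several times higher than input tokens for most models in the table, the output share of a workload often dominates the bill.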
Observations
The cost gap between frontier and efficiency models is now roughly an order of magnitude.
Examples:
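- GPT-5 mini is 10x cheaper than GPT-5.4 on input ($0.25 vs. $2.5) and 7.5x cheaper on output ($2 vs. $15)
- Grok 4.1 Fast is 10x cheaper than Grok 4.20 Beta on input ($0.20 vs. $2) and 12x cheaper on output ($0.50 vs. $6)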
- DeepSeek remains the lowest-cost frontier-grade reasoning model
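
The ratios behind these comparisons can be checked directly from the table. The sketch below hard-codes the four relevant rows; the `PRICES` dictionary and `price_ratio` function are illustrative names, not part of the dataset.

```python
# (input, output) prices in dollars per 1M tokens, copied from the table above.
PRICES = {
    "GPT-5.4": (2.5, 15.0),
    "GPT-5 mini": (0.25, 2.0),
    "Grok 4.20 Beta": (2.0, 6.0),
    "Grok 4.1 Fast": (0.20, 0.50),
}


def price_ratio(flagship: str, efficient: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of flagship price to efficient price."""
    flagship_in, flagship_out = PRICES[flagship]
    efficient_in, efficient_out = PRICES[efficient]
    return flagship_in / efficient_in, flagship_out / efficient_out


for flagship, efficient in [("GPT-5.4", "GPT-5 mini"), ("Grok 4.20 Beta", "Grok 4.1 Fast")]:
    in_ratio, out_ratio = price_ratio(flagship, efficient)
    print(f"{flagship} vs {efficient}: {in_ratio:.1f}x input, {out_ratio:.1f}x output")
```

Input and output ratios differ, so a single "Nx cheaper" figure depends on which side of the request dominates a given workload.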
The industry appears to be converging toward two economic layers:
- Frontier reasoning models ($2–$5 per 1M input tokens)
- High-throughput inference models ($0.2–$0.5 per 1M tokens)
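
One way to apply these two bands is to bucket models by their input price. The sketch below assumes the bands refer to input price per 1M tokens and treats anything outside both bands as unclassified; both choices are assumptions made for illustration, and `classify_tier` is a hypothetical name.

```python
# Price bands in dollars per 1M input tokens, taken from the two layers above.
# Treating the bands as input-price bands is an assumption of this sketch.
FRONTIER_BAND = (2.0, 5.0)
HIGH_THROUGHPUT_BAND = (0.2, 0.5)


def classify_tier(input_price_per_1m: float) -> str:
    """Bucket a model by input price; prices outside both bands are unclassified."""
    low, high = FRONTIER_BAND
    if low <= input_price_per_1m <= high:
        return "frontier"
    low, high = HIGH_THROUGHPUT_BAND
    if low <= input_price_per_1m <= high:
        return "high-throughput"
    return "unclassified"


# Input prices copied from the table above.
for model, price in [
    ("Claude Opus 4.6", 5.0),
    ("GPT-5.4", 2.5),
    ("Claude Haiku 4.5", 1.0),
    ("DeepSeek-Reasoner", 0.28),
    ("Grok 4.1 Fast", 0.20),
]:
    print(f"{model}: {classify_tier(price)}")
```

Models priced between the bands, such as Claude Haiku 4.5 at $1, fall into neither layer under this rule, which suggests the two-layer picture is a simplification rather than a strict partition.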
2. Multimodal and Real-Time Models
Multimodal models integrate text, audio, image, and video inputs.
| Provider | Model | Modalities | Pricing |
|---|---|---|---|
| OpenAI | GPT-Realtime-1.5 | text / audio / image | $4 input / $16 output per 1M tokens |
| Google | Gemini 2.5 Pro | text / image / audio / video | $1.25–$2.5 input |
| Google | Gemini 2.5 Flash | multimodal | $0.30 input |
| xAI | Grok 4.20 | text / image | $2 input |
| Qwen | Omni-Turbo | text / image / audio / video | $0.058 input |
Observations
Multimodal models increasingly act as unified perception engines for AI systems.
The trend is toward one model handling multiple modalities, replacing specialized pipelines.
3. Image Generation Models
| Provider | Model | Price |
|---|---|---|
| OpenAI | GPT-Image-1.5 | ~$0.02 per image |
| Google | Gemini Flash Image | $0.039 per image |
| Google | Imagen 4 Ultra | $0.06 per image |
| Black Forest Labs | FLUX.2 Pro | $0.03 per image |
| Black Forest Labs | FLUX Kontext Max | $0.08 per image |
| MiniMax | image-01 | $0.0035 per image |
Observations
Image generation is now priced at cents or fractions of a cent per image.
MiniMax’s price of $0.0035 per image (about a third of a cent) shows that image models are approaching commodity compute economics.
4. Video Generation Models
Video generation is currently the most expensive modality.
| Provider | Model | Price |
|---|---|---|
| OpenAI | Sora 2 Pro | $0.30 – $0.70 / second |
| Google | Veo 3.1 | $0.40 / second |
| Runway | Gen4.5 | $0.12 / second |
| MiniMax | Hailuo 2.3 | ~$0.03 / second |
| Luma | Ray 3.14 | credit-based |
Observations
Video models remain compute-intensive diffusion systems, which explains the higher cost.
However, prices are compressing quickly.
Runway and MiniMax are aggressively pushing down cost per generated second.
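
To make the per-second prices concrete, the sketch below estimates the cost of a single 10-second clip for each per-second-priced model in the table. The clip length is an arbitrary assumption, Luma is omitted because its pricing is credit-based, and `clip_cost` is a hypothetical helper.

```python
def clip_cost(seconds: float, price_per_second: float) -> float:
    """Cost of generating `seconds` of video at a flat per-second price."""
    if seconds < 0 or price_per_second < 0:
        raise ValueError("seconds and price_per_second must be non-negative")
    return seconds * price_per_second


# (low, high) prices in dollars per second, copied from the table above.
# Hailuo 2.3 is listed as approximate (~$0.03) in the source.
VIDEO_PRICES = {
    "Sora 2 Pro": (0.30, 0.70),
    "Veo 3.1": (0.40, 0.40),
    "Gen4.5": (0.12, 0.12),
    "Hailuo 2.3": (0.03, 0.03),
}

CLIP_SECONDS = 10  # illustrative clip length, an assumption

for model, (low, high) in VIDEO_PRICES.items():
    low_cost = clip_cost(CLIP_SECONDS, low)
    high_cost = clip_cost(CLIP_SECONDS, high)
    if low == high:
        print(f"{model}: ${low_cost:.2f}")
    else:
        print(f"{model}: ${low_cost:.2f} to ${high_cost:.2f}")
```

At these list prices, the same 10-second clip ranges from roughly $0.30 to $7.00 depending on the model, before accounting for retries or discarded generations.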
5. Speech & Audio Models
Speech models are usually priced by character or minute of audio.
| Provider | Model | Pricing |
|---|---|---|
| ElevenLabs | Flash / Turbo TTS | $60 per 1M characters |
| ElevenLabs | Scribe v2 | $0.39 per audio hour (~$0.0065 per minute) |
| Deepgram | Flux STT | $0.0077 per minute |
| Deepgram | Aura-2 TTS | $30 per 1M characters |
| MiniMax | Speech-2.8 Turbo | $60 per 1M characters |
Observations
Speech recognition has become very cheap infrastructure.
Deepgram’s pricing ($0.0077 per minute, or about $0.46 per audio hour) indicates that speech recognition is now priced close to a commodity.
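
Because the two speech recognition entries in the table use different units, a workload-level comparison needs a common basis. The sketch below estimates the cost of transcribing 1,000 hours of audio at list price; the workload size is an assumption and `transcription_cost` is a hypothetical helper.

```python
MINUTES_PER_HOUR = 60


def transcription_cost(audio_hours: float, price: float, unit: str) -> float:
    """Cost of transcribing `audio_hours` of audio.

    `unit` says how `price` is quoted: "minute" or "hour".
    """
    if unit == "minute":
        return audio_hours * MINUTES_PER_HOUR * price
    if unit == "hour":
        return audio_hours * price
    raise ValueError(f"unsupported unit: {unit!r}")


AUDIO_HOURS = 1_000  # illustrative workload size, an assumption

# Prices copied from the table above.
deepgram_flux = transcription_cost(AUDIO_HOURS, 0.0077, "minute")
scribe_v2 = transcription_cost(AUDIO_HOURS, 0.39, "hour")

print(f"Deepgram Flux STT:    ${deepgram_flux:.2f}")  # $462.00
print(f"ElevenLabs Scribe v2: ${scribe_v2:.2f}")      # $390.00
```

Either way, 1,000 hours of transcription costs a few hundred dollars at list price, which is the sense in which speech recognition now behaves like commodity infrastructure.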
6. Subscription Platforms
Some AI platforms still rely on consumer subscriptions rather than APIs.
| Platform | Plan | Price |
|---|---|---|
| Midjourney | Standard | $30 / month |
| Midjourney | Pro | $60 / month |
| Pika | Standard | $8 / month |
| Luma | Pro | $90 / month |
These platforms typically provide bundled GPU credits rather than pay-per-token pricing.
Key Industry Trends
1. Massive Price Compression
AI inference prices continue to fall rapidly.
Some examples:
- DeepSeek models under $0.50 / 1M tokens
- Image generation approaching fractions of a cent
- Speech recognition under $0.01 per minute
This suggests AI inference is transitioning from a scarce resource to commodity infrastructure.
2. Multimodal Convergence
Most frontier models now support:
- text
- image
- audio
- video
Instead of building separate pipelines, developers increasingly use one multimodal model as a universal reasoning engine.
3. Stratification of AI Models
The market is splitting into two layers:
Frontier reasoning models
- GPT-5
- Claude Opus
- Gemini Pro
High-throughput inference models
- DeepSeek
- Grok Fast
- Gemini Flash
- MiniMax
The first layer focuses on intelligence, while the second focuses on cost-efficient scale.
Data Source
All pricing data was collected from official developer documentation and pricing pages including:
- OpenAI API
- Anthropic Claude API
- Google Gemini API
- xAI API
- DeepSeek API
- Alibaba Model Studio
- Cohere API
- MiniMax API
- Stability AI
- Runway
- Midjourney
- ElevenLabs
- Deepgram
The underlying machine-readable dataset is maintained in the Cognaptus DataHub and updated daily.