DATA UPDATED: March 12, 2026

The snapshot covers:

  • Text and reasoning LLMs
  • Multimodal models
  • Image generation models
  • Video generation models
  • Speech generation and speech recognition models

Pricing has been normalized whenever possible:

Pricing Type | Normalized Unit
Text tokens  | per 1M tokens
Images       | per image
Video        | per second
Speech       | per minute or per 1M characters
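
As a rough illustration of the normalization step, the sketch below rescales a listed price to a target unit. The helper name `normalize_price` and its parameters are hypothetical and not part of any provider's API. The $0.39 per audio hour figure is the ElevenLabs Scribe v2 price listed in the speech section of this snapshot; the per-1K-token figure is an illustrative input only.

```python
def normalize_price(price: float, quoted_units: float, target_units: float) -> float:
    """Rescale a price quoted per `quoted_units` units to a price per `target_units` units.

    Hypothetical helper for illustration only.
    """
    if quoted_units <= 0 or target_units <= 0:
        raise ValueError("unit counts must be positive")
    return price * target_units / quoted_units


# $0.39 per audio hour (60 minutes) expressed per minute
per_minute = normalize_price(0.39, quoted_units=60, target_units=1)
print(f"${per_minute:.4f} per minute")  # $0.0065 per minute

# Illustrative input, not from the snapshot: $0.0025 per 1K tokens expressed per 1M tokens
per_million = normalize_price(0.0025, quoted_units=1_000, target_units=1_000_000)
print(f"${per_million:.2f} per 1M tokens")  # $2.50 per 1M tokens
```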

1. Text & Reasoning Models

These models power most AI agents, chatbots, copilots, and enterprise automation systems.

Provider  | Model             | Category   | Context Window | Input / 1M tokens | Output / 1M tokens
OpenAI    | GPT-5.4           | Reasoning  | 1.05M          | $2.5              | $15
OpenAI    | GPT-5 mini        | Reasoning  | 400K           | $0.25             | $2
Anthropic | Claude Opus 4.6   | Reasoning  | -              | $5                | $25
Anthropic | Claude Sonnet 4.6 | Reasoning  | -              | $3                | $15
Anthropic | Claude Haiku 4.5  | Text       | -              | $1                | $5
Google    | Gemini 2.5 Pro    | Multimodal | 1M             | $1.25 – $2.5      | $10 – $15
Google    | Gemini 2.5 Flash  | Multimodal | 1M             | $0.30             | $2.5
xAI       | Grok 4.20 Beta    | Multimodal | 2M             | $2                | $6
xAI       | Grok 4.1 Fast     | Multimodal | 2M             | $0.20             | $0.50
DeepSeek  | DeepSeek-Chat     | Text       | 128K           | $0.28             | $0.42
DeepSeek  | DeepSeek-Reasoner | Reasoning  | 128K           | $0.28             | $0.42
Alibaba   | Qwen-Plus         | Reasoning  | 1M             | $0.115 – $0.689   | $0.287 – $9.175
Cohere    | Command           | Text       | -              | $1                | $2
Cohere    | Command R+        | Reasoning  | -              | $2.5              | $10
MiniMax   | M2.5              | Reasoning  | -              | $0.30             | $1.20
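
To make the per-1M-token prices concrete, the sketch below estimates the cost of a single request from its input and output token counts. The function name `request_cost` and the 10,000 / 1,000 token workload are assumptions chosen for illustration; the prices are the GPT-5.4 and GPT-5 mini rows of the table above.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimated USD cost of one request, given prices quoted per 1M tokens."""
    total = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return total / 1_000_000


# Hypothetical workload: 10,000 input tokens and 1,000 output tokens
print(f"GPT-5.4:    ${request_cost(10_000, 1_000, 2.5, 15):.4f}")   # $0.0400
print(f"GPT-5 mini: ${request_cost(10_000, 1_000, 0.25, 2):.4f}")   # $0.0045
```

Because output tokens are priced several times higher than input tokens, the blended cost depends heavily on how long the responses are.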

Observations

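Within a single provider's lineup, the flagship model now costs roughly ten times as much as the efficiency model.

Examples:

  • GPT-5 mini is 10x cheaper than GPT-5.4 on input tokens and 7.5x cheaper on output tokens
  • Grok 4.1 Fast is 10x cheaper than Grok 4.20 Beta on input tokens and 12x cheaper on output tokens
  • DeepSeek-Reasoner ($0.28 input / $0.42 output) is among the lowest-priced reasoning models in the table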
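
The gap between flagship and efficiency models can be checked directly against the table. The sketch below computes input and output price ratios separately, since the two often differ; the `PRICES` dictionary and the `price_ratio` helper are hypothetical names, with values copied from the table above.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above
PRICES = {
    "GPT-5.4": (2.5, 15),
    "GPT-5 mini": (0.25, 2),
    "Grok 4.20 Beta": (2, 6),
    "Grok 4.1 Fast": (0.20, 0.50),
}


def price_ratio(flagship: str, efficient: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of flagship price to efficient price."""
    flagship_in, flagship_out = PRICES[flagship]
    efficient_in, efficient_out = PRICES[efficient]
    return flagship_in / efficient_in, flagship_out / efficient_out


for flagship, efficient in [("GPT-5.4", "GPT-5 mini"), ("Grok 4.20 Beta", "Grok 4.1 Fast")]:
    in_ratio, out_ratio = price_ratio(flagship, efficient)
    print(f"{flagship} vs {efficient}: {in_ratio:.1f}x input, {out_ratio:.1f}x output")
```

For the OpenAI pair this gives 10.0x on input and 7.5x on output; for the xAI pair, 10.0x on input and 12.0x on output.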

The industry appears to be converging toward two economic layers:

  1. Frontier reasoning models ($2–$5 per 1M input tokens)
  2. High-throughput inference models ($0.2–$0.5 per 1M input tokens)
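
One way to apply these two bands is a simple classifier on input price. This is a minimal sketch: the function name `price_tier` is hypothetical, the band edges are the ones quoted above, and the input prices come from the table in this section. It also shows that some models, such as Claude Haiku 4.5 at $1, fall between the two bands.

```python
def price_tier(input_price_per_1m: float) -> str:
    """Classify a model by its input price (USD per 1M tokens) using the two bands above."""
    if 2.0 <= input_price_per_1m <= 5.0:
        return "frontier"
    if 0.2 <= input_price_per_1m <= 0.5:
        return "high-throughput"
    return "outside both bands"


# Input prices copied from the table in this section
samples = [
    ("Claude Opus 4.6", 5.0),
    ("GPT-5.4", 2.5),
    ("Grok 4.1 Fast", 0.20),
    ("DeepSeek-Reasoner", 0.28),
    ("Claude Haiku 4.5", 1.0),
]

for name, price in samples:
    print(f"{name}: {price_tier(price)}")
```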

2. Multimodal and Real-Time Models

Multimodal models integrate text, audio, image, and video inputs.

Provider | Model            | Modalities                   | Pricing
OpenAI   | GPT-Realtime-1.5 | text / audio / image         | $4 input / $16 output per 1M tokens
Google   | Gemini 2.5 Pro   | text / image / audio / video | $1.25–$2.5 input
Google   | Gemini 2.5 Flash | multimodal                   | $0.30 input
xAI      | Grok 4.20        | text / image                 | $2 input
Qwen     | Omni-Turbo       | text / image / audio / video | $0.058 input

Observations

Multimodal models increasingly act as unified perception engines for AI systems.

The trend is toward one model handling multiple modalities, replacing specialized pipelines.


3. Image Generation Models

Provider          | Model              | Price
OpenAI            | GPT-Image-1.5      | ~$0.02 per image
Google            | Gemini Flash Image | $0.039 per image
Google            | Imagen 4 Ultra     | $0.06 per image
Black Forest Labs | FLUX.2 Pro         | $0.03 per image
Black Forest Labs | FLUX Kontext Max   | $0.08 per image
MiniMax           | image-01           | $0.0035 per image

Observations

Image generation pricing has collapsed dramatically.

MiniMax’s price of $0.0035 per image shows that image models are approaching commodity compute economics.


4. Video Generation Models

Video generation is currently the most expensive modality.

Provider | Model      | Price
OpenAI   | Sora 2 Pro | $0.30 – $0.70 / second
Google   | Veo 3.1    | $0.40 / second
Runway   | Gen4.5     | $0.12 / second
MiniMax  | Hailuo 2.3 | ~$0.03 / second
Luma     | Ray 3.14   | credit-based

Observations

Video models remain compute-intensive diffusion systems, which explains the higher cost.

However, the market is rapidly compressing prices.

Runway and MiniMax are aggressively pushing down cost per generated second.
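
To put the per-second prices in practical terms, the sketch below estimates the cost of a clip of a given length. The `VIDEO_PRICES` dictionary and `clip_cost` helper are hypothetical names, the 10-second clip length is an arbitrary example, and Luma Ray 3.14 is omitted because it is listed as credit-based rather than per second.

```python
# Per-second prices in USD copied from the table above, stored as (low, high).
# Sora 2 Pro is listed as a range; Hailuo 2.3 is listed as approximate.
VIDEO_PRICES = {
    "Sora 2 Pro": (0.30, 0.70),
    "Veo 3.1": (0.40, 0.40),
    "Gen4.5": (0.12, 0.12),
    "Hailuo 2.3": (0.03, 0.03),
}


def clip_cost(model: str, seconds: float) -> tuple[float, float]:
    """Return the (low, high) estimated USD cost of generating `seconds` of video."""
    low, high = VIDEO_PRICES[model]
    return low * seconds, high * seconds


for model in VIDEO_PRICES:
    low, high = clip_cost(model, 10)
    label = f"${low:.2f}" if low == high else f"${low:.2f} to ${high:.2f}"
    print(f"{model}: {label} for a 10-second clip")
```

At these list prices a 10-second clip costs about $4.00 on Veo 3.1, $1.20 on Gen4.5, and $0.30 on Hailuo 2.3, with Sora 2 Pro ranging from $3.00 to $7.00.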


5. Speech & Audio Models

Speech models are usually priced by character or minute of audio.

Provider   | Model             | Pricing
ElevenLabs | Flash / Turbo TTS | $60 per 1M characters
ElevenLabs | Scribe v2         | $0.39 per audio hour
Deepgram   | Flux STT          | $0.0077 per minute
Deepgram   | Aura-2 TTS        | $30 per 1M characters
MiniMax    | Speech-2.8 Turbo  | $60 per 1M characters

Observations

Speech recognition has become very cheap infrastructure.

Deepgram’s pricing ($0.0077 per minute, or roughly $0.46 per hour of audio) indicates that speech recognition is now close to a commodity service.
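
Because speech prices are quoted per character for synthesis and per minute or hour for recognition, a quick estimate for a concrete workload helps when comparing rows. The sketch below assumes a 5,000-character script and 60 minutes of audio, both arbitrary examples; the helper names `tts_cost` and `stt_cost` are hypothetical, and the prices are taken from the table above.

```python
def tts_cost(characters: int, price_per_1m_chars: float) -> float:
    """Estimated USD cost of synthesizing `characters` characters of text."""
    return characters * price_per_1m_chars / 1_000_000


def stt_cost(minutes: float, price_per_minute: float) -> float:
    """Estimated USD cost of transcribing `minutes` minutes of audio."""
    return minutes * price_per_minute


# Text-to-speech for a hypothetical 5,000-character script
print(f"Aura-2 TTS:        ${tts_cost(5_000, 30):.2f}")  # $0.15
print(f"Flash / Turbo TTS: ${tts_cost(5_000, 60):.2f}")  # $0.30

# Speech-to-text for a hypothetical 60 minutes of audio
print(f"Flux STT:  ${stt_cost(60, 0.0077):.3f}")     # $0.462
print(f"Scribe v2: ${stt_cost(60, 0.39 / 60):.3f}")  # $0.390 (listed per audio hour)
```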


6. Subscription Platforms

Some AI platforms still rely on consumer subscriptions rather than APIs.

Platform   | Plan     | Price
Midjourney | Standard | $30 / month
Midjourney | Pro      | $60 / month
Pika       | Standard | $8 / month
Luma       | Pro      | $90 / month

These platforms typically provide bundled GPU credits rather than pay-per-token pricing.


Key Industry Trends

1. Massive Price Compression

AI inference prices continue to fall rapidly.

Some examples:

  • DeepSeek models under $0.50 / 1M tokens
  • Image generation approaching fractions of a cent
  • Speech recognition under $0.01 per minute

This suggests AI inference is transitioning from a scarce resource to commodity infrastructure.


2. Multimodal Convergence

Most frontier models now support:

  • text
  • image
  • audio
  • video

Instead of building separate pipelines, developers increasingly use one multimodal model as a universal reasoning engine.


3. Stratification of AI Models

The market is splitting into two layers:

Frontier reasoning models

  • GPT-5
  • Claude Opus
  • Gemini Pro

High-throughput inference models

  • DeepSeek
  • Grok Fast
  • Gemini Flash
  • MiniMax

The first layer focuses on intelligence, while the second focuses on cost-efficient scale.


Data Source

All pricing data was collected from official developer documentation and pricing pages including:

  • OpenAI API
  • Anthropic Claude API
  • Google Gemini API
  • xAI API
  • DeepSeek API
  • Alibaba Model Studio
  • Cohere API
  • MiniMax API
  • Stability AI
  • Runway
  • Midjourney
  • ElevenLabs
  • Deepgram

The underlying machine-readable dataset is maintained in the Cognaptus DataHub and updated daily.

