DATA UPDATED: March 12, 2026

The snapshot covers:

Text and reasoning LLMs
Multimodal models
Image generation models
Video generation models
Speech generation and speech recognition models

Pricing has been normalized whenever possible:

Pricing Type	Normalized Unit
Text tokens	per 1M tokens
Images	per image
Video	per second
Speech	per minute or per 1M characters
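As a rough illustration of what this normalization involves, the sketch below converts two kinds of quoted prices into the units in the table above. The helper names and the per-1K-token quote are hypothetical and are not taken from any provider's documentation.

```python
# Hypothetical helpers for normalizing quoted prices; names are illustrative.

def per_hour_to_per_minute(price_per_hour: float) -> float:
    """Convert a per-audio-hour price to a per-minute price."""
    return price_per_hour / 60.0


def per_1k_to_per_1m(price_per_1k: float) -> float:
    """Convert a per-1K-unit price (tokens or characters) to a per-1M price."""
    return price_per_1k * 1000.0


# A speech model quoted at $0.39 per audio hour.
print(f"${per_hour_to_per_minute(0.39):.4f} per minute")

# A made-up quote of $0.0025 per 1K tokens (assumption, for illustration only).
print(f"${per_1k_to_per_1m(0.0025):.2f} per 1M tokens")
```

Credit-based plans are left out of the sketch because they cannot be converted without knowing how many credits a given job consumes.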

1. Text & Reasoning Models

These models power most AI agents, chatbots, copilots, and enterprise automation systems.

Provider	Model	Category	Context Window	Input / 1M tokens	Output / 1M tokens
OpenAI	GPT-5.4	Reasoning	1.05M	$2.5	$15
OpenAI	GPT-5 mini	Reasoning	400K	$0.25	$2
Anthropic	Claude Opus 4.6	Reasoning	—	$5	$25
Anthropic	Claude Sonnet 4.6	Reasoning	—	$3	$15
Anthropic	Claude Haiku 4.5	Text	—	$1	$5
Google	Gemini 2.5 Pro	Multimodal	1M	$1.25 – $2.5	$10 – $15
Google	Gemini 2.5 Flash	Multimodal	1M	$0.30	$2.5
xAI	Grok 4.20 Beta	Multimodal	2M	$2	$6
xAI	Grok 4.1 Fast	Multimodal	2M	$0.20	$0.50
DeepSeek	DeepSeek-Chat	Text	128K	$0.28	$0.42
DeepSeek	DeepSeek-Reasoner	Reasoning	128K	$0.28	$0.42
Alibaba	Qwen-Plus	Reasoning	1M	$0.115 – $0.689	$0.287 – $9.175
Cohere	Command	Text	—	$1	$2
Cohere	Command R+	Reasoning	—	$2.5	$10
MiniMax	M2.5	Reasoning	—	$0.30	$1.20
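To turn the per-1M-token prices in the table into a per-request figure, a minimal sketch follows. The function name and the token counts are assumptions chosen for illustration. The calculation ignores caching discounts, batch pricing, and the tiered ranges shown for some models.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Estimate the USD cost of one request from per-1M-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


# GPT-5.4 row from the table: $2.5 input, $15 output per 1M tokens.
# The 10,000 / 2,000 token split is a hypothetical workload.
cost = request_cost(10_000, 2_000, 2.5, 15.0)
print(f"Estimated cost per request: ${cost:.4f}")
```

Because output tokens are priced several times higher than input tokens for most models in the table, the output share of a workload often dominates the bill.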

Observations

The cost gap between frontier and efficiency models is now roughly an order of magnitude, even within a single provider's lineup.

Examples:

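GPT-5 mini is 10x cheaper than GPT-5.4 on input tokens and 7.5x cheaper on output tokens
Grok 4.1 Fast is 10x cheaper than Grok 4.20 Beta on input tokens and 12x cheaper on output tokens
DeepSeek-Reasoner ($0.28 input / $0.42 output per 1M tokens) is among the lowest-priced reasoning models listed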
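The ratios behind these comparisons differ between input and output tokens, so it helps to compute both. The sketch below does this from the prices in the table. The dictionary layout and function name are assumptions for illustration.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "GPT-5.4": (2.5, 15.0),
    "GPT-5 mini": (0.25, 2.0),
    "Grok 4.20 Beta": (2.0, 6.0),
    "Grok 4.1 Fast": (0.20, 0.50),
}


def price_ratio(flagship: str, efficient: str) -> tuple[float, float]:
    """Return (input_ratio, output_ratio) of flagship price to efficient price."""
    flagship_in, flagship_out = PRICES[flagship]
    efficient_in, efficient_out = PRICES[efficient]
    return flagship_in / efficient_in, flagship_out / efficient_out


for flagship, efficient in [("GPT-5.4", "GPT-5 mini"),
                            ("Grok 4.20 Beta", "Grok 4.1 Fast")]:
    in_ratio, out_ratio = price_ratio(flagship, efficient)
    print(f"{efficient} vs {flagship}: "
          f"{in_ratio:.1f}x cheaper on input, {out_ratio:.1f}x on output")
```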

The industry appears to be converging toward two economic layers:

Frontier reasoning models ($2–$5 per 1M input tokens)
High-throughput inference models ($0.2–$0.5 per 1M input tokens)
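The two bands can be applied as a simple lookup on input price. The thresholds below come from the ranges quoted above. The function name and the "unclassified" label for prices outside both bands are assumptions.

```python
def price_tier(input_price_per_1m: float) -> str:
    """Classify a model by its USD input price per 1M tokens."""
    if 2.0 <= input_price_per_1m <= 5.0:
        return "frontier"
    if 0.2 <= input_price_per_1m <= 0.5:
        return "high-throughput"
    # Prices outside both quoted bands are left unlabeled.
    return "unclassified"


for model, price in [("Claude Opus 4.6", 5.0),
                     ("DeepSeek-Reasoner", 0.28),
                     ("Claude Haiku 4.5", 1.0)]:
    print(f"{model}: {price_tier(price)}")
```

Models such as Claude Haiku 4.5 at $1 sit between the two bands, so the split is a tendency rather than a strict partition of the table.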

2. Multimodal and Real-Time Models

Multimodal models integrate text, audio, image, and video inputs.

Provider	Model	Modalities	Pricing (per 1M tokens)
OpenAI	GPT-Realtime-1.5	text / audio / image	$4 input / $16 output
Google	Gemini 2.5 Pro	text / image / audio / video	$1.25–$2.5 input
Google	Gemini 2.5 Flash	multimodal	$0.30 input
xAI	Grok 4.20 Beta	text / image	$2 input
Alibaba	Qwen-Omni-Turbo	text / image / audio / video	$0.058 input

Observations

Multimodal models increasingly act as unified perception engines for AI systems.

The trend is toward one model handling multiple modalities, replacing specialized pipelines.

3. Image Generation Models

Provider	Model	Price
OpenAI	GPT-Image-1.5	~$0.02 per image
Google	Gemini Flash Image	$0.039 per image
Google	Imagen 4 Ultra	$0.06 per image
Black Forest Labs	FLUX.2 Pro	$0.03 per image
Black Forest Labs	FLUX Kontext Max	$0.08 per image
MiniMax	image-01	$0.0035 per image

Observations

Image generation pricing has collapsed: the models listed range from $0.08 down to $0.0035 per image.

MiniMax’s price of $0.0035 per image shows that image models are approaching commodity compute economics.

4. Video Generation Models

Video generation is currently the most expensive modality.

Provider	Model	Price
OpenAI	Sora 2 Pro	$0.30 – $0.70 / second
Google	Veo 3.1	$0.40 / second
Runway	Gen4.5	$0.12 / second
MiniMax	Hailuo 2.3	~$0.03 / second
Luma	Ray 3.14	credit-based
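Per-second pricing translates directly into a per-clip cost. The sketch below computes a low and high estimate for a clip of a given length using the prices in the table. The dictionary layout, function name, and 10-second clip length are assumptions, and Luma is omitted because its pricing is credit-based.

```python
# (low, high) USD price per generated second, copied from the table above.
VIDEO_PRICE_PER_SECOND = {
    "Sora 2 Pro": (0.30, 0.70),
    "Veo 3.1": (0.40, 0.40),
    "Gen4.5": (0.12, 0.12),
    "Hailuo 2.3": (0.03, 0.03),  # listed as approximate in the table
}


def clip_cost_range(model: str, seconds: float) -> tuple[float, float]:
    """Return the (low, high) USD cost of generating a clip of `seconds` length."""
    low, high = VIDEO_PRICE_PER_SECOND[model]
    return low * seconds, high * seconds


clip_seconds = 10  # hypothetical clip length
for model in VIDEO_PRICE_PER_SECOND:
    low, high = clip_cost_range(model, clip_seconds)
    print(f"{model}: ${low:.2f} to ${high:.2f} for {clip_seconds} s")
```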

Observations

Video models remain compute-intensive diffusion systems, which explains the higher cost.

However, the market is rapidly compressing prices.

Runway ($0.12 per second) and MiniMax (~$0.03 per second) are aggressively undercutting the $0.30–$0.70 per second charged at the high end.

5. Speech & Audio Models

Speech models are usually priced by character or minute of audio.

Provider	Model	Pricing
ElevenLabs	Flash / Turbo TTS	$60 per 1M characters
ElevenLabs	Scribe v2	$0.39 per audio hour (~$0.0065 per minute)
Deepgram	Flux STT	$0.0077 per minute
Deepgram	Aura-2 TTS	$30 per 1M characters
MiniMax	Speech-2.8 Turbo	$60 per 1M characters
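Because text-to-speech is priced per character and speech recognition per unit of audio time, the two need separate cost formulas. A minimal sketch using the prices in the table follows. The function names, the 5,000-character script, and the 60-minute recording are assumptions for illustration.

```python
def tts_cost(characters: int, price_per_1m_chars: float) -> float:
    """USD cost to synthesize `characters` characters of text."""
    return characters / 1_000_000 * price_per_1m_chars


def stt_cost(minutes: float, price_per_minute: float) -> float:
    """USD cost to transcribe `minutes` minutes of audio."""
    return minutes * price_per_minute


script_chars = 5_000   # hypothetical script length
audio_minutes = 60     # hypothetical recording length

print(f"Deepgram Aura-2 TTS: ${tts_cost(script_chars, 30.0):.2f}")
print(f"ElevenLabs Flash / Turbo TTS: ${tts_cost(script_chars, 60.0):.2f}")
print(f"Deepgram Flux STT: ${stt_cost(audio_minutes, 0.0077):.3f}")
```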

Observations

Speech recognition has become very cheap infrastructure.

Deepgram’s pricing ($0.0077 per minute) indicates that speech recognition is now priced as a commodity.

6. Subscription Platforms

Some AI platforms still rely on consumer subscriptions rather than APIs.

Platform	Plan	Price
Midjourney	Standard	$30 / month
Midjourney	Pro	$60 / month
Pika	Standard	$8 / month
Luma	Pro	$90 / month

These platforms typically provide bundled GPU credits rather than pay-per-token pricing.

Key Industry Trends

1. Massive Price Compression

AI inference prices continue to fall rapidly.

Some examples:

DeepSeek models under $0.50 / 1M tokens
Image generation approaching fractions of a cent
Speech recognition under $0.01 per minute

This suggests AI inference is transitioning from a scarce resource to commodity infrastructure.

2. Multimodal Convergence

Most frontier models now support:

text
image
audio
video

Instead of building separate pipelines, developers increasingly use one multimodal model as a universal reasoning engine.

3. Stratification of AI Models

The market is splitting into two layers:

Frontier reasoning models

GPT-5
Claude Opus
Gemini Pro

High-throughput inference models

DeepSeek
Grok Fast
Gemini Flash
MiniMax

The first layer focuses on intelligence, while the second focuses on cost-efficient scale.

Data Source

All pricing data was collected from official developer documentation and pricing pages including:

OpenAI API
Anthropic Claude API
Google Gemini API
xAI API
DeepSeek API
Alibaba Model Studio
Cohere API
MiniMax API
Stability AI
Runway
Midjourney
ElevenLabs
Deepgram

The underlying machine-readable dataset is maintained in the Cognaptus DataHub and updated daily.

Cognaptus: Automate the Present, Incubate the Future
