Providers Tracked
24
Cognaptus DataHub Monitor
A daily dashboard for monitoring free AI inference providers, with curated vendor boards and a machine-refreshable OpenRouter free-model roster.
OpenRouter Free Models
42
Provider Families
11
Multimodal Free Models
13
Coding-Friendly Free Models
6
Capability Signals
Text 42 / Image 13 / Audio 4 / Video 8
Free inference capacity is fragmented. Stable vendor free tiers, rotating cloud quotas, and zero-cost OpenRouter routes need different monitoring logic.
This page is now structured as a monitor rather than a one-off article.
The daily job should separate three layers:
inferencer.
The goal is not just to list providers. It is to help you answer a daily operating question: which free routes are realistically usable right now for text, coding, multimodal, image, speech, and video workloads?
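The daily operating question can be sketched as a simple lookup. The snippet below is only an illustration under stated assumptions: the `Route` record, its `workloads` tags, the `last_probe_ok` flag, and the example route names are all hypothetical and do not come from inferencer or from any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical record for one free route on the monitor."""
    name: str
    workloads: frozenset[str]   # e.g. {"text", "coding"}; tags are assumed
    last_probe_ok: bool         # assumed result of the most recent daily check


def usable_routes(routes: list[Route], workload: str) -> list[str]:
    """Return names of routes that claim the workload and passed the last probe."""
    return [
        route.name
        for route in routes
        if workload in route.workloads and route.last_probe_ok
    ]


# Placeholder data, not real monitoring results.
routes = [
    Route("example-direct-api", frozenset({"text", "coding"}), True),
    Route("example-gateway", frozenset({"text", "image"}), False),
    Route("example-speech-api", frozenset({"speech"}), True),
]

print(usable_routes(routes, "text"))    # ['example-direct-api']
print(usable_routes(routes, "speech"))  # ['example-speech-api']
```

Keeping the probe result separate from the capability tags means a route that is listed but currently failing drops out of the answer without being removed from the board.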
Companies that operate their own API and offer some free or free-trial inference access.
Google AI Studio
Most broadly usable direct free tier.
GroqCloud
Fastest developer-facing free path for open models.
Cerebras Cloud
Useful as a high-throughput backup route.
Cohere
Useful for business NLP tooling.
Mistral
Strong open-weight ecosystem.
DeepSeek
Monitor for changes in API trial policy.
MiniMax
Relevant for agent and media workflows.
Moonshot / Kimi
Worth tracking for long-context offerings.
Multi-model platforms where free capacity changes often and is worth checking daily.
OpenRouter
Primary daily monitor source via inferencer.
Hugging Face Inference
Largest long-tail open-model surface.
Together AI
Good backup for open-weight text and diffusion.
Cloudflare Workers AI
Edge inference is strategically distinct.
Fireworks AI
Useful for comparing open-model economics.
Baseten
More deployment-oriented than general free inference.
Replicate
Best tracked for specialized modalities.
Fal.ai
Media generation prices move quickly.
Modal
More infra than gateway, but still relevant.
Providers best monitored by capability rather than by general LLM coverage.
ElevenLabs
Important benchmark for TTS.
Deepgram
Useful ASR baseline.
AssemblyAI
Speech-first provider.
Stability AI
Core diffusion benchmark.
Black Forest Labs
Track FLUX availability changes.
Runway
Consumer-friendly video benchmark.
Luma AI
Worth tracking for motion quality.
This block is designed for daily refresh from `inferencer::list_openrouter_models()`.
cognitivecomputations/dolphin-mistral-24b-venice-edition:free
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an "uncensored" instruct-tuned LLM, preserving...
google/gemma-4-26b-a4b-it:free
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference - delivering near-31B quality at...
google/gemma-4-31b-it:free
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...
google/lyria-3-clip-preview
30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...
google/lyria-3-pro-preview
Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...
liquid/lfm-2.5-1.2b-instruct:free
LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.
liquid/lfm-2.5-1.2b-thinking:free
LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG, while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...
meta-llama/llama-3.2-3b-instruct:free
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...
meta-llama/llama-3.3-70b-instruct:free
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...
nex-agi/nex-n2-pro:free
Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active parameters out of 397B total. Built on the Qwen3.5 architecture, it accepts text and image input and produces...
nousresearch/hermes-3-llama-3.1-405b:free
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
nvidia/nemotron-3-nano-30b-a3b:free
NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...
nvidia/nemotron-3-super-120b-a12b:free
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...
nvidia/nemotron-3-ultra-550b-a55b:free
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
nvidia/nemotron-3.5-content-safety:free
NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...
nvidia/nemotron-nano-12b-v2-vl:free
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba's...
nvidia/nemotron-nano-9b-v2:free
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
openai/gpt-oss-120b:free
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...
openai/gpt-oss-20b:free
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...
openrouter/free
The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...
openrouter/owl-alpha
Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution....
poolside/laguna-m.1:free
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...
poolside/laguna-xs.2:free
Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...
qwen/qwen3-coder:free
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...
qwen/qwen3-next-80b-a3b-instruct:free
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without "thinking" traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...
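A roster like the one above can contain repeated IDs and entries that carry a per-clip or per-song price. The sketch below shows one way a post-processing step might tidy the list. It assumes that a `:free` suffix (or the `openrouter/free` router ID) marks a zero-cost route, which is a heuristic read off the IDs in this roster rather than a documented rule. Entries without the suffix, such as `openrouter/owl-alpha`, would need pricing metadata to classify. The function names are hypothetical.

```python
from typing import Iterable


def is_free_route(model_id: str) -> bool:
    """Heuristic (assumed): ':free' suffix or the 'openrouter/free' router ID."""
    return model_id.endswith(":free") or model_id == "openrouter/free"


def clean_roster(model_ids: Iterable[str]) -> list[str]:
    """Drop repeated IDs, keep first-seen order, and keep only free routes."""
    seen: set[str] = set()
    roster: list[str] = []
    for model_id in model_ids:
        if model_id in seen:
            continue  # skip a verbatim repeat of an ID already handled
        seen.add(model_id)
        if is_free_route(model_id):
            roster.append(model_id)
    return roster


# A few IDs taken from the roster above, including one repeat and one priced entry.
raw_ids = [
    "google/gemma-4-31b-it:free",
    "google/gemma-4-31b-it:free",
    "google/lyria-3-clip-preview",
    "openrouter/free",
    "qwen/qwen3-coder:free",
]

print(clean_roster(raw_ids))
# ['google/gemma-4-31b-it:free', 'openrouter/free', 'qwen/qwen3-coder:free']
```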
This page should be refreshed daily because free-model availability changes quickly.
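Because availability changes quickly, a daily refresh is most useful when it reports what changed. The following is a minimal sketch of a day-over-day comparison using set differences. `diff_rosters` is a hypothetical helper, and the two snapshots are illustrative inputs rather than recorded data.

```python
def diff_rosters(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare two roster snapshots and report added, removed, and unchanged IDs."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
        "unchanged": sorted(current & previous),
    }


# Illustrative snapshots only; the IDs come from the roster, the dates do not exist.
yesterday = {"openai/gpt-oss-20b:free", "qwen/qwen3-coder:free"}
today = {"openai/gpt-oss-20b:free", "poolside/laguna-m.1:free"}

changes = diff_rosters(yesterday, today)
print(changes["added"])      # ['poolside/laguna-m.1:free']
print(changes["removed"])    # ['qwen/qwen3-coder:free']
print(changes["unchanged"])  # ['openai/gpt-oss-20b:free']
```

Sorting each list keeps the report stable between runs, which makes it easier to compare reports from different days.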
Category-aware free-model counts are derived from inferencer's OpenRouter category wrappers, so the OpenRouter section is machine-generated. The provider boards are intentionally curated by hand.
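A capability line such as "Text 42 / Image 13 / Audio 4 / Video 8" could be tallied along these lines. This is a sketch, not the way inferencer's category wrappers work: the `input_modalities` field name and the placeholder records are assumptions made for illustration.

```python
from collections import Counter


def capability_signals(models: list[dict]) -> Counter:
    """Count how many models accept each input modality (one count per model)."""
    counts: Counter = Counter()
    for model in models:
        # set() guards against a modality being listed twice for one model
        for modality in set(model["input_modalities"]):
            counts[modality] += 1
    return counts


def format_signals(counts: Counter) -> str:
    """Render counts in the dashboard's 'Text N / Image N / Audio N / Video N' style."""
    order = ["text", "image", "audio", "video"]
    return " / ".join(f"{name.capitalize()} {counts.get(name, 0)}" for name in order)


# Placeholder records; the IDs and modality lists are hypothetical.
models = [
    {"id": "example/model-a", "input_modalities": ["text"]},
    {"id": "example/model-b", "input_modalities": ["text", "image"]},
    {"id": "example/model-c", "input_modalities": ["text", "image", "video"]},
]

print(format_signals(capability_signals(models)))
# Text 3 / Image 2 / Audio 0 / Video 1
```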
Use the dashboard as a monitor, not as a compliance promise. Vendor free-tier terms can change without warning.