Cognaptus DataHub Monitor
A daily dashboard for monitoring free AI inference providers, with curated vendor boards and a machine-refreshable OpenRouter free-model roster.
Providers Tracked
24
OpenRouter Free Models
33
Provider Families
15
Multimodal Free Models
11
Coding-Friendly Free Models
4
Capability Signals
Text 33 / Image 11 / Audio 3 / Video 4
Free inference capacity is fragmented. Stable vendor free tiers, rotating cloud quotas, and zero-cost OpenRouter routes need different monitoring logic.
This page is now structured as a monitor rather than a one-off article.
The daily job should separate three layers: stable direct vendor free tiers, rotating aggregator and cloud quotas, and zero-cost OpenRouter routes refreshed via inferencer.
The goal is not just to list providers. It is to help you answer a daily operating question: which free routes are realistically usable right now for text, coding, multimodal, image, speech, and video workloads?
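That daily question can be answered mechanically once the roster is in hand. A minimal sketch in Python, assuming each roster record carries an `id` (zero-cost routes end in `:free`) and an `input_modalities` list — field names are modeled on OpenRouter's `/api/v1/models` schema and should be verified against the live API:

```python
from collections import Counter

def free_capability_signals(models):
    """Split a model roster into its free routes and bucket them by input modality.

    `models` is a list of dicts with an `id` (e.g. "google/gemma-3-4b-it:free")
    and an optional `input_modalities` list; records without one default to text.
    """
    free = [m for m in models if m["id"].endswith(":free")]
    signals = Counter()
    for m in free:
        for modality in m.get("input_modalities", ["text"]):
            signals[modality] += 1
    return free, dict(signals)

# Tiny fixture mirroring roster entries on this page (modality data is illustrative):
roster = [
    {"id": "google/gemma-3-4b-it:free", "input_modalities": ["text", "image"]},
    {"id": "openai/gpt-oss-20b:free", "input_modalities": ["text"]},
    {"id": "google/lyria-3-clip-preview", "input_modalities": ["text"]},  # paid route, excluded
]
free, signals = free_capability_signals(roster)
print(len(free), signals)  # 2 {'text': 2, 'image': 1}
```

The same counting over the full roster produces the capability-signal line in the stat block above.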
Companies that operate their own API and offer some free or free-trial inference access.
Google AI Studio
Most broadly usable direct free tier.
GroqCloud
Fastest developer-facing free path for open models.
Cerebras Cloud
Useful as a high-throughput backup route.
Cohere
Useful for business NLP tooling.
Mistral
Strong open-weight ecosystem.
DeepSeek
Monitor for changes in API trial policy.
MiniMax
Relevant for agent and media workflows.
Moonshot / Kimi
Worth tracking for long-context offerings.
Multi-model platforms where free capacity changes often and is worth checking daily.
OpenRouter
Primary daily monitor source via inferencer.
Hugging Face Inference
Largest long-tail open-model surface.
Together AI
Good backup for open-weight text and diffusion.
Cloudflare Workers AI
Edge inference is strategically distinct.
Fireworks AI
Useful for comparing open-model economics.
Baseten
More deployment-oriented than general free inference.
Replicate
Best tracked for specialized modalities.
Fal.ai
Media generation prices move quickly.
Modal
More infra than gateway, but still relevant.
Providers best monitored by capability rather than by general LLM coverage.
ElevenLabs
Important benchmark for TTS.
Deepgram
Useful ASR baseline.
AssemblyAI
Speech-first provider.
Stability AI
Core diffusion benchmark.
Black Forest Labs
Track FLUX availability changes.
Runway
Consumer-friendly video benchmark.
Luma AI
Worth tracking for motion quality.
This block is designed for daily refresh from `inferencer::list_openrouter_models()`.
baidu/qianfan-ocr-fast:free
Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while preserving versatile multimodal intelligence, it provides a powerful performance upgrade over Qianfan-OCR.
cognitivecomputations/dolphin-mistral-24b-venice-edition:free
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving...
google/gemma-3-12b-it:free
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
google/gemma-3-27b-it:free
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
google/gemma-3-4b-it:free
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
google/gemma-3n-e2b-it:free
Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based...
google/gemma-3n-e4b-it:free
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...
google/gemma-4-26b-a4b-it:free
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...
google/gemma-4-31b-it:free
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...
google/lyria-3-clip-preview
30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...
google/lyria-3-pro-preview
Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...
inclusionai/ling-2.6-1t:free
Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency at scale. It uses a “fast...
inclusionai/ling-2.6-flash:free
Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses, strong execution, and high token efficiency....
liquid/lfm-2.5-1.2b-instruct:free
LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.
liquid/lfm-2.5-1.2b-thinking:free
LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...
meta-llama/llama-3.2-3b-instruct:free
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...
meta-llama/llama-3.3-70b-instruct:free
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...
minimax/minimax-m2.5:free
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
nousresearch/hermes-3-llama-3.1-405b:free
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
nvidia/nemotron-3-nano-30b-a3b:free
NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...
nvidia/nemotron-3-super-120b-a12b:free
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...
nvidia/nemotron-nano-12b-v2-vl:free
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...
nvidia/nemotron-nano-9b-v2:free
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
openai/gpt-oss-120b:free
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...
openai/gpt-oss-20b:free
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...
openrouter/free
The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...
poolside/laguna-m.1:free
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...
poolside/laguna-xs.2:free
Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...
qwen/qwen3-coder:free
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...
qwen/qwen3-next-80b-a3b-instruct:free
Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...
tencent/hy3-preview:free
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...
z-ai/glm-4.5-air:free
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...
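The roster above is refreshed from `inferencer::list_openrouter_models()` on the R side. An equivalent standalone sketch in Python, assuming OpenRouter's public model listing at `/api/v1/models` (response shape should be checked against the live API; the `fetch` parameter is a test-injection hook, not part of any library):

```python
import json
import urllib.request

# OpenRouter's public model listing; no API key is required to read it.
OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models"

def list_openrouter_free_models(fetch=None):
    """Return sorted IDs of zero-cost (":free") OpenRouter routes.

    `fetch` lets callers inject a canned or cached payload; when None,
    the live endpoint is queried.
    """
    if fetch is None:
        def fetch():
            with urllib.request.urlopen(OPENROUTER_MODELS_URL, timeout=30) as resp:
                return json.load(resp)
    payload = fetch()
    return sorted(m["id"] for m in payload.get("data", []) if m["id"].endswith(":free"))

# Example with a canned payload, so no network call is made:
canned = {"data": [
    {"id": "openai/gpt-oss-120b:free"},
    {"id": "google/lyria-3-pro-preview"},
    {"id": "z-ai/glm-4.5-air:free"},
]}
print(list_openrouter_free_models(fetch=lambda: canned))
# ['openai/gpt-oss-120b:free', 'z-ai/glm-4.5-air:free']
```

Sorting keeps the machine-generated block diff-stable between daily runs, so additions and removals are easy to spot.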
This page should be refreshed daily because free-model availability changes quickly.
The OpenRouter section is machine-generated; the provider boards are intentionally curated.
Use the dashboard as a monitor, not as a compliance promise. Vendor free-tier terms can change without warning.