Cognaptus DataHub Monitor

Free AI Inference Providers

A daily dashboard for monitoring free AI inference providers, with curated vendor boards and a machine-refreshable OpenRouter free-model roster.

Updated 2026-06-13 20:09:00 +0800 | Source: OpenRouter

Providers Tracked: 24

OpenRouter Free Models: 42

Provider Families: 11

Multimodal Free Models: 13

Coding-Friendly Free Models: 6

Capability Signals: Text 42 / Image 13 / Audio 4 / Video 8

Why This Page Exists

Free inference capacity is fragmented. Stable vendor free tiers, rotating cloud quotas, and zero-cost OpenRouter routes need different monitoring logic.

This page is now structured as a monitor rather than a one-off article.

The daily job should separate three layers:

  1. Direct model vendors with relatively stable free quotas.
  2. Inference clouds where free capacity rotates more often.
  3. OpenRouter zero-cost models, which should be refreshed automatically from inferencer.

The goal is not just to list providers. It is to help you answer a daily operating question: which free routes are realistically usable right now for text, coding, multimodal, image, speech, and video workloads?
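The daily question above can be sketched as a simple filter over the provider boards. This is a minimal illustration, not the dashboard's actual code: the `Provider` record, the layer labels, and the reading of ACTIVE as "usable today" versus WATCH as "check before relying on it" are assumptions made for the example. The rows are copied from the boards on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    layer: str             # hypothetical labels: "vendor", "cloud", "specialized"
    status: str            # "ACTIVE" or "WATCH", as shown on the boards
    modalities: frozenset  # lower-cased modality tags from the board line


# A handful of rows copied from the boards on this page.
BOARD = [
    Provider("Google AI Studio", "vendor", "ACTIVE",
             frozenset({"text", "image", "audio", "video"})),
    Provider("GroqCloud", "vendor", "ACTIVE",
             frozenset({"text", "vision", "speech"})),
    Provider("Cerebras Cloud", "vendor", "ACTIVE", frozenset({"text"})),
    Provider("Cohere", "vendor", "WATCH",
             frozenset({"text", "embeddings", "rerank"})),
    Provider("Hugging Face Inference", "cloud", "ACTIVE",
             frozenset({"text", "image", "speech", "embeddings"})),
    Provider("ElevenLabs", "specialized", "WATCH", frozenset({"speech"})),
]


def usable_routes(board, workload, include_watch=False):
    """Return provider names whose status and modalities fit the workload."""
    allowed = {"ACTIVE", "WATCH"} if include_watch else {"ACTIVE"}
    return [p.name for p in board
            if p.status in allowed and workload in p.modalities]


speech_now = usable_routes(BOARD, "speech")
speech_all = usable_routes(BOARD, "speech", include_watch=True)
video_now = usable_routes(BOARD, "video")
```

Keeping WATCH providers behind an explicit flag means a default query returns only routes the board currently marks as active.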

Direct foundation-model vendors

Companies that operate their own API and offer some free or free-trial inference access.

Google AI Studio

Most broadly usable direct free tier.

ACTIVE | Free daily quota | Text, image, audio, video

GroqCloud

Fastest developer-facing free path for open models.

ACTIVE | Free inference tier | Text, vision, speech

Cerebras Cloud

Useful as a high-throughput backup route.

ACTIVE | Free quota | Text

Cohere

Useful for business NLP tooling.

WATCH | Developer quota | Text, embeddings, rerank

Mistral

Strong open-weight ecosystem.

WATCH | Trial credits / limited free access | Text, embeddings

DeepSeek

Monitor for changes in API trial policy.

WATCH | Free web usage and low-cost API | Text, reasoning

MiniMax

Relevant for agent and media workflows.

WATCH | Selective free access | Text, speech, video

Moonshot / Kimi

Worth tracking for long-context offerings.

WATCH | Selective free access | Text

Inference clouds and gateways

Multi-model platforms where free capacity changes often and is worth checking daily.

OpenRouter

Primary daily monitor source via inferencer.

ACTIVE | Zero-cost models rotate daily | Text, vision, audio, video

Hugging Face Inference

Largest long-tail open-model surface.

ACTIVE | Free usage quota | Text, image, speech, embeddings

Together AI

Good backup for open-weight text and diffusion.

WATCH | Free quota / credits | Text, image

Cloudflare Workers AI

Edge inference is strategically distinct.

WATCH | Bundled free allocation | Text, image, speech

Fireworks AI

Useful for comparing open-model economics.

WATCH | Credits / selected free access | Text, image

Baseten

More deployment-oriented than general free inference.

WATCH | Credits | Model hosting

Replicate

Best tracked for specialized modalities.

WATCH | Credits | Image, video, audio

Fal.ai

Media generation prices move quickly.

WATCH | Credits / trials | Image, video

Modal

More infra than gateway, but still relevant.

WATCH | Credits | Model hosting

Specialized modality APIs

Providers best monitored by capability rather than by general LLM coverage.

ElevenLabs

Important benchmark for TTS.

WATCH | Starter quota | Speech

Deepgram

Useful ASR baseline.

WATCH | Trial / starter credits | Speech

AssemblyAI

Speech-first provider.

WATCH | Credits | Speech

Stability AI

Core diffusion benchmark.

WATCH | Credits | Image

Black Forest Labs

Track FLUX availability changes.

WATCH | Testing access | Image

Runway

Consumer-friendly video benchmark.

WATCH | Credits | Video

Luma AI

Worth tracking for motion quality.

WATCH | Credits | Video

OpenRouter Zero-Cost Roster

This block is designed for daily refresh from `inferencer::list_openrouter_models()`.
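A refresh step of this kind can be sketched in Python as follows. It is an illustration under stated assumptions, not the `inferencer` implementation: the record shape (`id`, `context_length`, `modality`, `pricing.prompt`, `pricing.completion`) is hypothetical, and the sample rows are made up apart from the model IDs taken from this page. The sketch keeps only zero-cost records, skips repeated IDs, groups models by the family prefix before the slash, and renders context lengths as plain integers.

```python
from collections import defaultdict


def is_zero_cost(model):
    """True when both prompt and completion prices parse to zero."""
    pricing = model.get("pricing") or {}
    try:
        return (float(pricing.get("prompt", "1")) == 0.0
                and float(pricing.get("completion", "1")) == 0.0)
    except (TypeError, ValueError):
        return False


def build_roster(models):
    """Group zero-cost models by family, keeping each model ID once."""
    seen = set()
    roster = defaultdict(list)
    for model in models:
        model_id = model.get("id", "")
        if not model_id or model_id in seen or not is_zero_cost(model):
            continue
        seen.add(model_id)
        family = model_id.split("/", 1)[0]
        roster[family].append({
            "id": model_id,
            # int() avoids scientific notation such as 1.048576e+06
            "context": int(float(model.get("context_length") or 0)),
            "modality": model.get("modality", ""),
        })
    return dict(sorted(roster.items()))


FREE = {"prompt": "0", "completion": "0"}
SAMPLE = [  # hypothetical payload rows
    {"id": "qwen/qwen3-coder:free", "context_length": 1.048576e+06,
     "modality": "text->text", "pricing": FREE},
    {"id": "qwen/qwen3-coder:free", "context_length": 1.048576e+06,
     "modality": "text->text", "pricing": FREE},
    {"id": "openrouter/free", "context_length": 200000,
     "modality": "text+image->text", "pricing": FREE},
    {"id": "example/paid-model", "context_length": 8192,
     "modality": "text->text",
     "pricing": {"prompt": "0.000001", "completion": "0.000002"}},
]

roster = build_roster(SAMPLE)
```

Deduplicating on the model ID before grouping keeps the per-family counts aligned with the number of distinct models, even if the upstream listing returns a model more than once.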

cognitivecomputations 1 free model

cognitivecomputations/dolphin-mistral-24b-venice-edition:free

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an "uncensored" instruct-tuned LLM, preserving...

Context 32768 | text | text->text

google 6 free models

google/gemma-4-26b-a4b-it:free

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference - delivering near-31B quality at...

Context 262144 | image, text, video | text+image+video->text

google/gemma-4-31b-it:free

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Context 262144 | image, text, video | text+image+video->text

google/lyria-3-clip-preview

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

Context 1048576 | text, image, audio | text+image->text+audio

google/lyria-3-pro-preview

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...

Context 1048576 | text, image, audio | text+image->text+audio

liquid 4 free models

liquid/lfm-2.5-1.2b-instruct:free

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.

Context 32768 | text | text->text

liquid/lfm-2.5-1.2b-thinking:free

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG, while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...

Context 32768 | text | text->text

meta-llama 3 free models

meta-llama/llama-3.2-3b-instruct:free

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...

Context 131072 | text | text->text

meta-llama/llama-3.3-70b-instruct:free

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...

Context 131072 | text | text->text

nex-agi 1 free model

nex-agi/nex-n2-pro:free

Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active parameters out of 397B total. Built on the Qwen3.5 architecture, it accepts text and image input and produces...

Context 262144 | text, image | text+image->text

nousresearch 2 free models

nousresearch/hermes-3-llama-3.1-405b:free

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Context 131072 | text | text->text

nvidia 13 free models

nvidia/nemotron-3-nano-30b-a3b:free

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...

Context 256000 | text | text->text

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Context 256000 | text, audio, image, video | text+image+audio+video->text

nvidia/nemotron-3-super-120b-a12b:free

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Context 1000000 | text | text->text

nvidia/nemotron-3-ultra-550b-a55b:free

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Context 1000000 | text | text->text

nvidia/nemotron-3.5-content-safety:free

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...

Context 128000 | text, image | text+image->text

nvidia/nemotron-nano-12b-v2-vl:free

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba's...

Context 128000 | image, text, video | text+image+video->text

nvidia/nemotron-nano-9b-v2:free

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...

Context 128000 | text | text->text

openai 4 free models

openai/gpt-oss-120b:free

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

Context 131072 | text | text->text

openai/gpt-oss-20b:free

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Context 131072 | text | text->text

openrouter 2 free models

openrouter/free

The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...

Context 200000 | text, image | text+image->text

openrouter/owl-alpha

Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution....

Context 1048756 | text | text->text

poolside 2 free models

poolside/laguna-m.1:free

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...

Context 262144 | text | text->text

poolside/laguna-xs.2:free

Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...

Context 262144 | text | text->text

qwen 4 free models

qwen/qwen3-coder:free

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

Context 1048576 | text | text->text

qwen/qwen3-next-80b-a3b-instruct:free

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without "thinking" traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...

Context 262144 | text | text->text

Operator Notes

This page should be refreshed daily because free-model availability changes quickly.
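One way to make daily changes visible is to compare the current roster against the previous day's snapshot. The sketch below assumes each snapshot is simply a list of model IDs; the function name and the two sample snapshots are illustrative and do not describe real availability history.

```python
def diff_rosters(previous_ids, current_ids):
    """Compare two roster snapshots and report what changed."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
        "unchanged": sorted(previous & current),
    }


# Illustrative snapshots only; not real availability history.
yesterday = ["openai/gpt-oss-20b:free", "qwen/qwen3-coder:free"]
today = ["qwen/qwen3-coder:free", "poolside/laguna-m.1:free"]

changes = diff_rosters(yesterday, today)
```

A non-empty `removed` list is the signal that matters most for operators, since any workload pinned to one of those IDs needs a fallback route.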

The OpenRouter section is machine-generated: its category-aware free-model counts are derived from inferencer's OpenRouter category wrappers. The provider boards are curated manually by design.
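Capability counts of the kind shown at the top of the page can be approximated from the roster's modality strings. The sketch below is not the inferencer category logic. It assumes a model counts toward a capability when that modality appears on either the input or the output side of a string such as `text+image->text+audio`. The rows are copied from the roster on this page; the function names are hypothetical.

```python
from collections import Counter


def modalities_of(modality):
    """Split 'a+b->c+d' into the set of modalities on either side."""
    inputs, _, outputs = modality.partition("->")
    parts = inputs.split("+") + outputs.split("+")
    return {part.strip() for part in parts if part.strip()}


def capability_counts(rows):
    """Count, per modality, how many models mention it at least once."""
    counts = Counter()
    for _model_id, modality in rows:
        counts.update(modalities_of(modality))
    return counts


ROWS = [
    ("google/gemma-4-31b-it:free", "text+image+video->text"),
    ("google/lyria-3-clip-preview", "text+image->text+audio"),
    ("nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free",
     "text+image+audio+video->text"),
    ("meta-llama/llama-3.2-3b-instruct:free", "text->text"),
]

counts = capability_counts(ROWS)
```

Because each model contributes a set rather than a list, a modality that appears on both sides of the arrow is counted once per model.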

Use the dashboard as a monitor, not as a compliance promise. Vendor free-tier terms can change without warning.