Cognaptus DataHub Monitor

Free AI Inference Providers

A daily dashboard for monitoring free AI inference providers, with curated vendor boards and a machine-refreshable OpenRouter free-model roster.

Updated 2026-04-29 07:00:03 +0800 · Source: OpenRouter

Providers Tracked

24

OpenRouter Free Models

33

Provider Families

15

Multimodal Free Models

11

Coding-Friendly Free Models

4

Capability Signals

Text 33 / Image 11 / Audio 3 / Video 4

Why This Page Exists

Free inference capacity is fragmented. Stable vendor free tiers, rotating cloud quotas, and zero-cost OpenRouter routes need different monitoring logic.

This page is now structured as a monitor rather than a one-off article.

The daily job should separate three layers:

  1. Direct model vendors with relatively stable free quotas.
  2. Inference clouds where free capacity rotates more often.
  3. OpenRouter zero-cost models, which should be refreshed automatically from `inferencer`.

The goal is not just to list providers. It is to help you answer a daily operating question: which free routes are realistically usable right now for text, coding, multimodal, image, speech, and video workloads?
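That daily operating question can be reduced to a simple filter over the provider boards below. This is a minimal sketch, not the page's actual backend: the provider rows are a hand-copied subset of the boards on this page, and the table layout is an assumption about how such a monitor might store them.

```python
# Sketch: answer "which free routes are usable right now for modality X?"
# Rows are (name, layer, status, input/output modalities), copied from a few
# of the boards below; "layer" follows the three-layer split above.
PROVIDERS = [
    ("Google AI Studio", "vendor",   "ACTIVE", {"text", "image", "audio", "video"}),
    ("GroqCloud",        "vendor",   "ACTIVE", {"text", "vision", "speech"}),
    ("Cerebras Cloud",   "vendor",   "ACTIVE", {"text"}),
    ("OpenRouter",       "gateway",  "ACTIVE", {"text", "vision", "audio", "video"}),
    ("Hugging Face",     "gateway",  "ACTIVE", {"text", "image", "speech", "embeddings"}),
    ("ElevenLabs",       "modality", "WATCH",  {"speech"}),
]

def usable_routes(modality: str, providers=PROVIDERS) -> list[str]:
    """Return ACTIVE providers that cover the requested modality."""
    return [name for name, _layer, status, mods in providers
            if status == "ACTIVE" and modality in mods]

print(usable_routes("video"))  # ['Google AI Studio', 'OpenRouter']
```

WATCH-status providers are deliberately excluded: the monitor treats them as routes to re-check, not routes to depend on today.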

Direct foundation-model vendors

Companies that operate their own API and offer some free or free-trial inference access.

Google AI Studio

Most broadly usable direct free tier.

ACTIVE Free daily quota Text, image, audio, video

GroqCloud

Fastest developer-facing free path for open models.

ACTIVE Free inference tier Text, vision, speech

Cerebras Cloud

Useful as a high-throughput backup route.

ACTIVE Free quota Text

Cohere

Useful for business NLP tooling.

WATCH Developer quota Text, embeddings, rerank

Mistral

Strong open-weight ecosystem.

WATCH Trial credits / limited free access Text, embeddings

DeepSeek

Monitor for changes in API trial policy.

WATCH Free web usage and low-cost API Text, reasoning

MiniMax

Relevant for agent and media workflows.

WATCH Selective free access Text, speech, video

Moonshot / Kimi

Worth tracking for long-context offerings.

WATCH Selective free access Text

Inference clouds and gateways

Multi-model platforms where free capacity changes often and is worth checking daily.

OpenRouter

Primary daily monitor source via `inferencer`.

ACTIVE Zero-cost models rotate daily Text, vision, audio, video

Hugging Face Inference

Largest long-tail open-model surface.

ACTIVE Free usage quota Text, image, speech, embeddings

Together AI

Good backup for open-weight text and diffusion.

WATCH Free quota / credits Text, image

Cloudflare Workers AI

Edge inference is strategically distinct.

WATCH Bundled free allocation Text, image, speech

Fireworks AI

Useful for comparing open-model economics.

WATCH Credits / selected free access Text, image

Baseten

More deployment-oriented than general free inference.

WATCH Credits Model hosting

Replicate

Best tracked for specialized modalities.

WATCH Credits Image, video, audio

Fal.ai

Media generation prices move quickly.

WATCH Credits / trials Image, video

Modal

More infra than gateway, but still relevant.

WATCH Credits Model hosting

Specialized modality APIs

Providers best monitored by capability rather than by general LLM coverage.

ElevenLabs

Important benchmark for TTS.

WATCH Starter quota Speech

Deepgram

Useful ASR baseline.

WATCH Trial / starter credits Speech

AssemblyAI

Speech-first provider.

WATCH Credits Speech

Stability AI

Core diffusion benchmark.

WATCH Credits Image

Black Forest Labs

Track FLUX availability changes.

WATCH Testing access Image

Runway

Consumer-friendly video benchmark.

WATCH Credits Video

Luma AI

Worth tracking for motion quality.

WATCH Credits Video

OpenRouter Zero-Cost Roster

This block is designed for daily refresh from `inferencer::list_openrouter_models()`.
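The `inferencer::list_openrouter_models()` call itself is internal to this site's pipeline, but an equivalent refresh can be sketched against OpenRouter's public `GET /api/v1/models` endpoint, whose entries carry an `id`, a `context_length`, and per-token `pricing` fields reported as strings. Zero-cost models are the ones whose prompt and completion prices are both zero. The payload below is a trimmed, hand-written sample in that shape, not a live response.

```python
import json

# Trimmed sample shaped like OpenRouter's GET /api/v1/models response.
SAMPLE = json.loads("""
{"data": [
  {"id": "google/gemma-3-4b-it:free",
   "context_length": 32768,
   "pricing": {"prompt": "0", "completion": "0"}},
  {"id": "openai/gpt-4o",
   "context_length": 128000,
   "pricing": {"prompt": "0.0000025", "completion": "0.00001"}}
]}
""")

def zero_cost_models(payload: dict) -> list[tuple[str, int]]:
    """Keep models whose prompt and completion prices are both zero."""
    return [
        (m["id"], m.get("context_length"))
        for m in payload["data"]
        if float(m["pricing"].get("prompt", "1")) == 0
        and float(m["pricing"].get("completion", "1")) == 0
    ]

print(zero_cost_models(SAMPLE))  # [('google/gemma-3-4b-it:free', 32768)]
```

Filtering on pricing rather than on the `:free` id suffix is the safer choice for a daily job: the suffix is a naming convention, while the pricing fields are what the router actually bills against.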

baidu 1 free model

baidu/qianfan-ocr-fast:free

Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while preserving versatile multimodal intelligence, it provides a powerful performance upgrade over Qianfan-OCR.

Context 65536 · Inputs: image, text · text+image->text
cognitivecomputations 1 free model

cognitivecomputations/dolphin-mistral-24b-venice-edition:free

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving...

Context 32768 · Inputs: text · text->text
google 9 free models

google/gemma-3-12b-it:free

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Context 32768 · Inputs: text, image · text+image->text

google/gemma-3-27b-it:free

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Context 131072 · Inputs: text, image · text+image->text

google/gemma-3-4b-it:free

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Context 32768 · Inputs: text, image · text+image->text

google/gemma-3n-e2b-it:free

Gemma 3n E2B IT is a multimodal, instruction-tuned model developed by Google DeepMind, designed to operate efficiently at an effective parameter size of 2B while leveraging a 6B architecture. Based...

Context 8192 · Inputs: text · text->text

google/gemma-3n-e4b-it:free

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

Context 8192 · Inputs: text · text->text

google/gemma-4-26b-a4b-it:free

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Context 262144 · Inputs: image, text, video · text+image+video->text

google/gemma-4-31b-it:free

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Context 262144 · Inputs: image, text, video · text+image+video->text

google/lyria-3-clip-preview

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

Context 1048576 · Inputs: text, image, audio · text+image->text+audio

google/lyria-3-pro-preview

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...

Context 1048576 · Inputs: text, image, audio · text+image->text+audio
inclusionai 2 free models

inclusionai/ling-2.6-1t:free

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency at scale. It uses a “fast...

Context 262144 · Inputs: text · text->text

inclusionai/ling-2.6-flash:free

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses, strong execution, and high token efficiency....

Context 262144 · Inputs: text · text->text
liquid 2 free models

liquid/lfm-2.5-1.2b-instruct:free

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.

Context 32768 · Inputs: text · text->text

liquid/lfm-2.5-1.2b-thinking:free

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...

Context 32768 · Inputs: text · text->text
meta-llama 2 free models

meta-llama/llama-3.2-3b-instruct:free

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...

Context 131072 · Inputs: text · text->text

meta-llama/llama-3.3-70b-instruct:free

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...

Context 65536 · Inputs: text · text->text
minimax 1 free model

minimax/minimax-m2.5:free

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

Context 196608 · Inputs: text · text->text
nousresearch 1 free model

nousresearch/hermes-3-llama-3.1-405b:free

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Context 131072 · Inputs: text · text->text
nvidia 5 free models

nvidia/nemotron-3-nano-30b-a3b:free

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...

Context 256000 · Inputs: text · text->text

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Context 256000 · Inputs: text, audio, image, video · text+image+audio+video->text

nvidia/nemotron-3-super-120b-a12b:free

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Context 262144 · Inputs: text · text->text

nvidia/nemotron-nano-12b-v2-vl:free

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...

Context 128000 · Inputs: image, text, video · text+image+video->text

nvidia/nemotron-nano-9b-v2:free

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...

Context 128000 · Inputs: text · text->text
openai 2 free models

openai/gpt-oss-120b:free

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

Context 131072 · Inputs: text · text->text

openai/gpt-oss-20b:free

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Context 131072 · Inputs: text · text->text
openrouter 1 free model

openrouter/free

The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...

Context 200000 · Inputs: text, image · text+image->text
poolside 2 free models

poolside/laguna-m.1:free

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...

Context 131072 · Inputs: text · text->text

poolside/laguna-xs.2:free

Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...

Context 131072 · Inputs: text · text->text
qwen 2 free models

qwen/qwen3-coder:free

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

Context 262000 · Inputs: text · text->text

qwen/qwen3-next-80b-a3b-instruct:free

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...

Context 262144 · Inputs: text · text->text
tencent 1 free model

tencent/hy3-preview:free

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...

Context 262144 · Inputs: text · text->text
z-ai 1 free model

z-ai/glm-4.5-air:free

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...

Context 131072 · Inputs: text · text->text
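The capability-signal counts in the header (Text / Image / Audio / Video) can be derived mechanically from the roster's modality mappings. A minimal sketch, assuming each roster entry has been reduced to an (id, "inputs->outputs") pair; the four rows shown are illustrative entries from the roster above, not the full 33.

```python
# Sketch: derive capability signals by counting input modalities.
# Each entry is (model id, "inputs->outputs"); rows copied from the roster.
ROSTER = [
    ("google/gemma-3-27b-it:free", "text+image->text"),
    ("google/lyria-3-pro-preview", "text+image->text+audio"),
    ("nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free",
     "text+image+audio+video->text"),
    ("meta-llama/llama-3.3-70b-instruct:free", "text->text"),
]

def capability_signals(roster) -> dict[str, int]:
    """Count how many models accept each input modality."""
    counts = {"text": 0, "image": 0, "audio": 0, "video": 0}
    for _model_id, mapping in roster:
        inputs = mapping.split("->", 1)[0].split("+")
        for modality in inputs:
            if modality in counts:
                counts[modality] += 1
    return counts

print(capability_signals(ROSTER))
# {'text': 4, 'image': 3, 'audio': 1, 'video': 1}
```

Run over the full 33-model roster, the same counting yields the header's Text 33 / Image 11 / Audio 3 / Video 4 signals.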

Operator Notes

This page should be refreshed daily because free-model availability changes quickly.

The OpenRouter section is machine-generated; the provider boards are intentionally curated.

Use the dashboard as a monitor, not as a compliance promise. Vendor free-tier terms can change without warning.