Model Routing

Think Twice, Pay Once: The New Economics of Long-Horizon AI Reasoning

Opening: Why this matters now. AI reasoning has entered its awkward managerial phase. For the past two years, the dominant story has been simple enough for a conference keynote: make models reason longer, use reinforcement learning, scale inference-time computation, and let the model “think.” The story is not wrong. It is just incomplete in the same way that saying “hire more analysts” is an incomplete operating model for a research department. More thinking can help. It can also become expensive, slow, noisy, and occasionally theatrical. ...

Routing the Brain: Why Smarter LLM Orchestration Beats Bigger Models

Budget is where many agentic AI demos go to become enterprise software. A prototype looks magical when every agent is powered by the strongest available model. The planner plans, the coder codes, the reviewer reviews, the analyst generates charts, and nobody asks why the “simple CSV preview” cost the same kind of model call as a concurrency audit. Then the workflow is run at scale. Suddenly the demo is not an assistant. It is a very polite furnace. ...

Fast Minds, Cheap Thinking: How Predictive Routing Cuts LLM Reasoning Costs

A support ticket arrives. Then a compliance question. Then a spreadsheet formula request. Then a genuinely nasty piece of mathematical reasoning wearing the innocent expression of a homework problem. In too many AI systems, all four get sent to the same expensive reasoning model, because the architecture has the subtlety of a hotel buffet: everything goes through the same line. ...

Model Portfolio: When LLMs Sit the CFA

Exams are useful because they are rude. They do not care that a model sounds polished, cites the right buzzwords, or can produce a gorgeous paragraph about duration risk. They ask for A, B, or C. Then they mark the answer wrong. That is why a new CFA-based benchmark is more useful than another misty-eyed essay about AI “transforming finance.” The paper evaluates GPT-4o, GPT-o1, and o3-mini on 1,560 official CFA mock multiple-choice questions across Levels I, II, and III, both zero-shot and with a domain-reasoning RAG pipeline built from official CFA curriculum materials.[1] The result is not a single leaderboard. It is closer to a routing manual. ...

Forecast: Mostly Context with a Chance of Routing

TL;DR for operators: Most forecasting teams already have decent numerical forecasters. Their problem is not that ARIMA, ETS, Lag-Llama, Chronos, or internal demand models suddenly forgot how Tuesdays work. The problem is that many important forecast shocks arrive as text: heat-wave notices, maintenance schedules, holiday effects, price caps, promotions, policy changes, store closures, one-off events, and all the other messy little business facts that refuse to fit politely into a clean covariate table. ...

Guess How Much? Why Smart Devs Brag About Cheap AI Models

TL;DR for operators: Cheap models are not a moral victory. They are useful when the surrounding system knows what to ask, how to check the answer, and when to escalate. The practical lesson from FrugalGPT and later model-routing research is that AI cost optimisation is less about picking one “best value” model and more about designing an inference pipeline that spends intelligence only where intelligence is needed.[1] ...
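The "ask, check, escalate" pattern in the excerpt above can be sketched as a simple model cascade. This is a minimal illustration under stated assumptions, not FrugalGPT's actual implementation: the `Tier` type, the `cascade` function, the per-call costs, and the stub models are all hypothetical, and a real `accept` check would be a learned scorer or a task-specific validator rather than a non-empty test.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One model in the cascade (all fields are illustrative assumptions)."""
    name: str
    cost: float                      # hypothetical cost per call, arbitrary units
    answer: Callable[[str], str]     # stand-in for a real model call


def cascade(
    query: str,
    tiers: List[Tier],
    accept: Callable[[str, str], bool],
) -> Tuple[str, str, float]:
    """Try tiers from cheapest to strongest; stop at the first accepted answer.

    Returns (tier name, answer, total cost spent). If no tier's answer is
    accepted, the last tier's answer is returned anyway.
    """
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    spent = 0.0
    result = ""
    for tier in tiers:
        spent += tier.cost
        result = tier.answer(query)
        if accept(query, result):
            return tier.name, result, spent
    return tiers[-1].name, result, spent


# Stub models: the cheap one only knows one easy question.
def cheap_model(q: str) -> str:
    return "4" if q == "2+2" else ""


def strong_model(q: str) -> str:
    return "strong:" + q


def non_empty(_query: str, answer: str) -> bool:
    # Placeholder check; a real system would validate the answer properly.
    return answer != ""


TIERS = [Tier("cheap", 1.0, cheap_model), Tier("strong", 20.0, strong_model)]

if __name__ == "__main__":
    print(cascade("2+2", TIERS, non_empty))
    print(cascade("prove the lemma", TIERS, non_empty))
```

The design choice worth noting is that escalation costs accumulate: a query that falls through to the strong tier pays for the cheap attempt as well, so the cascade only saves money when the cheap tier is accepted often enough.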

Break-Even the Machine: Strategic Thinking in the Age of High-Cost AI

TL;DR for operators: The real AI cost question is not “Which model is cheapest?” It is “Which workflow delivers acceptable outcomes at the lowest verified total cost?” Token price is only the most visible line item. The less photogenic costs are retries, review, integration, monitoring, compliance, vendor lock-in, and the small corporate tragedy known as “we saved money on inference and spent it all on fixing nonsense.” ...
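To make "lowest verified total cost" concrete, here is a rough sketch of a break-even comparison. The cost model is an assumption introduced for illustration, not one taken from the article: each attempt pays for inference plus review, each failed attempt also pays a rework cost, and attempts repeat independently until one succeeds. All figures are made up.

```python
def cost_per_accepted_outcome(
    inference_cost: float,
    review_cost: float,
    success_rate: float,
    rework_cost: float,
) -> float:
    """Expected total cost to obtain one accepted outcome.

    Assumes independent retries until success, so the expected number of
    attempts is 1 / success_rate and the expected number of failures is
    (1 - success_rate) / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1.0 / success_rate
    expected_failures = expected_attempts - 1.0
    return (inference_cost + review_cost) * expected_attempts + rework_cost * expected_failures


if __name__ == "__main__":
    # Hypothetical numbers: the cheap model costs 20x less per call
    # but succeeds only half the time.
    cheap = cost_per_accepted_outcome(0.01, 0.50, 0.5, 2.00)
    strong = cost_per_accepted_outcome(0.20, 0.50, 0.8, 2.00)
    print(f"cheap model:  {cheap:.3f} per accepted outcome")
    print(f"strong model: {strong:.3f} per accepted outcome")
```

Under these invented numbers the model with the lower token price ends up costing more per accepted outcome, because review and rework dominate the inference line item.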