LLM Routing

The Edge Case for LLM Routing: Why Cheap Local Inference Needs a Risk Gate

Phone. That is the simplest way to understand the problem. Not “AI infrastructure,” not “distributed inference,” not the usual diagram where a cloud box smiles down upon a client device. A phone receives a query. It must decide whether to answer locally or send the request to an edge server. Once it answers locally, the decision is done. There is no elegant after-the-fact escalation. The stronger model it did not call remains unused, quietly judging from the rack. ...

EcoThink: When AI Learns to Think Less (and Achieve More)

A chatbot does not need a philosophy seminar to answer “Who directed Oppenheimer?” That sentence sounds obvious. Yet a large part of today’s AI infrastructure behaves as if every user query deserves a carefully staged internal drama: retrieve facts, reason through them, verify the logic, produce a chain of intermediate steps, and finally deliver the answer the system could have produced with a simple lookup. It is impressive in the same way using a crane to move a coffee cup is impressive. Technically capable. Operationally absurd. ...

Right Tool, Right Thought: Difficulty-Aware Orchestration for Agentic LLMs

Tickets are not equal. Some user requests are glorified form-filling. Some are ambiguous investigations with missing context, tool calls, intermediate checks, and enough failure modes to keep a compliance officer quietly blinking at the ceiling. Yet many agentic systems still behave as if every query deserves the same ritual: summon the agents, run the workflow, pass outputs around, maybe add a debate round for theatrical effect, and hope the bill does not look too much like modern art. ...

The Slingshot Strategy: Outsmarting Giants with Small AI Models

TL;DR for operators: Most organisations do not have an AI capability problem. They have an AI allocation problem. They send too many routine, repetitive, low-risk tasks to large frontier models because the demo looked impressive and the invoice arrived later. The slingshot strategy is the opposite instinct: break a workflow into smaller decisions, assign the cheap and reliable parts to specialised models or rules, and escalate only the uncertain or high-value cases to stronger LLMs. The point is not to worship small models. That would merely replace one superstition with a smaller, cheaper superstition. The point is to allocate model capacity like an operating resource. ...
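
As a rough illustration of the allocation idea in the excerpt above, the escalation rule can be sketched in a few lines of Python. Everything here is an assumption made for illustration, not something taken from the article: the `Task` fields, the `confidence` score (imagined as coming from a small model or a rule), and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One small decision carved out of a larger workflow (hypothetical shape)."""
    text: str
    confidence: float         # assumed: cheap model's confidence in [0, 1]
    high_value: bool = False  # assumed: flag set by business rules


def route(task: Task, threshold: float = 0.8) -> str:
    """Return which tier should handle the task: 'small' or 'large'.

    Escalate only when the case is uncertain (low confidence) or
    high-value; everything else stays on the cheap path.
    The default threshold is an arbitrary placeholder.
    """
    if task.high_value or task.confidence < threshold:
        return "large"
    return "small"


if __name__ == "__main__":
    tasks = [
        Task("Reset my password", confidence=0.95),
        Task("Why was my claim denied?", confidence=0.55),
        Task("Approve contract renewal", confidence=0.97, high_value=True),
    ]
    for t in tasks:
        print(f"{route(t):5s} <- {t.text}")
```

In a real system the confidence signal and the high-value flag would have to come from somewhere measurable, and the threshold would need tuning against cost and error data; the sketch only shows the shape of the decision.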