The Edge Case for LLM Routing: Why Cheap Local Inference Needs a Risk Gate
Phone. That is the simplest way to understand the problem. Not “AI infrastructure,” not “distributed inference,” not the usual diagram where a cloud box smiles down upon a client device. A phone receives a query. It must decide whether to answer locally or send the request to an edge server. Once it answers locally, the decision is done. There is no elegant after-the-fact escalation. The stronger model it did not call remains unused, quietly judging from the rack. ...