Draft Models

Queue. That is still the least glamorous word in AI infrastructure, and probably the most honest one. A user asks a model to write code, summarize a filing, inspect an image, or reason through a customer ticket. The model knows what to do, more or less. The bottleneck is not ambition. It is waiting: one token after another, one expensive forward pass after another, while the GPU performs a very sophisticated version of typing slowly. ...