LLM Optimization

Spectral Therapy for Transformers: Predicting Divergence Before It Hurts

Training failure has a special talent for arriving late. Not late in the philosophical sense. Late in the operational sense: after the run has already consumed GPU time, after the team has already waited, after the dashboard has already looked tolerable long enough to invite optimism. Then the loss spikes, the gradient norm goes feral, and everyone pretends this was “useful learning.” Sometimes it is. Often it is just expensive smoke. ...

Speculation, But With Standards: Training Draft Models That Actually Get Accepted

Queue. That is still the least glamorous word in AI infrastructure, and probably the most honest one. A user asks a model to write code, summarize a filing, inspect an image, or reason through a customer ticket. The model knows what to do, more or less. The bottleneck is not ambition. It is waiting: one token after another, one expensive forward pass after another, while the GPU performs a very sophisticated version of typing slowly. ...

When Models Forget on Purpose: Why Data Selection Matters More Than Data Volume

Training data has become the AI industry’s favorite comfort blanket. When performance stalls, add more tokens. When a benchmark looks stubborn, add more tokens. When the model behaves badly, add more tokens and call it a roadmap. This worked well enough to become a reflex. Unfortunately, reflexes are not strategies. The uncomfortable question is no longer whether data matters. Of course it matters. The better question is whether every token deserves the same vote during training. ...

Fast Minds, Cheap Thinking: How Predictive Routing Cuts LLM Reasoning Costs

A support ticket arrives. Then a compliance question. Then a spreadsheet formula request. Then a genuinely nasty piece of mathematical reasoning wearing the innocent expression of a homework problem. In too many AI systems, all four get sent to the same expensive reasoning model, because the architecture has the subtlety of a hotel buffet: everything goes through the same line. ...