The Missing Metric: Measuring Agentic Potential Before It’s Too Late

Procurement teams love a leaderboard. It is tidy, numeric, comparable, and therefore dangerously comforting. A model scores well on MMLU, looks respectable on GSM8K, passes a coding benchmark, and suddenly someone in a meeting says it is “agent-ready.” Lovely. By that logic, a person who passes a written driving test should be handed the keys to a forklift in a crowded warehouse. ...

November 2, 2025 · 15 min · Zelina

The Outlier Is a Lie: Quantization Breakthroughs with OSP

TL;DR for operators: If your deployment plan depends on squeezing a language model into cheap inference hardware, this paper is worth reading because it changes the timing of the quantization problem. Most quantization work asks: “How do we repair a model after training so it survives 4-bit inference?” Outlier-Safe Pre-Training asks a more irritating question: “Why did we train a quantization-hostile model in the first place?” ...

June 25, 2025 · 18 min · Zelina