
Rotate Less, Quantize Better: OptRot and the Geometry of LLM Compression

Opening — Why this matters now

Quantization is no longer a niche optimization; it is the price of admission for deploying large language models at scale. As model sizes balloon and inference budgets refuse to follow, post-training quantization (PTQ) has become the default survival strategy. Yet one stubborn problem keeps resurfacing: outliers. A handful of extreme weights—or activations—can quietly wreck an otherwise elegant low‑bit deployment. This paper introduces OptRot, a method that tackles that problem not with more data, more calibration, or more training, but with something almost suspiciously modest: a carefully chosen rotation objective. ...
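
The rotate-before-quantizing idea is easy to see with a toy example. The sketch below (plain NumPy, not code from the paper) quantizes a weight matrix containing a few planted outliers to 4 bits, once directly and once after multiplying by a random orthogonal matrix; because the rotation spreads the outlier mass across many coordinates, the round-trip error is typically much lower. OptRot's actual contribution is the objective used to choose the rotation, which this illustration does not implement; the matrix size, bit width, and outlier pattern here are arbitrary assumptions.

```python
# Illustrative sketch only: why rotating a weight matrix before low-bit
# quantization can tame outliers. The rotation here is a *random* orthogonal
# matrix; OptRot selects its rotation via an explicit objective, not shown.
import numpy as np

rng = np.random.default_rng(0)

def quantize_symmetric(x, bits=4):
    """Per-tensor symmetric round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

# Weight matrix with a handful of large outliers (hypothetical shape/scale).
W = rng.normal(size=(256, 256))
W[rng.integers(0, 256, 8), rng.integers(0, 256, 8)] *= 50.0

# Baseline: quantize W directly; the outliers inflate the scale.
err_plain = np.linalg.norm(W - quantize_symmetric(W)) / np.linalg.norm(W)

# Rotate, quantize, rotate back. Q is orthogonal, so (W @ Q) @ Q.T recovers W
# up to floating point; only the quantization step introduces error.
Q, _ = np.linalg.qr(rng.normal(size=(256, 256)))
W_hat = quantize_symmetric(W @ Q) @ Q.T
err_rot = np.linalg.norm(W - W_hat) / np.linalg.norm(W)

print(f"relative error, no rotation:   {err_plain:.4f}")
print(f"relative error, with rotation: {err_rot:.4f}")
```

In this setup the rotation costs nothing in exact arithmetic (it is undone by its transpose) yet shrinks the quantization error substantially, which is the geometric intuition the post builds on.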

January 3, 2026 · 4 min · Zelina