The Outlier Is a Lie: Quantization Breakthroughs with OSP
TL;DR for operators
If your deployment plan depends on squeezing a language model into cheap inference hardware, this paper is worth reading because it changes the timing of the quantization problem. Most quantization work asks: “How do we repair a model after training so it survives 4-bit inference?” Outlier-Safe Pre-Training asks a more irritating question: “Why did we train a quantization-hostile model in the first place?”1 ...