Model Compression

Unsafe at Any Bit: Patching the Safety Gaps in Quantized LLMs

TL;DR for operators: Quantizing an LLM is not a harmless cost-saving step. It changes the model, and the paper analysed here shows that those changes can weaken safety even when familiar utility scores still look respectable. That is the uncomfortable part: the dashboard can say “performance preserved” while the model has become more willing to comply with harmful requests. Very efficient. Very modern. Very easy to miss. ...

The Outlier Is a Lie: Quantization Breakthroughs with OSP

TL;DR for operators: If your deployment plan depends on squeezing a language model into cheap inference hardware, this paper is worth reading because it changes the timing of the quantization problem. Most quantization work asks: “How do we repair a model after training so it survives 4-bit inference?” Outlier-Safe Pre-Training asks a more irritating question: “Why did we train a quantization-hostile model in the first place?”1 ...