The Model Got Smaller. The Risk Got Wider.
TL;DR for operators

Compression is usually sold as a clean engineering bargain: smaller model, lower memory, cheaper inference, acceptable accuracy loss. This paper asks the more operationally annoying question: after compression, does the model still know when it should hedge? The answer is: not reliably.

Tong et al. benchmark compressed LLMs using conformal prediction, a framework that converts model probabilities into prediction sets with target coverage.[1] In this setup, the important uncertainty metric is prediction set size: if the model needs to include more answer options to maintain coverage, it is less certain, even if its top-1 accuracy still looks respectable. ...
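To make the prediction-set idea concrete, here is a minimal sketch of split conformal prediction on a multiple-choice task. It is an illustration under stated assumptions, not the paper's pipeline: the score used here (one minus the probability assigned to the true answer) is one common choice, and the function names, the calibration values, and the two example distributions are all hypothetical toy inputs.

```python
import math


def conformal_threshold(cal_true_probs, alpha):
    """Score threshold for a target coverage of 1 - alpha.

    cal_true_probs: for each held-out calibration question, the probability
    the model assigned to the correct answer (toy values in this sketch).
    """
    # Nonconformity score: large when the model gave the truth low probability.
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    # Finite-sample corrected rank used by split conformal prediction.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every option.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Indices of all answer options whose score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration data: probability given to the correct answer.
cal_true_probs = [0.875, 0.75, 0.5, 0.375, 0.25, 0.25, 0.125]
threshold = conformal_threshold(cal_true_probs, alpha=0.25)

# Two toy output distributions over four options with the same top-1 choice.
sharp = [0.75, 0.125, 0.0625, 0.0625]  # confident model
flat = [0.25, 0.25, 0.25, 0.25]        # model that cannot separate options

sharp_set = prediction_set(sharp, threshold)
flat_set = prediction_set(flat, threshold)

print("threshold:", threshold)
print("sharp set size:", len(sharp_set))
print("flat set size:", len(flat_set))
```

Both toy distributions rank option 0 first (by tie-breaking order in the flat case), so a top-1 metric would score them the same on this question. The set size is what separates them: the flatter distribution has to admit more options to stay under the same threshold, which is the signal the article treats as uncertainty.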