Rank and File: AI Leaderboards Are Measurement Instruments, Not Scoreboards

Procurement meetings have a familiar ritual now. Someone opens a leaderboard, sorts by average score, points at a model near the top, and asks why the company is not using that one. It feels empirical. It is neatly ranked. It has decimals. Very scientific-looking decimals, the most seductive species of decimal.

The problem is not that leaderboards are useless. The problem is that we often treat them as scoreboards when they are closer to measurement instruments. A scoreboard tells us who won under agreed rules. A measurement instrument first has to prove that it measures the thing it claims to measure. If the instrument mixes model size, benchmark difficulty, contributor practices, post-training choices, item redundancy, and residual artifacts into one number, then the number may still be useful. It is just not self-explanatory. ...

June 4, 2026 · 18 min · Zelina