Question Banks Are Dead. Long Live Encyclo-K.
Opening — Why this matters now Every time a new benchmark is released, the same ritual follows: models race to the top, leaderboards reshuffle, and a few months later—sometimes weeks—we quietly realize the benchmark has been memorized, gamed, or both. The uncomfortable truth is that static questions are no longer a reliable way to measure rapidly evolving language models. ...