Question Banks Are Dead. Long Live Encyclo-K.

Question banks work well until the examinee obtains the question bank. After that, the test still produces scores. It may even produce beautifully precise rankings. What it no longer reliably produces is evidence that the examinee can solve unseen problems. Large-language-model benchmarks face the same awkward lifecycle. A fixed evaluation set is published, discussed, copied into repositories, used in model-development pipelines, and eventually absorbed into training corpora. The benchmark remains visible; its diagnostic value quietly depreciates. ...

January 2, 2026 · 14 min · Zelina