Benchmarks Without Borders: Inside the Moduli Space of AI Psychometrics
Procurement Has a Benchmark Problem
Procurement teams love benchmark tables. They are clean, sortable, and emotionally comforting. Vendor A beats Vendor B by 3.7 points on a reasoning suite; Vendor C wins on code generation; Vendor D claims better tool use under “realistic agent workflows,” a phrase that usually means someone added a browser, a calculator, and optimism. ...