House of Cards, House of Algorithms: Why Game AI Needs Better Testbeds
Benchmarks are the places where AI systems go to look impressive. That is not automatically a problem. A good benchmark clarifies what a system can do, what it cannot do, and where progress is real. A bad benchmark performs a more theatrical function: it lets researchers win a carefully chosen game, write a confident conclusion, and quietly hope nobody asks whether the result survives contact with another task. ...