Stable World Models, Unstable Benchmarks: Why Infrastructure Is the Real Bottleneck
A robot does not fail politely. It does not say, “I was trained on a slightly different shade of blue.” It just misses the object, pushes the wrong way, or confidently follows a plan that only works in the tidy little universe where the benchmark was born. That is the uncomfortable lesson behind stable-worldmodel-v1, a paper that is less about inventing a new world model and more about asking whether world-model research has been measuring the right thing in the first place.[1] ...