World-Models

When Videos Grow Hands: How PhysWorld Teaches Robots to Stop Hallucinating Physics

Robots are not impressed by nice videos. A generated clip can show a hand placing a book into a shelf, pouring tomatoes from a pan, or sweeping scraps into a dustpan. It can look coherent enough to fool a casual viewer and perhaps even a product demo audience, which is not exactly the highest bar in technology. But a robot does not execute “looks coherent.” It executes poses, contacts, forces, trajectories, collisions, and failures. ...

Mind's Eye for Machines: How SimuRA Teaches AI to Think Before Acting

TL;DR for operators: SimuRA is an agent architecture that asks a simple operational question: before an AI agent clicks, searches, filters, submits, or replies, can it cheaply rehearse what might happen next?¹ Not in a poetic “the machine imagines” sense, please calm down. In a practical sense: generate candidate actions, simulate their likely outcomes in a compact internal state, score those futures against the goal, and only then execute the first concrete action. ...

SIMURA Says: Don’t Guess, Simulate

TL;DR for operators: Most LLM agents still behave like overconfident interns with a browser: observe, guess the next action, click, apologise, repeat. SimuRA proposes a more serious pattern. Before acting, the agent writes down a belief state, proposes several high-level candidate actions, simulates likely future states with an LLM-based world model, scores those futures against the goal, and only then converts the selected intent into an executable browser action.¹ ...
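
The loop described in that teaser (belief state, candidate intents, simulated futures, scoring, grounding) can be sketched in a few lines. This is a minimal illustration, not SimuRA's actual code: `Belief`, `plan_step`, and every `toy_*` function are hypothetical names invented here, and the toy functions are crude stand-ins for what would be LLM calls in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Belief:
    """Hypothetical compact summary of what the agent believes the state is."""
    summary: str


def plan_step(
    belief: Belief,
    goal: str,
    propose: Callable[[Belief, str], List[str]],
    simulate: Callable[[Belief, str], Belief],
    score: Callable[[Belief, str], float],
    ground: Callable[[str, Belief], str],
) -> str:
    """Pick one executable action by rehearsing each candidate intent first."""
    candidates = propose(belief, goal)
    if not candidates:
        raise ValueError("no candidate actions proposed")
    # Simulate each intent, then score the predicted future against the goal.
    scored = [(score(simulate(belief, intent), goal), intent) for intent in candidates]
    _, best_intent = max(scored, key=lambda pair: pair[0])
    # Only the winning intent is converted into a concrete action.
    return ground(best_intent, belief)


# Toy stand-ins (assumptions): a real agent would back each of these with an LLM.
def toy_propose(belief: Belief, goal: str) -> List[str]:
    return ["open search filters", "click first result", "submit empty form"]


def toy_simulate(belief: Belief, intent: str) -> Belief:
    return Belief(summary=f"{belief.summary}; after: {intent}")


def toy_score(future: Belief, goal: str) -> float:
    # Crude proxy: count goal words that appear in the predicted state.
    return float(sum(word in future.summary for word in goal.lower().split()))


def toy_ground(intent: str, belief: Belief) -> str:
    return f"ACTION[{intent}]"


if __name__ == "__main__":
    start = Belief(summary="search results page")
    print(plan_step(start, "apply price filters",
                    toy_propose, toy_simulate, toy_score, toy_ground))
```

Passing the four stages in as plain callables keeps the control flow visible and makes each stage swappable. Whether a real system separates them this cleanly is an assumption of the sketch.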