World-Models

The Patient Is Not a Moving Document: Why Clinical AI Needs World Models

A patient chart looks like a document because hospitals make it look that way. There are notes, medication lists, lab panels, procedure codes, imaging references, adverse events, survival outcomes, and enough timestamps to make a database administrator feel briefly useful. So it is tempting to treat the electronic health record as a very long piece of text: serialize the events, train a model to predict the next token, extract an embedding, and hope that clinical meaning emerges somewhere inside the transformer fog. ...

World Models Meet the Office From Hell

Office software has a special talent: it says “success” at the exact moment something has gone wrong somewhere else. A ticket is updated. A role is assigned. An asset is transferred. The API returns a cheerful confirmation. The agent, bless its silicon heart, declares victory. Then a background workflow fires. A user’s clearance changes. Another workflow reacts to that clearance change. A different record is silently updated. A constraint is now violated. The agent does not notice, because the agent saw the office equivalent of a green checkmark and mistook it for reality. ...

Cosmos Policy: When Video Models Stop Watching and Start Acting

A robot in a factory does not need a beautiful video of itself almost doing the job. It needs the gripper to close at the right moment, the wrist to rotate by the right amount, and the next two seconds of motion not to turn a simple pick-and-place task into modern sculpture. This is where many foundation-model stories become less glamorous. Vision-language models can recognize the scene. Video models can imagine motion. Neither of those achievements automatically gives you a usable control policy. ...

Lost Without a Map: Why Intelligence Is Really About Navigation

Map. That is the word most AI product teams should probably put above their dashboards, agent logs, evaluation suites, and occasionally their office coffee machine. Not because maps are poetic. Because when an AI system fails in a live workflow, the failure often does not look like “the model forgot a fact.” It looks like the system was navigating the wrong space. ...

Knowing Is Not Doing: When LLM Agents Pass the Task but Fail the World

A task is finished. The agent found the file, clicked the button, moved the object, submitted the form, or reached the winning state. The dashboard turns green. Everyone relaxes. That is usually the moment when the real question gets quietly buried: what did the agent actually learn about the world it just operated in? ...

Stuck on Repeat: When Reinforcement Learning Fails to Notice the Rules Changed

A dashboard still looks the same after the business changes. The buttons are in the same place. The form fields have the same labels. The workflow still asks for the same approval, the same handoff, the same final action. From the outside, nothing has moved. Then the rules underneath change. A supplier starts behaving differently after a policy shift. A trading market reacts differently after a liquidity regime changes. A robot arm keeps seeing the same objects, but the hardware has worn slightly. A customer-service automation still receives the same message types, but the escalation logic behind the organization has quietly changed. ...

MobileDreamer: When GUI Agents Stop Guessing and Start Imagining

A phone screen is not difficult because it is visually beautiful. It is difficult because it keeps changing. Tap the wrong button, and a form disappears. Scroll too far, and the useful item vanishes below the fold. Open the wrong menu, and the agent spends the next three steps politely recovering from its own confidence. Anyone who has watched a GUI agent operate a mobile app has seen the pattern: it often looks competent right until the interface asks for a small amount of foresight. ...

The Web, Reimagined as a World Model

Checkout should be boring. A customer adds an item to a cart, applies a valid discount, pays the displayed amount, and receives the product that inventory records said was available. This is not an area where an imaginative AI assistant should decide that loyalty deserves a 70% discount, that an empty warehouse contains one final box, or that payment is optional because the customer asked nicely. ...

Think Fast, Act Faster: How 'Thinking-by-Doing' Is Rewiring LLM World Models

Feedback is addictive. Give an AI agent a tool, an API, a database, a browser, a simulator, or a workflow environment, and the temptation is obvious: let it keep poking the world until something works. It tries. It observes. It corrects. It tries again. Compared with a model sitting alone in a prompt box, imagining every possible transition in its head, this looks much healthier. Less hallucinated planning, more contact with reality. Very grown-up. ...

Game of Cones: How Physics Codes Could Fix Agent Reasoning

Controls are where agent intelligence goes to embarrass itself. Give a vision-language model a game frame, a goal, and a list of legal buttons. It may describe the scene beautifully. It may explain that the projectile is approaching, the platform is unstable, and the shiny object is probably a reward. Then it presses the wrong key, late, for the wrong duration, and walks heroically into danger. Excellent commentary. Poor organism. ...