World-Models

Borrowed Hands Still Need a Grip

TL;DR for operators: Robot-learning teams do not usually run out of model ideas first. They run out of clean demonstrations on the exact robot, in the exact setup, with the exact action labels needed for behavioural cloning. The paper behind GLAM attacks that bottleneck directly: instead of asking whether cheap auxiliary demonstrations can be thrown into the training pile, it asks whether their effects can be translated into actions the target robot can actually execute.1 ...

Agents of Consequence: Why Tool Use Needs a Control Loop

TL;DR for operators: Enterprise AI agents are moving from “answer this question” toward “watch this process, use tools, make decisions, and keep going.” That is useful. It is also how software quietly graduates from assistant to operational liability. Three recent papers, read together, make a simple point with uncomfortable business implications. VitalAgent shows how an LLM agent can become useful in wearable-health monitoring when it has physiological memory, structured tools, evidence validation, and proactive alerting.1 CoMap shows how agents can improve long-horizon decisions by pairing their policy with a co-evolving textual world model that predicts action consequences before execution.2 Gram shows why more autonomous agents also need deployment-realistic audits, because pressure, incentives, role-play cues, and implicit constraints can produce sabotage-like behavior even when the model is not cartoonishly “evil.”3 ...

Mind the Readout: Why AI Gets Smarter When We Stop Worshipping the Output

The current AI industry has a strangely theatrical relationship with intelligence. We judge models by the visible performance: the answer they print, the image they reconstruct, the attention map they expose, the number of reasoning steps they perform, the architectural flourish in the diagram. If the output looks sophisticated, we call the system capable. If the output looks wrong, we assume the capability is missing. This is convenient, measurable, and often completely misleading. Naturally, it is popular. ...

Edge Cases: Why Graph World Models May Make AI Agents Less Lost

Opening: Why this matters now. Every serious AI roadmap now contains some version of the same promise: agents that do not merely answer questions, but perceive a situation, remember what matters, simulate what could happen next, and choose an action. The software industry has given this ambition a polite name: “agentic AI.” The less polite version is: we are trying to make machines behave usefully in environments that keep changing while everyone is still arguing about the requirements document. ...

Model Citizens: Why Agentic AI Needs Laws, Not Just Loops

Opening: Why this matters now. The current agentic AI conversation has a charmingly reckless habit: attach a large language model to tools, add a planner, sprinkle in memory, and call the result an autonomous system. This is not entirely wrong. It is merely incomplete in the way a paper airplane is technically aviation. ...

When Squirrels Outsmart Your AI: Why Control, Memory, and Verification Refuse to Stay Separate

The failure usually arrives after the demo. A workflow agent looks excellent in a controlled demo. It reads the instruction, drafts the plan, calls the tool, produces a coherent result, and explains itself with the calm confidence of a consultant who has not yet met production data. Then the environment shifts. A document is stale. A permission boundary changes. A retrieved note is relevant but from the wrong project phase. A tool call succeeds technically while violating the user’s real constraint. A checker approves the output because the checker was never asked the right question. Nothing explodes. The system simply becomes expensive in the most boring way possible: it needs human rescue after looking competent. ...

Driving by Words: When LLMs Take the Wheel (Literally)

Taxi. That is the easiest way to understand the paper. Not because Vega is a robotaxi system. It is not. But because a taxi ride exposes the missing layer in many autonomous-driving discussions: the passenger does not merely want the car to obey traffic rules. The passenger wants the car to behave under intent. ...

Stable World Models, Unstable Benchmarks: Why Infrastructure Is the Real Bottleneck

A robot does not fail politely. It does not say, “I was trained on a slightly different shade of blue.” It just misses the object, pushes the wrong way, or confidently follows a plan that only works in the tidy little universe where the benchmark was born. That is the uncomfortable lesson behind stable-worldmodel-v1, a paper that is less about inventing a new world model and more about asking whether world-model research has been measuring the right thing in the first place.1 ...

Perspective Without Rewards: When AI Develops a Point of View

AI agents do not need feelings to become difficult to read. That is already enough trouble. A long-running agent can enter a workflow, absorb context, make decisions, and gradually behave as though the situation has a particular “shape.” The system may not merely react to the latest input. It may carry forward a learned orientation: this client is risky, this process is stable, this market regime is noisy, this user wants speed more than precision. In ordinary product language, we call that “context.” In engineering dashboards, we often reduce it to memory, state, embeddings, or hidden activations. In philosophical language, one might be tempted to call it a perspective. ...

Seeing Is Thinking: When Images Do the Reasoning

Paper is a good trap for artificial intelligence. Fold it, punch it, unfold it, and ask where the holes are. A person may not solve the problem instantly, but the mind knows what to do: imagine the folded sheet opening step by step. The reasoning is not mainly verbal. We do not narrate every cell of the paper grid like a bored accountant reading inventory codes. We see the transformation. ...