Embodied AI

Walking the Line: When Robots Learn to Step Like Humans (Without the Drama)

Walking looks easy until you ask a robot to do it. For humans, stepping over a box or climbing a stair is usually not an executive decision. The body sees the surface, estimates where the foot should land, keeps rhythm, adjusts weight, and moves on. No committee meeting. No multi-stage training pipeline. No adversarial discriminator whispering, “that gait is not sufficiently human-like.” ...

When Memory Lies and Rules Save It: Rethinking LLM Agents in Closed Worlds

Memory is usually sold as the adult upgrade for LLM agents. Give the agent a past. Give it a vector database. Give it episodes, reflections, mistakes, summaries, and a long enough context window to remember every tiny embarrassment. Surely it will become more reliable. The RPMS paper is useful because it interrupts that comforting story with a less fashionable point: memory can make an agent worse when the world has hard action rules.1 ...

Learning From the Punches: How AI Agents Turn Mistakes into Skills

Mistakes are cheap until an agent repeats them. A human worker who keeps failing at the same task usually leaves traces: a blocked aisle, a missing tool, a wrong form field, an error message, a process exception. A competent manager does not simply tell the worker to “try again with more confidence.” The useful move is more boring and more valuable: identify the pattern, write the repair rule, and make sure the next attempt starts from the point of failure rather than from the beginning. ...

When Models Get Lost in Space: Why MLLMs Still Fail Geometry

Geometry looks clean. A cube has edges. A projection has rules. A missing view should follow from the views already shown. This is not the messy world of occluded street scenes, motion blur, shadows, or a warehouse camera pointed at the wrong shelf. It is the kind of visual reasoning many students learn before they are trusted with anything more dangerous than a compass, a ruler, and mild boredom. ...

Stable World Models, Unstable Benchmarks: Why Infrastructure Is the Real Bottleneck

A robot does not fail politely. It does not say, “I was trained on a slightly different shade of blue.” It just misses the object, pushes the wrong way, or confidently follows a plan that only works in the tidy little universe where the benchmark was born. That is the uncomfortable lesson behind stable-worldmodel-v1, a paper that is less about inventing a new world model and more about asking whether world-model research has been measuring the right thing in the first place.1 ...

Benchmarks Lie, Rooms Don’t: Why Embodied AI Fails the Moment It Enters Your House

The room is not impressed by your leaderboard. A robot that performs well on a public benchmark has not necessarily learned how to operate in your house. It may recognize a chair in a dataset. It may answer a visual question about a tidy image. It may even produce a confident paragraph explaining where the coffee mug should be. Then it enters a real room (mirrors, partial views, cluttered corners, awkward sightlines, objects that are not positioned for benchmark convenience), and suddenly the “general intelligence” starts behaving like a tourist holding the map upside down. ...

Seeing Is Thinking: When Images Do the Reasoning

Paper is a good trap for artificial intelligence. Fold it, punch it, unfold it, and ask where the holes are. A person may not solve the problem instantly, but the mind knows what to do: imagine the folded sheet opening step by step. The reasoning is not mainly verbal. We do not narrate every cell of the paper grid like a bored accountant reading inventory codes. We see the transformation. ...

MemCtrl: Teaching Small Models What Not to Remember

A robot assistant walks through a room. It sees a chair from the front. Then from the side. Then from a slightly worse angle. Then the same chair again, because the camera moved while the robot hesitated. In theory, all of this is “context.” In practice, it is mostly noise wearing a productivity badge. ...

Cosmos Policy: When Video Models Stop Watching and Start Acting

A robot in a factory does not need a beautiful video of itself almost doing the job. It needs the gripper to close at the right moment, the wrist to rotate by the right amount, and the next two seconds of motion not to turn a simple pick-and-place task into modern sculpture. This is where many foundation-model stories become less glamorous. Vision-language models can recognize the scene. Video models can imagine motion. Neither of those achievements automatically gives you a usable control policy. ...

From Talking to Living: Why AI Needs Human Simulation Computation

The chatbot that cannot check the door. A useful AI assistant can write an email, summarize a meeting, explain a regulation, or generate a plan for fixing a server problem. Then something inconvenient happens: the real world disagrees. The meeting transcript missed one speaker. The regulation changed in one jurisdiction. The server error was not caused by the code but by two services fighting over the same port. The customer sounded satisfied in the chat log but cancelled the contract two days later. The model can still talk. Beautifully, even. But it cannot always live inside the situation long enough to notice that its first answer has become stale, incomplete, or simply wrong. ...