Embodied AI

When Diffusion Learns How to Open Drawers

A drawer is a small test of whether a generated world is lying. A rendered apartment can look plausible from the camera angle. The sofa is against a wall, the table is centered, the cabinet has a tasteful texture, and the lighting politely pretends that nothing is wrong. Then a robot tries to open a drawer and discovers that the drawer path intersects the bed. Or a chair is placed so close to a cabinet that neither object can actually be used. The scene was visually acceptable. It was operationally useless. ...

When Robots Guess, People Bleed: Teaching AI to Say ‘This Is Ambiguous’

Vial. That is the easy version of the problem. A robot stands near a surgical tray. A person says, “Pass me the vial.” There are two vials. One is harmless. One is not. The robot does not need a better smile, a warmer voice, or a more fluent explanation of how helpful it intends to be. It needs to know that the instruction should not be executed yet. ...

Think First, Grasp Later: Why Robots Need Reasoning Benchmarks

A robot receives a simple instruction: pick up the blue cup. It approaches the blue cup, positions its gripper badly, and knocks the cup over. Another robot moves smoothly, closes its gripper precisely—and picks up the red cup. On the operations dashboard, both attempts may appear under the same pleasantly uninformative label: task failed. ...

Don’t Forget How to Feel: Teaching Motion Models Empathy Without Amnesia

Avatars are easy to make expressive once. That is the boring version of the problem. Give a motion model enough examples of sad walking, angry gesturing, or excited dancing, and it can learn the broad association between text and motion. The harder problem starts later, after the product has already shipped. A game studio adds a new combat animation pack. A VR training company expands from office scenarios to emergency response. A digital-human platform moves from daily-life gestures into sports, performance, musical instruments, and acrobatics. Suddenly “sad” is no longer just a lowered head during walking. It must become a lowered head while jogging, a constrained body during performance, or a professional movement pattern inside a sport. ...

Don’t Tell the Robot What You Know

Directions are easy when both people see the same room. “Move left.” “Go toward the table.” “The apple is beside the sofa.” These are perfectly reasonable instructions if speaker and listener share the same visual world. They become less reasonable when one of them is staring at a wall, cannot see the table, and has no reason to believe the sofa exists. At that point, the problem is no longer navigation. It is epistemology, with furniture. ...

CitySeeker: Lost in Translation, Found in the City

The city does not answer literal questions. A person says, “I’m thirsty.” A human does not usually reply, “Please specify whether you require a vending machine, café, convenience store, supermarket, juice shop, water fountain, or bubble tea store.” That would be technically attentive and socially catastrophic. A human looks around, remembers what cities usually contain, infers which places can satisfy the need, and starts walking toward a plausible target. ...

SceneMaker: When 3D Scene Generation Stops Guessing

A chair behind a table is not half a chair. A single image can be a very rude input. It shows the front of a room, hides the back of objects, compresses depth into pixels, and then asks a model to produce a coherent 3D scene. The model must decide what the hidden side of a chair looks like, how large the chair is, whether it sits behind the table or intersects with it, and where everything belongs in 3D space. Naturally, when the result looks wrong, we often blame “weak 3D generation.” ...

Suzume-chan, or: When RAG Learns to Sit in Your Hand

A visitor walks into a research demo, a museum gallery, a hospital information corner, or a corporate training booth. The expert is busy. The brochure is dry. The QR code leads to a page nobody wants to read while standing up. The chatbot is available, technically, but it lives behind a screen and feels like another form to be tolerated. ...

Worlds Within Reach: How SIMA 2 Turns Virtual Environments into Training Grounds for Generalist Agents

Games are not toys to an AI lab. They are controlled worlds with messy consequences. A game gives an agent what enterprise software and robotics both struggle to provide at scale: visual ambiguity, delayed goals, menus, navigation, tool use, failure states, and a reset button that does not involve a broken warehouse robot or a furious operations manager. That is why Google DeepMind’s SIMA 2 paper is more interesting than “AI can play games again.” We have had that headline several times. It is getting a little tired, and it should probably hydrate. ...

Debate Club for Robots: How Multi-Agent Arguing Makes Embodied AI Safer

The robot should not need a philosophy seminar before using a microwave. Microwaves are excellent devices for exposing weak safety logic. A normal household assistant can be asked to warm food, boil water, clean a counter, water a plant, or move objects around a kitchen. Most of these tasks are harmless. Some are not. “Put a book into the microwave and turn it on” is not a creative lifestyle experiment. It is a fire hazard with better lighting. ...