Imitation-Learning

Borrowed Hands Still Need a Grip

TL;DR for operators: Robot-learning teams do not usually run out of model ideas first. They run out of clean demonstrations on the exact robot, in the exact setup, with the exact action labels needed for behavioural cloning. The paper behind GLAM attacks that bottleneck directly: instead of asking whether cheap auxiliary demonstrations can be thrown into the training pile, it asks whether their effects can be translated into actions the target robot can actually execute.¹ ...

Fold Me Once: When the Demonstration Becomes the Robot Interface

TL;DR for operators: Instant-Fold is not mainly a “robot folds shirts” paper. That is the demo-friendly surface layer, and robotics papers do need a surface layer. The more useful idea is that a single demonstration can work as an operational interface for deformable tasks where language is too thin, checklists are too brittle, and final-state labels hide the important part: how the object got there.¹ ...

Context Is Not a Costume: Why Strong Agents Still Fail on Contact

The agent looks ready. Then reality answers back. The current AI-agent story is conveniently simple. Take a powerful foundation model, wrap it in tools, give it a workflow, add a polite system prompt, and call the result “ready for deployment.” Reality, as usual, has poor manners. Two recent arXiv papers examine very different agent settings. One studies whether multimodal AI agents can align their behavior with the cognitive age of child users. The other studies whether behavior foundation models for imitation learning can remain robust when the physical dynamics of an environment shift after training. They do not share a benchmark, a model class, or even the same deployment domain. That is precisely why they are useful together. ...

ImplicitRDP: When Robots Stop Guessing and Start Feeling

Robots are very good at looking confident. Put a camera on a robot arm, train it with enough demonstrations, and it may glide toward a box, a switch, or a tool with the calm precision of something that understands the world. Then contact happens. The fingertip presses too hard. The switch has not actually toggled. The object slips, bends, jams, or quietly enters the expensive category known as “damaged inventory.” ...

Learning by X-ray: When Surgical Robots Teach Themselves to See in Shadows

X-rays are useful because they are cheap, familiar, and already sitting in the operating room. They are also, inconveniently, shadows. That is the central tension in “Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures”, a paper that asks whether a robot policy can plan vertebroplasty cannula trajectories from only bi-planar X-ray views (one anterior-posterior view, one lateral view) without CT-based navigation, registration, or a lovingly over-engineered suite of intra-operative infrastructure.¹ ...