Vision-Language-Action

Bench Press: LabVLA Turns Lab Protocols into Robot Supervision

TL;DR for operators: LabVLA is best read as an operating system for laboratory robot supervision, not as another paper claiming the robot scientist has arrived. The authors argue that laboratory automation is constrained by data and embodiment: most vision-language-action models have learned household and tabletop manipulation, but not pipettes, beakers, heaters, transparent liquids, instrument buttons, protocol steps, or the awkward fact that different robots have different bodies.¹ ...

Driving by Words: When LLMs Take the Wheel (Literally)

Taxi. That is the easiest way to understand the paper. Not because Vega is a robotaxi system. It is not. But because a taxi ride exposes the missing layer in many autonomous-driving discussions: the passenger does not merely want the car to obey traffic rules. The passenger wants the car to behave under intent. ...

Reasoning Is Optional. Optimization Is Not: Rethinking VLA Training with NORD

Driving teams do not pay for reasoning tokens because they enjoy watching a model narrate its inner life. They pay for them because, at least in current VLA training culture, reasoning traces are treated as a bridge between perception and action. The bridge is expensive. A typical reasoning-heavy Vision-Language-Action pipeline for autonomous driving collects large driving datasets, generates dense chain-of-thought-style annotations, supervised-fine-tunes the model, and then applies reinforcement learning to improve driving metrics. It is a respectable pipeline. It is also the kind of pipeline that quietly converts every research win into an invoice. ...

Think First, Grasp Later: Why Robots Need Reasoning Benchmarks

A robot receives a simple instruction: pick up the blue cup. It approaches the blue cup, positions its gripper badly, and knocks the cup over. Another robot moves smoothly, closes its gripper precisely—and picks up the red cup. On the operations dashboard, both attempts may appear under the same pleasantly uninformative label: task failed. ...

Worlds Within Reach: How SIMA 2 Turns Virtual Environments into Training Grounds for Generalist Agents

Games are not toys to an AI lab. They are controlled worlds with messy consequences. A game gives an agent what enterprise software and robotics both struggle to provide at scale: visual ambiguity, delayed goals, menus, navigation, tool use, failure states, and a reset button that does not involve a broken warehouse robot or a furious operations manager. That is why Google DeepMind’s SIMA 2 paper is more interesting than “AI can play games again.” We have had that headline several times. It is getting a little tired, and it should probably hydrate. ...