Automation

Simulate This: When LLMs Stop Talking and Start Modeling

A simulation model is not a chatbot with a spreadsheet attached. That sounds obvious until a project team starts treating the LLM as if it were the entire modeling stack: the analyst, the programmer, the validator, the documentation clerk, the statistical package, and occasionally the intern blamed when the result changes on Tuesday. The convenient story is that better prompting will tame the system. Add more examples. Add a RAG pipeline. Set temperature to zero. Smile at the demo. ...

When Models Guess the Verb by Looking at the Drawer

Drawer. That is the easy part. A model sees a drawer, and it knows that drawers are often opened. Then it watches a video where someone is closing the drawer and predicts opening anyway. This is not the kind of error that makes a demo look silly for five seconds and then disappears into the benchmark appendix. It is the kind of error that reveals what the system is really using as evidence. The model is not necessarily watching the motion. It may be recognizing the object, remembering the most common verb attached to that object during training, and calling that “video understanding.” Very efficient. Also wrong. ...

Cosmos Policy: When Video Models Stop Watching and Start Acting

A robot in a factory does not need a beautiful video of itself almost doing the job. It needs the gripper to close at the right moment, the wrist to rotate by the right amount, and the next two seconds of motion not to turn a simple pick-and-place task into modern sculpture. This is where many foundation-model stories become less glamorous. Vision-language models can recognize the scene. Video models can imagine motion. Neither of those achievements automatically gives you a usable control policy. ...

Probe, Then Commit: Why Solver Tuning Finally Grew Up

Planning is where business software goes to meet reality. A factory needs a schedule. A logistics team needs routes. A utility company needs network decisions. A hospital needs staff allocation. The model is elegant, the constraints are clear, and then the solver quietly asks the question nobody put in the PowerPoint: ...

When Goals Collide: Synthesizing the Best Possible Outcome

A robot does not always get the luxury of a clean task list. Reach the loading bay. Avoid blocked corridors. Preserve battery. Pick up two packages. Respect a safety boundary. Finish before the door closes. Then the environment, as environments enjoy doing, changes the rules halfway through. A corridor shuts. A resource disappears. One goal now interferes with another. ...

NPCs With Short-Term Memory Loss: Benchmarking Agents That Actually Live in the World

Minecraft is not the point. That may sound rude to the blocks, but it is the cleanest way to read MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents.1 The paper does use Minecraft. It does study an AI companion agent inside a live game world. It does report that a GPT-4o-powered setup failed 71 of 216 attempted subtasks, roughly one-third of all subtask attempts. ...

Echoes, Not Amnesia: Teaching GUI Agents to Remember What Worked

Memory is not a folder. A useful employee does not fill out the same form from scratch every morning as if yesterday never happened. They remember which menu hides the export button, which warning can be ignored, which field must be filled before the “Next” button wakes up, and which apparently harmless click sends the process into a small bureaucratic swamp. ...

When Rewards Learn to See: Teaching Humanoids What the Ground Looks Like

Robots do not fall because the word “walk” is ambiguous. They fall because the ground has opinions. A flat floor, a gap, a pile of blocks, and a staircase may all ask for “locomotion,” but they do not ask for the same behavior. One asks for velocity tracking. Another asks for foot placement. Another punishes careless exploration. A staircase, because it has a flair for drama, asks the robot to negotiate gravity one step at a time. ...

Prompt-to-Parts: When Language Learns to Build

The compiler is the interesting part. Blocks are easy to understand. That is why this paper is more interesting than it first looks. At the surface, Prompt-to-Parts: Generative AI for Physical Assembly and Scalable Instructions is a paper about using large language models to generate LEGO-style assemblies from natural-language prompts.1 It shows a medieval castle, an International Space Station model, a modular multitool kit, and an image-to-parts helicopter conversion. Naturally, the tempting summary is: “LLMs can now design LEGO models.” ...

ImplicitRDP: When Robots Stop Guessing and Start Feeling

Robots are very good at looking confident. Put a camera on a robot arm, train it with enough demonstrations, and it may glide toward a box, a switch, or a tool with the calm precision of something that understands the world. Then contact happens. The fingertip presses too hard. The switch has not actually toggled. The object slips, bends, jams, or quietly enters the expensive category known as “damaged inventory.” ...