AI Agents

When Policies Read Each Other: Teaching Agents to Cooperate by Reading the Code

A workflow breaks in a familiar way. The planning agent assumes the procurement agent will wait. The procurement agent assumes the planning agent has already revised the forecast. The compliance agent flags the output after both have acted. Everyone had access to the same dashboard. Nobody had access to the thing that actually mattered: the other agent’s decision policy. ...

Traffic, but Make It Agentic: When Simulators Learn to Think

Traffic. A planner wants to test whether a new signal policy will reduce congestion near a hospital. A logistics operator wants to know whether a revised delivery schedule will overload a district during the evening peak. A city team wants to compare two neighborhoods, two time windows, and two control strategies before anyone touches asphalt, paint, or public patience. ...

Agents All the Way Down: When Science Becomes Executable

A lab does not fail because the scientist forgot how to think. It fails more often for duller reasons: the data table is in the wrong format, the simulation script only works on one cluster, the instrument queue is opaque, the boundary condition was changed but not logged, the literature trail cannot be reconstructed, and the “promising result” lives in someone’s notebook like a small hostage. ...

When One Clip Isn’t Enough: Teaching LLMs to Watch Long Videos Like Adults

Video is a terrible place to hide evidence. Not because the evidence is invisible. Because it is usually obvious only after someone has already found the right minute, the right scene, and the right visual detail. A person reviewing a long customer-support screen recording, a training video, a compliance recording, or a surveillance clip rarely watches everything with equal attention. They skim, localize, zoom in, check the detail, and then answer. Primitive, yes. Effective, also yes. ...

When LLMs Stop Guessing and Start Calculating

A simulation job does not care how elegant the prompt was. It cares whether the input files are valid, whether the parameters are compatible, whether the previous step produced the right intermediate state, whether the solver converged, and whether the final number actually means what the workflow says it means. This is where the romance of “AI scientists” usually meets the concrete wall of scientific computing. The model can sound like a postdoc. The machine still wants the correct INCAR tag. ...

About Time: When Reinforcement Learning Finally Learns to Wait

Waiting is a decision. That sounds obvious to anyone who has watched a warehouse robot pause at an intersection, a trading system delay execution, or an autonomous vehicle slow down before a pedestrian crossing. In the real world, “do the task” is rarely the whole instruction. The operational instruction is closer to: do the task, in this order, not before this condition, not after that deadline, and preferably without wasting time while pretending that nothing is happening. ...

Same Moves, Different Minds: Rashomon Comes to Sequential Decision-Making

A taxi is a useful little trap. It looks harmless: pick up passengers, drive them to destinations, do not run out of fuel. A small grid-world taxi environment is not exactly the sort of thing that makes executives whisper “agentic transformation” over terrible conference coffee. But that is precisely why it works. Strip away the enterprise theatre, and sequential decision-making becomes easier to see. An agent observes a state, chooses an action, receives the next state, and repeats. If two agents always make the same moves and achieve the same objective, most organizations would treat them as equivalent. Same behavior, same operational meaning. Audit passed. Ship it. ...

Let There Be Light (and Agents): Automating Quantum Experiments

A lab notebook is not just a diary. It is an institutional memory system with bad handwriting, missing parameter values, and occasional coffee damage. That is not a joke, unfortunately. In experimental science, much of the valuable knowledge sits between formal theory and physical execution: which crystal goes with which pump, how the beams should be routed, which detector timing window is plausible, which old setup can be reused, and which beautiful simulation is quietly lying through its teeth. ...

Memory Over Models: Letting Agents Grow Up Without Retraining

Repetition is where most automation systems quietly embarrass themselves. Ask an AI agent to book a hotel once, and it may inspect the screen, reason through options, click through menus, and eventually finish the task. Ask it to do something similar tomorrow, and many systems perform the same little theatre again: perceive, reason, click, wait, reason, click, apologize, recover. Very intelligent. Very expensive. Slightly absurd. ...

CitySeeker: Lost in Translation, Found in the City

The city does not answer literal questions. A person says, “I’m thirsty.” A human does not usually reply, “Please specify whether you require a vending machine, café, convenience store, supermarket, juice shop, water fountain, or bubble tea store.” That would be technically attentive and socially catastrophic. A human looks around, remembers what cities usually contain, infers which places can satisfy the need, and starts walking toward a plausible target. ...