
Guardrails Over Gigabytes: Making LLM Coding Agents Behave

Opening — Why this matters now AI coding agents are everywhere—and still, maddeningly unreliable. They pass unit tests they shouldn’t. They hallucinate imports. They invent APIs with confidence that would be admirable if it weren’t so destructive. The industry response has been predictable: bigger models, longer prompts, more retries. This paper proposes something less glamorous and far more effective: stop asking stochastic models to behave like deterministic software engineers. ...

December 27, 2025 · 4 min · Zelina

Traffic, but Make It Agentic: When Simulators Learn to Think

Opening — Why this matters now Traffic simulation has always promised more than it delivers. City planners, transport researchers, and policymakers are told that with the right simulator, congestion can be eased, emissions reduced, and infrastructure decisions made rationally. In practice, most simulators demand deep domain expertise, rigid workflows, and a tolerance for configuration pain that few real-world users possess. ...

December 25, 2025 · 4 min · Zelina

Let There Be Light (and Agents): Automating Quantum Experiments

Opening — Why this matters now Quantum optics sits at an awkward intersection: conceptually elegant, mathematically unforgiving, and operationally tedious. Designing even a “classic” experiment often means stitching together domain intuition, optical components, and simulation code—usually in tools that were never designed for conversational exploration. As AI agents move from text completion to task execution, the obvious question emerges: can they design experiments, not just describe them? ...

December 20, 2025 · 3 min · Zelina

Memory Over Models: Letting Agents Grow Up Without Retraining

Opening — Why this matters now We are reaching the awkward teenage years of AI agents. LLMs can already do things: book hotels, navigate apps, coordinate workflows. But once deployed, most agents are frozen in time. Improving them usually means retraining or fine-tuning models—slow, expensive, and deeply incompatible with mobile and edge environments. The paper “Beyond Training: Enabling Self-Evolution of Agents with MOBIMEM” takes a blunt stance: continual agent improvement should not depend on continual model training. Instead, evolution should happen where operating systems have always handled adaptation best—memory. ...

December 20, 2025 · 4 min · Zelina

Shaking the Stack: Teaching Seismology to Talk Back

Opening — Why this matters now Scientific software has a strange tradition: world‑class physics wrapped in workflows that feel frozen in the 1990s. Seismology is no exception. SPECFEM — arguably the gold standard for seismic wave simulation — delivers extraordinary numerical fidelity, but only after users survive a rite of passage involving fragile text files, shell scripts, and MPI incantations. ...

December 17, 2025 · 4 min · Zelina

When Agents Loop: Geometry, Drift, and the Hidden Physics of LLM Behavior

Opening — Why this matters now Agentic AI systems are everywhere—self-refining copilots, multi-step reasoning chains, autonomous research bots quietly talking to themselves. Yet beneath the productivity demos lurks an unanswered question: what actually happens when an LLM talks to itself repeatedly? Does meaning stabilize, or does it slowly dissolve into semantic noise? The paper “Dynamics of Agentic Loops in Large Language Models” offers an unusually rigorous answer. Instead of hand-waving about “drift” or “stability,” it treats agentic loops as discrete dynamical systems and analyzes them geometrically in embedding space. The result is less sci‑fi mysticism, more applied mathematics—and that’s a compliment. ...

December 14, 2025 · 4 min · Zelina

Forget Me Not: How IterResearch Rebuilt Long-Horizon Thinking for AI Agents

Opening — Why this matters now The AI world has become obsessed with “long-horizon” reasoning—the ability of agents to sustain coherent thought over hundreds or even thousands of interactions. Yet most large language model (LLM) agents, despite their size, collapse under their own memory. The context window fills, noise piles up, and coherence suffocates. Alibaba’s IterResearch tackles this problem not by extending memory—but by redesigning it. ...

November 11, 2025 · 4 min · Zelina

Touch Intelligence: How DigiData Trains Agents to Think with Their Fingers

Opening — Why this matters now In 2025, AI agents are no longer confined to text boxes. They’re moving across screens—scrolling, tapping, and swiping their way through the digital world. Yet the dream of a truly general-purpose mobile control agent—an AI that can use your phone like you do—has remained out of reach. The problem isn’t just teaching machines to see buttons; it’s teaching them to understand intent. ...

November 11, 2025 · 4 min · Zelina

Thinking Fast and Flowing Slow: Real-Time Reasoning for Autonomous Agents

Opening — Why this matters now AI agents are getting smarter—but not faster. Most large language model (LLM) systems still behave like cautious philosophers in a chess match: the world patiently waits while they deliberate. In the real world, however, traffic lights don’t freeze for an AI car mid-thought, and market prices don’t pause while a trading agent reasons about “the optimal hedge.” The new study Real-Time Reasoning Agents in Evolving Environments by Wen et al. (2025) calls this out as a fundamental flaw in current agent design—and offers a solution that blends human-like intuition with deliberative reasoning. ...

November 10, 2025 · 4 min · Zelina

Agents on the Clock: How TPS-Bench Exposes the Time Management Problem in AI

Opening — Why this matters now AI agents can code, search, analyze data, and even plan holidays. But when the clock starts ticking, they often stumble. The latest benchmark from Shanghai Jiao Tong University — TPS-Bench (Tool Planning and Scheduling Benchmark) — measures whether large language model (LLM) agents can not only choose the right tools, but also use them efficiently in multi-step, real-world scenarios. The results? Let’s just say most of our AI “assistants” are better at thinking than managing their calendars. ...

November 6, 2025 · 3 min · Zelina