AI Agents

Driving by Words: When LLMs Take the Wheel (Literally)

Taxi. That is the easiest way to understand the paper. Not because Vega is a robotaxi system. It is not. But because a taxi ride exposes the missing layer in many autonomous-driving discussions: the passenger does not merely want the car to obey traffic rules. The passenger wants the car to act on their intent. ...

Harnessing the Harness: When AI Stops Being a Model Problem

Glue is not glamorous. In most AI product discussions, the model gets the spotlight. The harness (the scripts, prompts, validators, retry rules, state files, tool adapters, and stopping criteria around the model) gets treated as plumbing. Necessary, slightly annoying, and best ignored until it leaks. That habit is becoming expensive. The paper Natural-Language Agent Harnesses argues that the surrounding execution system is no longer a secondary implementation detail. It is often the actual unit of agent performance, reliability, and portability.¹ The paper’s useful claim is not that “natural language replaces code.” That would be a lovely fantasy for people who have not debugged parsers, sandboxes, or file permissions lately. The sharper claim is that part of the harness can become an editable natural-language policy object, while exact execution remains in code. ...

Agent Factories: When More AI Means Better Hardware

Button. That was the promise of High-Level Synthesis: write a high-level program, push it through the toolchain, and receive efficient hardware without spending the afternoon whispering to pragmas like a medieval engineer negotiating with silicon spirits. The button never quite arrived. HLS did raise the abstraction level from RTL to C/C++. But performance still depends on expert choices: where to pipeline, where not to pipeline, which arrays to partition, which loops to unroll, which memory access pattern is quietly sabotaging the whole design. The code looks like software; the reasoning remains hardware. ...

EcoThink: When AI Learns to Think Less (and Achieve More)

A chatbot does not need a philosophy seminar to answer “Who directed Oppenheimer?” That sentence sounds obvious. Yet a large part of today’s AI infrastructure behaves as if every user query deserves a carefully staged internal drama: retrieve facts, reason through them, verify the logic, produce a chain of intermediate steps, and finally deliver the answer the system could have produced with a simple lookup. It is impressive in the same way using a crane to move a coffee cup is impressive. Technically capable. Operationally absurd. ...

When Models Disagree With Themselves: Turning Multimodal Conflict into Signal

Screenshots lie differently from HTML. That sounds like a small engineering nuisance until the model is not merely answering a demo question, but reading a supplier invoice, comparing products on a procurement portal, interpreting a dashboard, or deciding which button an autonomous web agent should click next. The same underlying object may appear as a rendered page, raw DOM, OCR text, chart pixels, table JSON, or a caption. Humans usually treat these as different windows onto the same thing. Multimodal models often treat them as different worlds. ...

Autoresearch²: When AI Starts Debugging Its Own Brain

Search is where many AI systems become embarrassingly human. They try one move. It fails. They try a nearby move. It fails. Then, with the serene confidence of a spreadsheet macro wearing a lab coat, they try the first move again. That is the real problem behind many “autonomous research” demonstrations. The issue is not always that the model cannot propose useful ideas. It is that the loop around the model is fixed: propose a change, run an experiment, evaluate the result, keep or discard. Once this loop gets stuck, the system often has no way to ask the more important question: is my search process itself badly designed? ...

Nudge, But Make It Machine: The Rise of Mecha-Nudges

A product listing used to have one obvious job: persuade the buyer. That buyer might be hurried, distracted, status-conscious, price-sensitive, or pretending not to care about shipping fees. Fine. Human messiness was the point. Good copywriting translated product attributes into human salience: scarcity, beauty, quality, emotion, trust. The machine’s role was secondary. Search engines ranked. Recommendation systems sorted. Humans decided. ...

RelayS2S: When AI Stops Waiting Its Turn

A voice assistant has one job before it has any other job: do not make the user wonder whether it heard them. That tiny silence after a user stops speaking is not merely awkward. It is a control signal. It tells the user whether the system is alive, attentive, confused, or quietly regretting its product roadmap. In text chat, a delay can be tolerated because the medium already feels asynchronous. In speech, delay feels personal. The room has a rhythm, and the machine has missed the beat. ...

Shared Memory, Shared Intelligence: When AI Agents Stop Thinking Alone

Memory is supposed to be the practical part of an AI system. A model answers badly, the system records what happened, and next time the agent avoids the same trap. Neat. Sensible. Almost managerial. Then the organization does what organizations always do: it adds more people. In AI terms, that means more agents, more models, more task routes, more specialized components, and more silent assumptions about who should learn from whom. A small model handles routine work. A larger model handles hard reasoning. A coding model writes scripts. A tool-using agent interacts with apps. Suddenly, “memory” is no longer a notebook. It is institutional infrastructure. ...

When Agents Go Off-Script: The Quiet Collapse of Prompted Identity

Roles are convenient. They let managers believe a system is legible before it becomes messy. One agent is the compliance reviewer. Another is the customer-support representative. A third is the skeptical analyst. Add a prompt, assign a tone, define a boundary, and the organization can pretend it has converted social behavior into configuration. ...