AI Agents

From Prototype to Profit: How IBM's CUGA Redefines Enterprise Agents

A recruiter does not wake up excited to reconcile dashboards. The job is already complicated enough: sourcing channels, requisition IDs, candidate funnels, SLA definitions, skill-impact reports, hiring-manager requests, and the occasional spreadsheet that has clearly decided to become a lifestyle. In IBM’s Business Process Outsourcing talent-acquisition workflow, the problem is not that recruiters lack software. It is that they sit between too many systems and must turn fragmented analytics into timely, defensible decisions. ...

The Esperanto of AI Agents: How the Agent Data Protocol Unifies a Fragmented Ecosystem

Every engineering team has met this problem: the useful data exists, but it lives in thirteen different shapes, three different tool conventions, two incompatible logs, and one heroic spreadsheet that nobody dares to open. AI agents have the same disease, only with more acronyms. The paper behind the Agent Data Protocol, or ADP, argues that large-scale supervised fine-tuning of AI agents has been held back less by a lack of data than by a lack of shared representation.[1] Agent datasets already exist for coding, software engineering, web browsing, API use, operating-system interaction, and general tool use. The difficulty is that each one tends to encode actions, observations, tool calls, web states, messages, and execution feedback in its own local dialect. Naturally, every dataset is special. How convenient for nobody. ...

Fast but Flawed: What Happens When AI Agents Try to Work Like Humans

Work, in the office sense, rarely begins with a grand theory. It begins with a folder, a spreadsheet, a PDF, a design file, a vague instruction, and someone quietly hoping the task is less annoying than it looks. That is precisely where AI agents are supposed to help. They click, type, read files, write code, search the web, produce documents, and increasingly present themselves as digital workers rather than mere chat boxes with better manners. The tempting story is simple: agents will do the same work humans do, only faster and cheaper. ...

Agents in a Sandbox: Securing the Next Layer of AI Autonomy

TL;DR for operators: Tools are where agent security stops being philosophical. Once an AI agent can read files, call APIs, inspect environment variables, launch commands, or connect to a database, the business question is no longer “is the model aligned?” It is “what exactly can this process touch when it is confused, manipulated, or supplied with a malicious tool?” ...

Deep Thinking, Dynamic Acting: How DeepAgent Redefines General Reasoning

Tools are where agent demos go to die. The pitch is usually elegant. Give the model a goal, attach a few APIs, let it reason, and watch the automation glide across systems like a tiny consultant with no calendar conflicts. Then the real world appears: too many tools, unclear documentation, stale context, partial failures, long interaction histories, and the occasional API response that seems to have been designed by someone settling a personal score. ...

Promptfolios: When Buffett Becomes a System Prompt

Investment firms love a house style. Conservative value. Quality growth. Distressed credit. Low-volatility income. The style is supposed to mean something more durable than a portfolio manager’s breakfast mood. The uncomfortable part is that many “styles” still live in a fog of analyst judgement, committee memory, spreadsheet folklore, and the occasional sacred quote from an investor whose annual letters have been read with the reverence normally reserved for scripture. Everyone claims discipline. Fewer can show exactly how that discipline becomes position weights. ...

When More Becomes Smarter: The Unreasonable Effectiveness of Scaling Agents

Desktops are where AI ambition goes to discover gravity. A chatbot can sound competent in one turn. A coding assistant can look brilliant inside a bounded file. But ask an agent to use a real computer for a long task — open the right app, edit the right file, preserve formatting, notice a pop-up, verify the final state, and not confidently click itself into a small administrative tragedy — and the problem changes. Intelligence is no longer a single answer. It is a chain of actions, each one able to quietly poison the next. ...

Backtrack to Breakthrough: Why Great AI Agents Revisit

Search is easy. Knowing when to go back is harder. That is the useful irritation inside GSM-Agent, a new benchmark for studying agentic reasoning under controlled conditions.[1] The paper takes grade-school maths problems from GSM8K, removes the premises from the prompt, hides those premises in a searchable document database, and asks an LLM agent to recover the facts before solving the problem. The arithmetic is not supposed to be impressive. That is the point. If a model fails here, we cannot calmly blame differential geometry, PhD-level law, or some mysteriously adversarial enterprise workflow. The agent simply did not find and use the facts. ...

Lost in the Long Game: What UltraHorizon Reveals About Agent Failure at Scale

Budget is the most comforting word in enterprise AI. Give the agent a bigger context window. Give it more tool calls. Give it more time. Give it a notebook, a browser, a Python interpreter, a reminder to “think step by step,” and perhaps a small motivational speech about being thorough. Surely the system will become more reliable. ...

Paths, Not Parrots: When RL Makes LLMs Plan—and When It Doesn’t

A workflow agent usually looks clever right up to the moment one service is down, one permission changes, or one customer case arrives with the wrong sort of mess attached. Then the question becomes painfully simple: did the model learn a plan, or did it learn the usual route? That distinction is the centre of “Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective”, an ICLR 2026 paper by Siwei Wang, Yifei Shen, Haoran Sun, Shi Feng, Shang-Hua Teng, Li Dong, Yaru Hao, and Wei Chen.[1] The paper is not another victory lap for reinforcement learning. It is more useful than that. It asks what, mechanically, changes when a language model is trained for planning with reinforcement learning rather than supervised fine-tuning. ...