Seeing the Agents: Why Explaining AI Systems Is Harder Than Explaining AI Models

A dashboard says the customer-service agent resolved the ticket. The log says it retrieved the policy document, summarized the complaint, checked the refund rule, and sent a polite reply. The manager sees the outcome and asks the obvious question: why did the system approve the refund?

For a normal machine-learning model, this question has a familiar shape. Which features mattered? Which tokens were important? Which image region pushed the classifier toward one label? We have a whole shelf of explainability tools for that shelf-sized problem.

Agentic systems are not shelf-sized.

They plan, delegate, call tools, write memory, retrieve earlier context, coordinate sub-agents, and sometimes fail three steps before the failure becomes visible. A saliency map may tell you what a perception module looked at. It will not tell you why the planner delegated the wrong subtask, why the executor trusted stale memory, or why two agents politely coordinated themselves into nonsense. Very efficient nonsense, naturally.

The paper “Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability” makes a useful distinction for businesses now building AI workflows: explaining an AI model is not the same thing as explaining an AI system.¹ The first is mostly about model behavior. The second is about accountability across a chain of autonomous decisions.

That difference sounds abstract until something goes wrong. Then it becomes the entire audit.

The object being explained has changed

Traditional explainability grew up around a simple operational picture:

input → model → output

That picture is not wrong. It is just too small.

The paper distinguishes among three levels: large language models, AI agents, and agentic systems. A language model generates or reasons over text. An AI agent wraps that model inside a software architecture with perception, planning, memory, tool use, and action. An agentic system then coordinates multiple agents toward a larger goal, often with shared memory and orchestration.

The important word is not “agent.” It is “system.”

Layer	What it mainly does	What explainability usually asks	Why that question becomes insufficient
LLM	Produces text or reasoning from context	Why did this model produce this answer?	The model is only one component in the workflow
AI agent	Plans and acts through tools and memory	Why did this agent choose this action?	The action may depend on prior tool results, memory, or retries
Agentic system	Coordinates multiple agents across a workflow	Why did the system produce this chain of actions?	Causality is distributed across agents, time, tools, and state
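The last row of the table can be made concrete with a small trace analysis. This is a minimal sketch, assuming a hypothetical log format in which every step records the acting agent, the tool it called, and the shared-state keys it read and wrote. `Step`, `cross_agent_dependencies`, and all agent, tool, and key names are illustrative assumptions, not anything the paper defines. The function links each state read to the most recent write by a different agent. That is exactly the kind of cross-agent causality a per-model explanation cannot see.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One logged action in a hypothetical agentic-system trace."""
    index: int
    agent: str
    tool: str
    reads: set[str] = field(default_factory=set)   # shared-state keys consumed
    writes: set[str] = field(default_factory=set)  # shared-state keys produced


def cross_agent_dependencies(trace: list[Step]) -> list[tuple[int, int, str]]:
    """Return (writer_index, reader_index, key) for every state key that one
    agent wrote and a different agent later read.

    Only the most recent write before each read counts, so a reader is linked
    to the value it actually saw, not to one that had already been overwritten.
    """
    last_writer: dict[str, Step] = {}
    links: list[tuple[int, int, str]] = []
    for step in trace:
        # Process reads before writes so a step never depends on itself.
        for key in sorted(step.reads):
            writer = last_writer.get(key)
            if writer is not None and writer.agent != step.agent:
                links.append((writer.index, step.index, key))
        for key in step.writes:
            last_writer[key] = step
    return links


# Hypothetical refund workflow echoing the opening example.
trace = [
    Step(0, "retriever", "search_policies", writes={"policy_doc"}),
    Step(1, "summarizer", "llm", reads={"complaint"}, writes={"summary"}),
    Step(2, "planner", "llm", reads={"summary", "policy_doc"}, writes={"plan"}),
    Step(3, "executor", "refund_api", reads={"plan"}, writes={"refund"}),
]

print(cross_agent_dependencies(trace))
# → [(0, 2, 'policy_doc'), (1, 2, 'summary'), (2, 3, 'plan')]
```

Only the most recent write counts. If the executor acted on stale memory, the link points to the step that produced the value it actually read, not to an earlier value that had since been replaced.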

This is the paper’s first contribution: it reframes interpretability as system-level accountability. The target is no longer just a prediction. The target is a trajectory.

That shift matters because the business risk has also changed. A classifier might wrongly reject an insurance claim. An agentic claims system might retrieve the wrong policy clause, delegate review to the wrong sub-agent, update customer state incorrectly, trigger an automated notification, and then provide a confident explanation of the whole chain after the fact. The final answer is only the last symptom.
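The claims example can be sketched the same way. This is a minimal sketch, assuming a hypothetical trace in which each step lists the earlier steps it depended on. With those links, an auditor can walk backward from the final output to every upstream decision that might have caused it. The step names, agent names, and the `depends_on` field are illustrative assumptions. A real system would have to record these links at run time for such a walk to be possible.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step in a hypothetical claims-processing trace."""
    name: str
    agent: str
    depends_on: list[str] = field(default_factory=list)  # names of earlier steps


def upstream_steps(trace: dict[str, Step], final: str) -> list[str]:
    """Breadth-first walk from `final` back through `depends_on` links.

    Returns every ancestor step ordered by hop distance (nearest first).
    These are the candidate causes an audit must inspect, beyond the output.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(trace[final].depends_on)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already reached by a shorter path
        seen.add(name)
        order.append(name)
        queue.extend(trace[name].depends_on)
    return order


claims_trace = {s.name: s for s in [
    Step("retrieve_clause", "retriever"),
    Step("delegate_review", "orchestrator", ["retrieve_clause"]),
    Step("review_claim", "reviewer", ["delegate_review", "retrieve_clause"]),
    Step("update_customer_state", "executor", ["review_claim"]),
    Step("send_notification", "notifier", ["update_customer_state"]),
    Step("explain_decision", "narrator", ["send_notification"]),
]}

print(upstream_steps(claims_trace, "explain_decision"))
# → ['send_notification', 'update_customer_state', 'review_claim',
#    'delegate_review', 'retrieve_clause']
```

Run from the explanation step, the walk reaches all the way down to the clause retrieval. The audit has to start there, at the first step, rather than stop at the last symptom.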

¹ Judy Zhu, Dhari Gandhi, Himanshu Joshi, Ahmad Rezaie Mianroodi, Sedef Akinli Kocak, and Dhanesh Ramachandran, “Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability,” arXiv:2601.17168, 2026, https://arxiv.org/html/2601.17168.
