TL;DR for operators
The paper’s practical message is simple enough to be dangerous: once agents start working with other agents, the hard problem stops being “Can this model reason?” and becomes “Can this network behave?”
Quanyan Zhu’s paper on the Internet of Agentic AI, or IoAI, frames the next stage of agentic systems as an open ecosystem of heterogeneous autonomous agents that discover collaborators, negotiate responsibilities, exchange context, invoke tools, and execute workflows across cloud, edge, device, organizational, and cyber-physical environments.[1] That sounds grand, which is usually where useful engineering goes to die. But the paper’s better contribution is more sober: it treats agentic AI as a distributed systems problem.
For business readers, this matters because most enterprise agent programs are still organized around isolated assistants, tool wrappers, or orchestration demos. The paper argues that this is too small a unit of analysis. If agents are expected to participate in multi-step workflows across departments, vendors, systems, devices, and compliance boundaries, they need more than clever prompts. They need discovery services, capability metadata, task contracts, identity, authentication, policy enforcement, audit trails, resource management, and failure recovery.
The misconception to kill early is that IoAI is just “multi-agent chat, but bigger.” It is not. A larger group chat of autonomous software is not an operating model. It is a future incident report with better branding.
The paper’s evidence base is mainly architectural synthesis and illustrative case analysis, not a new experimental benchmark. Its manufacturing case study shows decentralized agents forming task coalitions for maintenance, milling, assembly, and quality control. Its distributed operational-coordination case study shows agents sharing intent, constraints, bids, and resource status across space, air, and ground assets in simulation-oriented military scenarios. These examples should be read as mechanism demonstrations, not measured proof of deployment performance.
The operator takeaway: build the network controls before scaling the network. Enterprise agent value will come from governed coordination, not agent proliferation.
The enterprise problem is not agent scarcity. It is coordination debt.
The first wave of agentic AI made a familiar promise: replace passive model inference with autonomous software that can plan, use tools, remember context, call APIs, and complete long-horizon tasks. Useful, certainly. But also familiar. Enterprises have been automating workflows for decades. The agentic twist is that the workflow component now interprets objectives, negotiates context, and revises behavior while the process is running.
That shift creates a new failure mode. The limiting factor is no longer whether one assistant can execute one task. The limiting factor is whether many agents can coordinate without turning the organization into a distributed hallucination factory.
The paper frames this transition as the Internet of Agentic AI. In this vision, agents are not trapped inside one application, vendor platform, or administrative domain. They advertise capabilities, discover collaborators, form temporary coalitions, exchange intermediate artifacts, and adapt to changing conditions. The analogy to the Internet is not decorative. It is structural. The original Internet transformed isolated machines into a shared communication substrate. IoAI, as Zhu describes it, would transform isolated agents into a shared intelligence substrate.
The keyword is substrate. An enterprise does not get IoAI by purchasing more agents. It gets IoAI by building the conditions under which independent agents can interact safely and productively. That means the actual architecture lives in the connective tissue: naming, identity, protocols, authorization, monitoring, resource allocation, and governance.
This is why the paper’s mechanism-first structure matters. A normal summary would list sections: background, architecture, protocols, interoperability, scalability, security, case studies. Fine. Accurate. Also not especially useful. The deeper point is that each layer answers one operational question:
| Layer | Operational question | What breaks when ignored |
|---|---|---|
| Discovery | Which agent can do this job? | Manual routing, brittle integration, wrong collaborator selection |
| Identity and trust | Who is acting, and under what authority? | Impersonation, unauthorized delegation, weak accountability |
| Task semantics | What exactly is being requested and returned? | Ambiguous outputs, unverifiable completion, workflow drift |
| Governance | Is this action allowed in this context? | Unsafe autonomy, compliance failure, uncontrolled tool use |
| Resource orchestration | Where should computation and communication happen? | Latency, cost blowouts, coordination overhead |
| Monitoring and assurance | Is the collective still behaving coherently? | Cascading failure, invisible misalignment, poor incident reconstruction |
This is the paper’s central move: agent intelligence becomes a network property. The enterprise implication is mildly inconvenient. If your agents cannot identify each other, authenticate each other, understand task boundaries, preserve provenance, respect policy, and recover from failure, then the agent count is not a capability metric. It is an exposure metric.
IoAI starts where ordinary multi-agent systems stop
The paper distinguishes three levels of maturity.
The first is the single agent: an autonomous system that can perceive context, reason about a goal, plan actions, invoke tools, interact with environments, and adapt through feedback. This is where most product demos live. One model, wrapped in tools and memory, performing a task.
The second is the bounded multi-agent system. Here, agents specialize. One plans, another retrieves, another verifies, another executes. These systems can be powerful because they divide labor and introduce checks. But they usually operate inside a known environment with known participants and known workflow assumptions.
IoAI is the third level: open, heterogeneous, cross-boundary agent coordination. The participating agents may belong to different organizations, run on different infrastructure, obey different policies, expose different capabilities, and use different communication protocols. That is where ordinary orchestration begins to look underdressed.
The paper’s contribution is to define IoAI not as a single platform but as a coordination architecture spanning multiple administrative and computational domains. Agents may live in cloud data centers, regional cloud platforms, edge servers, on-premise enterprise systems, mobile devices, IoT environments, robotic infrastructure, or federated institutional systems. A discovery and directory layer provides capability indexing, semantic descriptions, availability information, and trust-policy metadata.
That deployment spread is not an implementation footnote. It changes the meaning of agent capability.
A cloud-based planning agent may have access to large foundation models and enterprise data, but suffer latency and data-governance constraints. An edge inspection agent may provide low-latency local perception, but have limited compute. An on-device agent may preserve privacy and operate offline, but face battery, memory, and physical-compromise risks. A federated agent may support cross-institutional workflows while preserving local autonomy, but must survive policy heterogeneity and difficult auditability. An ephemeral serverless agent may scale cheaply for short-lived subtasks, but introduces cold-start, state, and provenance problems.
In other words: “Can the agent do the task?” is the wrong first question. Better questions are:
- Can it do the task here?
- Can it do the task under this policy?
- Can it prove it is allowed to do the task?
- Can it return evidence that another agent can verify?
- Can it fail in a way the system can understand?
This is less glamorous than a benchmark leaderboard. It is also where the enterprise value is hiding.
The workflow lifecycle is the paper’s real operating model
The paper describes a typical IoAI workflow as four phases: discovery and negotiation, task allocation and delegation, execution and monitoring, and adaptation and composition. That lifecycle is the cleanest way to translate the paper into operational practice.
First, an agent identifies that it cannot complete a goal alone. It searches for collaborators through registries, directories, peer broadcasts, semantic repositories, or other discovery mechanisms. In a mature system, candidate agents advertise capabilities, resources, endpoints, constraints, trust credentials, and operational policies.
Second, agents negotiate responsibilities. This may involve bids, commitments, expected outputs, service guarantees, deadlines, permissions, or dependencies. The paper connects this to classical multi-agent mechanisms such as Contract-Net protocols and auctions, while noting that LLM-enabled agents can also use richer natural-language negotiation.
That last point should make operators nervous in the correct way. Natural language negotiation is flexible. It is also ambiguous, context-sensitive, and occasionally allergic to accountability. A production-grade version needs structured task objects, machine-readable constraints, artifact schemas, cancellation states, status updates, and evidence requirements. Otherwise, delegation becomes a polite way of losing control.
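A minimal sketch of what a structured task object could look like, assuming a simple status machine with explicit cancellation and an evidence check on completion. The names `TaskContract`, `TaskStatus`, and `ALLOWED` are hypothetical illustrations, not anything the paper or an existing protocol defines.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Hypothetical transition table: which status may follow which.
ALLOWED = {
    TaskStatus.PROPOSED: {TaskStatus.ACCEPTED, TaskStatus.CANCELLED},
    TaskStatus.ACCEPTED: {TaskStatus.RUNNING, TaskStatus.CANCELLED},
    TaskStatus.RUNNING: {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED},
    TaskStatus.COMPLETED: set(),
    TaskStatus.FAILED: set(),
    TaskStatus.CANCELLED: set(),
}


@dataclass
class TaskContract:
    task_id: str
    requester: str
    assignee: str
    goal: str
    constraints: dict          # machine-readable limits, e.g. {"deadline_s": 60}
    required_evidence: list    # artifact names the assignee must return
    status: TaskStatus = TaskStatus.PROPOSED
    evidence: dict = field(default_factory=dict)

    def transition(self, new_status: TaskStatus) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        # Completion is only valid once every required artifact is attached.
        if new_status is TaskStatus.COMPLETED:
            missing = [n for n in self.required_evidence if n not in self.evidence]
            if missing:
                raise ValueError(f"missing evidence: {missing}")
        self.status = new_status
```

The design choice worth noticing is that "done" is not something the assignee can simply assert: the contract refuses the transition until the agreed artifacts are present.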
Third, agents execute and monitor. Execution may involve tools, databases, simulations, physical devices, APIs, or additional agents. Monitoring becomes essential because failures are routine in distributed systems. Agents become unavailable. Links fail. Tools return corrupted outputs. Resources saturate. External conditions change. The paper points to heartbeats, distributed logging, checkpointing, anomaly detection, and status reporting as the mechanisms that let a workflow distinguish normal variation from actual failure.
Fourth, the workflow adapts. Agents may recruit new participants, revise deadlines, change task assignments, or restructure the workflow itself. This is the feature that makes agentic workflows different from traditional automation pipelines. It is also the feature that makes governance non-negotiable.
A static workflow can be audited once and operated many times. An adaptive workflow keeps changing itself while running. Wonderful, until it adapts into something your compliance team has never approved.
The paper’s mechanism is therefore not “agents collaborate.” It is this:
goal exceeds one agent
↓
discover suitable collaborators
↓
negotiate responsibilities and constraints
↓
delegate tasks with permissions and evidence requirements
↓
execute with monitoring and status signals
↓
adapt the coalition under policy and resource limits
↓
produce system-level behavior that no single agent owned
That final line is the prize and the risk. No single agent owns the emergent behavior. So the system must.
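The execute-and-monitor step in that lifecycle depends on telling normal variation from actual failure. One way to picture it is a heartbeat monitor with a "suspect" band between healthy and failed. This is a sketch under stated assumptions: the class name, the five-second interval, and both thresholds are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    interval_s: float = 5.0    # assumed expected gap between heartbeats
    suspect_after: int = 2     # missed intervals before an agent is "suspect"
    failed_after: int = 4      # missed intervals before an agent is "failed"
    last_seen: dict = field(default_factory=dict)

    def beat(self, agent_id: str, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_seen[agent_id] = now

    def status(self, agent_id: str, now: float) -> str:
        if agent_id not in self.last_seen:
            return "unknown"
        missed = (now - self.last_seen[agent_id]) / self.interval_s
        if missed >= self.failed_after:
            return "failed"
        if missed >= self.suspect_after:
            return "suspect"    # late, but not yet grounds for reassignment
        return "healthy"
```

The middle band is the point: a workflow that reassigns work on the first late heartbeat will thrash, and one that waits too long will stall.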
Discovery is not search. It is capability routing under constraints.
One of the paper’s more useful clarifications is that agent discovery is not merely address resolution. In the Internet, DNS maps names to locations. In IoAI, a discovery layer must help answer a richer question: which agent is capable, available, authorized, trustworthy, and operationally appropriate for this task?
That is a different kind of registry.
A simple service catalog might say: here is an agent endpoint. A serious IoAI directory must say something closer to:
- what the agent can do;
- what tools and data it can access;
- which protocols it supports;
- who owns or operates it;
- what credentials it carries;
- what policies govern its behavior;
- what evidence it returns;
- what its availability and resource state look like;
- how it can be revoked or migrated.
The paper discusses emerging concepts such as Agent Cards in Agent-to-Agent frameworks and Agent Naming Services that map agent identifiers not only to endpoints but also to service metadata, cryptographic credentials, capability descriptions, and governance information. This matters because open agent ecosystems will contain short-lived, replicated, migrated, and vendor-specific agents. Without a robust naming and discovery layer, collaboration becomes manual integration with extra steps. Naturally, the industry may call that “agentic transformation” for two quarters before rediscovering service registries.
For enterprise adoption, discovery should be treated as routing logic, not a directory feature. The routing question is: given a task, risk level, policy context, data sensitivity, latency constraint, and evidence requirement, which agent or coalition should receive the work?
That implies an operational design pattern:
| Discovery field | Business meaning |
|---|---|
| Capability description | Prevents routing sensitive work to charming incompetence |
| Trust credential | Establishes whether the agent is allowed to participate |
| Supported protocol | Determines whether collaboration is technically possible |
| Data-access boundary | Prevents accidental policy violations |
| Availability and latency | Determines whether the agent is usable now |
| Provenance and owner | Supports audit, liability, and incident response |
| Revocation status | Prevents stale or compromised agents from remaining active |
This is also where semantic interoperability becomes a practical headache. Pure natural language descriptions are flexible but ambiguous. Fully formal ontologies are precise but brittle. The paper’s implied compromise is sensible: structured schemas for safety-critical fields, and flexible descriptions where ambiguity is tolerable.
That is not a philosophical midpoint. It is an engineering budget decision.
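The routing question in this section can be sketched as a filter over directory records. The field names below loosely mirror the discovery fields in the table, but the record layout, the integer clearance scale, and the lowest-latency tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    capabilities: frozenset
    protocols: frozenset
    data_clearance: int      # assumed scale: higher means more sensitive data allowed
    latency_ms: float
    owner: str
    credential_valid: bool
    revoked: bool


@dataclass(frozen=True)
class TaskRequest:
    capability: str
    protocol: str
    data_sensitivity: int
    max_latency_ms: float


def route(task: TaskRequest, registry: list) -> list:
    """Return eligible agents, lowest latency first."""
    eligible = [
        a for a in registry
        if not a.revoked
        and a.credential_valid
        and task.capability in a.capabilities
        and task.protocol in a.protocols
        and a.data_clearance >= task.data_sensitivity
        and a.latency_ms <= task.max_latency_ms
    ]
    return sorted(eligible, key=lambda a: a.latency_ms)
```

Note that the fastest agent does not win if it is revoked or under-cleared. Capability match is only one of several gates.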
Identity is the first control plane
The paper gives identity more weight than many agent discussions do, and correctly so. In a network of autonomous agents, “who is acting?” becomes the first control-plane question.
Existing identity systems such as OAuth, OpenID Connect, TLS, X.509 certificates, and enterprise access management can support some deployments. But the paper argues that autonomous agents differ from conventional users and services. They can be created dynamically, operate temporarily, migrate across platforms, delegate responsibilities, and interact across organizational boundaries.
That breaks assumptions baked into human-centric identity systems. A person has a relatively stable identity. A software service usually has a stable operational boundary. An agent may be ephemeral, delegated, replicated, or embedded in a chain of other agents. When something goes wrong, an enterprise needs to know not only which user initiated a request, but which agent acted, under which credential, on whose behalf, using which tool, based on which context, with which downstream delegation.
That is why the paper discusses decentralized identifiers, verifiable credentials, public-key infrastructure, mutual TLS, certificate-based authentication, and hardware-assisted attestation. The point is not that every enterprise must immediately deploy every one of these mechanisms. The point is that agent identity must become richer than an API key.
A serious identity layer should support at least five claims:
| Claim | Example question |
|---|---|
| Entity identity | Which agent is this? |
| Authority | What is it allowed to do? |
| Provenance | Who created, owns, or certified it? |
| Execution integrity | Is it running approved code in an approved environment? |
| Delegation context | On whose behalf is it acting, and with what limits? |
This is where the paper’s security analysis becomes business-relevant. A compromised or impersonated agent does not merely leak data. It can manipulate workflows, influence other agents, redirect tasks, consume resources, or trigger cascading failures. In a multi-agent system, identity failure is not a login problem. It is a coordination attack.
The paper’s threat taxonomy makes this explicit. It includes identity threats such as Sybil attacks, impersonation, credential theft, and rogue agents; communication threats such as man-in-the-middle attacks, eavesdropping, tampering, and replay; workflow threats such as prompt injection, tool compromise, workflow poisoning, and cascading failures; economic threats such as incentive manipulation, collusion, resource exploitation, and reputation gaming; and availability threats such as denial-of-service, supply-chain attacks, data poisoning, and infrastructure compromise.
The important interpretation is that these are not isolated security categories. They compose. A fake identity can enter a registry, win a task, poison a workflow, trigger tool calls, generate misleading evidence, and damage the reputation layer. The network turns local compromise into systemic ambiguity.
This is why agent identity should be designed before agent scale. Retrofitting accountability after deployment is a bold strategy, in the same way that installing brakes after entering traffic is bold.
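The five claims in this section can be carried as a small bundle and checked along a delegation chain. The sketch below assumes one simple rule, that authority may only narrow at each hop. The `AgentClaims` type and `chain_is_valid` function are hypothetical, and real deployments would verify signatures and attestation evidence rather than trust boolean fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentClaims:
    agent_id: str                  # entity identity
    scopes: frozenset              # authority
    issuer: str                    # provenance
    attested: bool                 # execution integrity (assumed pre-verified)
    on_behalf_of: Optional[str]    # delegation context; None for the root


def chain_is_valid(chain: list, trusted_issuers: set) -> bool:
    """chain[0] is the root; each later entry acts on behalf of the previous one."""
    if not chain or chain[0].on_behalf_of is not None:
        return False
    for i, claims in enumerate(chain):
        if claims.issuer not in trusted_issuers or not claims.attested:
            return False
        if i > 0:
            parent = chain[i - 1]
            if claims.on_behalf_of != parent.agent_id:
                return False
            if not claims.scopes <= parent.scopes:   # authority may only narrow
                return False
    return True
```

A delegate that shows up with a scope its parent never held fails the check, which is the property an API key alone cannot express.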
Protocols move messages. Interoperability moves responsibility.
The paper makes a useful distinction between communication and interoperability. Communication means agents can exchange messages. Interoperability means agents can collaborate responsibly.
A transport layer can move data using HTTP, REST, gRPC, WebRTC, publish-subscribe systems, MQTT, Kafka, NATS, IPFS, or peer-to-peer overlays. Those mechanisms matter, especially for latency-sensitive, bandwidth-constrained, or resilient deployments. But they are not enough.
If one agent asks another to “analyze customer risk,” the message format is the easy part. The hard part is the task semantics:
- What does “risk” mean?
- Which data may be used?
- Which regulations apply?
- What evidence must be returned?
- Can the task be delegated further?
- What confidence level is required?
- What happens if the agent cannot complete the work?
- Who is accountable for the output?
The paper’s layered interoperability table is one of its more practical contributions. It separates connectivity, identity and trust, capability discovery, task semantics, governance, and economic incentives. Each layer has a distinct failure mode. Connectivity failure isolates platforms. Identity failure enables impersonation and unauthorized delegation. Capability failure produces poor routing. Task-semantic failure produces ambiguous outputs. Governance failure produces unsafe tool use. Incentive failure produces unreliable cooperation.
That layered view prevents a common mistake: assuming that one protocol will solve agent interoperability. It will not. Model Context Protocol, Agent-to-Agent Protocol, Agent Network Protocol, Agent Communication Protocol, and meta-protocol approaches such as Agora are all early attempts to standardize pieces of the stack. But no single protocol can simultaneously solve transport, identity, policy, task semantics, economic incentives, liability, and assurance.
For business leaders, the practical lesson is to stop asking whether “our agents support the standard” and start asking which layer the standard actually standardizes.
| Standardization target | Useful for | Not sufficient for |
|---|---|---|
| Tool/context exchange | Connecting models to tools and resources | Cross-organization trust |
| Agent task objects | Delegation and workflow status | Legal accountability |
| Capability descriptions | Matching agents to tasks | Verifying actual competence |
| Identity credentials | Authentication and authorization | Semantic task correctness |
| Meta-protocol negotiation | Versioning and heterogeneous ecosystems | Safety governance by itself |
The paper’s stronger claim is that interoperability must be policy-aware by construction. An agent should not accept a task merely because it understands the task and can technically execute it. It must determine whether the request is authorized, safe, consistent with policy, and aligned with the broader workflow.
That one sentence is the difference between an enterprise system and a demo swarm.
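A minimal sketch of policy-aware acceptance, assuming the four checks named above (authorized, safe, consistent with policy, aligned with the workflow) can each be reduced to a simple lookup. The `Request` and `Policy` shapes and the 0-10 risk score are invented for illustration; a real policy engine would be considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    sender: str
    action: str
    workflow_id: str
    risk: int    # hypothetical 0-10 risk score attached to the request


@dataclass
class Policy:
    authorized: dict             # sender -> set of actions it may request
    max_risk: int
    forbidden: frozenset         # actions no one may request
    active_workflows: frozenset


def evaluate(request: Request, policy: Policy, can_execute: bool):
    """Return (accepted, reasons). Capability alone is never sufficient."""
    reasons = []
    if not can_execute:
        reasons.append("not capable")
    if request.action not in policy.authorized.get(request.sender, set()):
        reasons.append("not authorized")
    if request.risk > policy.max_risk:
        reasons.append("unsafe")
    if request.action in policy.forbidden:
        reasons.append("violates policy")
    if request.workflow_id not in policy.active_workflows:
        reasons.append("outside workflow")
    return (not reasons, reasons)
```

Returning the full list of reasons, rather than stopping at the first failure, gives the audit trail something to record when a request is refused.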
Resource management decides whether collaboration is worth the overhead
The paper’s scalability section is especially useful because it refuses the lazy assumption that more agents automatically mean more capability.
Agentic workflows consume compute, memory, bandwidth, energy, tool capacity, and orchestration attention. They involve model calls, retrieval, tool execution, API calls, communication rounds, state synchronization, and monitoring. As agent count increases, task decomposition may improve, but coordination overhead also rises.
That trade-off is the quiet killer of many agent architectures. A workflow can become “multi-agent” in the same way a meeting can become “cross-functional”: more participants, more updates, less progress.
The paper points to heterogeneous computing as a key response. Transformer inference may belong on accelerators. Retrieval may be memory- or storage-bound. Workflow coordination may rely more on CPUs and networking. Edge sensing may need local processing because latency is intolerable. Cloud agents may handle long-horizon planning. Serverless agents may absorb bursty subtasks. Federated agents may preserve local policy control across organizations.
The core mechanism is workload-aware placement.
The paper also cites recent platform evidence suggesting that parallel agent execution can improve throughput: a four-agent workload on NVIDIA DGX Spark reportedly required about 2.6 times the execution time of a single agent, rather than the fourfold time expected under purely sequential execution. This should be read carefully. It supports the plausibility of parallel execution benefits under suitable infrastructure, not a general law that four agents deliver near-linear enterprise productivity. Alas.
The boundary is important. Parallelism helps when subtasks have limited interdependencies. It hurts when agents need frequent synchronization, shared context, or sequential decision chains. The paper therefore argues for resource-aware orchestration: decide when to centralize reasoning, when to distribute execution, when to summarize or cache context, and when to limit communication so coordination overhead does not erase the benefit of collaboration.
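The reported 2.6x figure can be turned into a speedup and an efficiency number with two divisions. The helper below is a back-of-the-envelope sketch, assuming the baseline is four single-agent runs executed one after another; `parallel_gain` is a made-up name, and the result says nothing about workloads other than the one reported.

```python
def parallel_gain(n_agents: int, time_ratio: float):
    """time_ratio = (time for n agents in parallel) / (time for one agent)."""
    speedup = n_agents / time_ratio      # versus running the agents sequentially
    efficiency = speedup / n_agents      # fraction of ideal linear scaling
    return speedup, efficiency


speedup, efficiency = parallel_gain(4, 2.6)
print(f"speedup over sequential: {speedup:.2f}x")   # 1.54x
print(f"parallel efficiency: {efficiency:.0%}")     # 38%
```

Read that way, the four agents finish roughly one and a half times faster than a sequential run while achieving a little over a third of ideal linear scaling, which is consistent with the caution above.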
A practical operator test follows:
| Workflow property | Likely architecture |
|---|---|
| High latency sensitivity | Edge or on-device agents with local control |
| Heavy long-horizon reasoning | Cloud planning agents |
| Sensitive institutional data | Federated agents with local policy enforcement |
| Bursty task demand | Ephemeral or serverless agents |
| Many independent subtasks | Parallel multi-agent execution |
| Many dependent subtasks | Centralized or hybrid orchestration |
| High audit and compliance burden | Brokered or hybrid architecture with strong logging |
| Contested or unreliable connectivity | Decentralized or edge-resilient coordination |
The business interpretation is straightforward: agent architecture is cost architecture. Every additional agent adds communication, monitoring, identity, orchestration, and failure-management cost. The ROI case for multi-agent systems depends on whether the gains from specialization and parallelism exceed that coordination burden.
That is not pessimism. It is arithmetic wearing a hard hat.
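The operator test in the table above can be written down as a lookup, which makes it easier to see when a workflow pulls in several directions at once. The property keys and the ordering of the rules are assumptions for illustration; the architecture labels are taken from the table.

```python
# Rules transcribed from the operator-test table; the ordering is an assumption.
PLACEMENT_RULES = [
    ("contested_connectivity", "decentralized or edge-resilient coordination"),
    ("latency_sensitive", "edge or on-device agents with local control"),
    ("sensitive_institutional_data", "federated agents with local policy enforcement"),
    ("high_audit_burden", "brokered or hybrid architecture with strong logging"),
    ("many_dependent_subtasks", "centralized or hybrid orchestration"),
    ("many_independent_subtasks", "parallel multi-agent execution"),
    ("bursty_demand", "ephemeral or serverless agents"),
    ("long_horizon_reasoning", "cloud planning agents"),
]


def suggest_architectures(properties: set) -> list:
    """Return every matching architecture, in rule order."""
    return [arch for prop, arch in PLACEMENT_RULES if prop in properties]


print(suggest_architectures({"latency_sensitive", "bursty_demand"}))
```

A workflow that matches several rules gets several answers back, and that is deliberate: the table describes tendencies to be traded off, not a decision procedure.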
The manufacturing case study shows controlled emergence, not magic autonomy
The paper’s manufacturing case study is the most business-legible part of the source. It models a factory as a network of LLM planner agents, vision inspection agents, robot execution agents, human operator agents, and autonomous mobile robot or automated guided vehicle agents. These agents form temporary coalitions around production tasks: maintenance, milling, assembly, and quality control.
The central mechanism is an exit-and-join process. Agents evaluate whether to remain in a coalition, leave it, or join another. The local decision rule weighs expected time, energy, task priority, team compatibility, capability match, and network reliability. A robot may leave a less suitable coalition and request admission to a milling team, but only if capacity, skill, resources, safety policy, and communication feasibility constraints are satisfied.
That acceptance constraint is the key. Without it, the case study would be another autonomy fairy tale. With it, the example becomes a governance mechanism.
The paper walks through three manufacturing scenarios. In a production surge, agents migrate toward an overloaded milling coalition. Transport and inspection agents redirect support, while underutilized agents leave less critical work. The result is adaptive workload balancing without a centralized scheduler recomputing the entire factory plan.
In equipment degradation, a machine signals impending maintenance. Maintenance agents coordinate with human operators and planner agents, while assembly and logistics agents re-evaluate task sequencing. The point is not simply predictive maintenance. It is distributed coalition repair before local degradation becomes systemic downtime.
In quality control, a defect-rate increase raises the utility of participating in a quality coalition. Vision agents, transport agents, and human operators shift toward inspection and rework management. Quality resources are allocated where they are needed without a factory-wide rescheduling cycle.
The likely purpose of this case study is exploratory illustration. It does not prove that an IoAI factory will improve throughput by a measured percentage. It shows how the paper’s mechanisms compose: specialized agents, local observations, task coalitions, exit-and-join dynamics, acceptance constraints, and emergent outcomes.
For an operator, the case study translates into a practical architecture:
local signal
→ agent evaluates task utility
→ agent requests coalition change
→ constraint filter checks feasibility and policy
→ coalition accepts or rejects
→ workflow reallocates resources
→ monitoring observes system-level effect
That structure is useful because it separates autonomy from admission control. Agents can propose adaptation. The system decides whether adaptation is allowed.
The distinction matters. Enterprise agent programs should not ask, “Can the agent reconfigure the workflow?” They should ask, “Under what constraints may an agent propose reconfiguration, and what evidence must be checked before the change takes effect?”
That is how controlled emergence becomes an operating principle rather than a slogan someone puts on a conference slide next to a glowing network diagram.
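The separation between proposing a move and admitting it can be sketched in a few lines. The factor names follow the decision rule described in this section, but the weights, the `Coalition` type, and the boolean constraint flags are hypothetical; the source names the factors without giving values.

```python
from dataclasses import dataclass

# Hypothetical weights: costs are negative, benefits positive.
WEIGHTS = {
    "time": -1.0, "energy": -0.5, "priority": 2.0,
    "compatibility": 1.0, "capability": 1.5, "reliability": 1.0,
}


def utility(factors: dict) -> float:
    """Weighted score an agent assigns to membership in a coalition."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


@dataclass
class Coalition:
    name: str
    capacity: int
    members: list
    required_skill: str


def admit(coalition, skills, resources_ok, safety_ok, link_ok) -> bool:
    """Admission control: the coalition side decides, not the requesting agent."""
    return (
        len(coalition.members) < coalition.capacity
        and coalition.required_skill in skills
        and resources_ok and safety_ok and link_ok
    )


def exit_and_join(agent_id, skills, current, target, u_current, u_target,
                  resources_ok=True, safety_ok=True, link_ok=True) -> bool:
    """Move the agent only if it prefers the target AND the target admits it."""
    if u_target <= u_current:
        return False
    if not admit(target, skills, resources_ok, safety_ok, link_ok):
        return False
    current.members.remove(agent_id)
    target.members.append(agent_id)
    return True
```

An agent with a strong preference for the milling team still stays put if the team is full or the safety check fails. The preference is the proposal; `admit` is the governance.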
The defense case study is about simulation and stress, not deployment enthusiasm
The paper’s second case study applies IoAI to distributed operational coordination in a Mosaic Warfare-style setting. It describes space, air, and ground assets modeled as autonomous agents: satellites, helicopters, unmanned aircraft, ground vehicles, artillery nodes, command nodes, and headquarters elements. These agents share intent, predict constraints, negotiate task assignments, and adapt execution through decentralized coordination.
This is a sensitive domain, and the paper is careful about the intended scope. It states that the goal is not deployment of autonomous operational systems in real-world environments, but development and evaluation of distributed agentic architectures in high-fidelity simulation environments representative of Department of Defense scenarios.
That boundary is not cosmetic. It changes how the case should be read.
The case study’s likely purpose is to explore resilience and coordination under contested conditions. It illustrates a recurring loop: sense, predict, broadcast, negotiate, act. Agents evaluate local state such as fuel, bandwidth, sensor availability, assignments, and environmental constraints. They broadcast intent signals describing objectives, capabilities, constraints, priorities, and expected utility. Other agents use those signals to anticipate changes and reorganize mission execution without requiring complete global state.
In one scenario, a ground node requests intelligence, surveillance, and reconnaissance coverage for a sector. Nearby airborne, relay, and space-based agents evaluate their ability to support the mission based on position, fuel, bandwidth, revisit windows, and cost. A coalition emerges through local exchanges of bids and resource information.
In another scenario, a low-fuel handoff, an agent broadcasts an impending constraint so that other agents can reorganize before failure occurs. The purpose is to show resilience under disruption: the network adapts before a local constraint becomes mission collapse.
For enterprise readers outside defense, the transferable mechanism is not the mission context. It is intent-centric communication under incomplete information.
Most enterprise workflows still communicate status after the fact: done, failed, blocked, waiting. IoAI-style systems need richer intent signals: what an agent plans to do, why, with what constraints, at what expected utility, and under what policy boundary. That lets other agents adapt before hard failure occurs.
A supply-chain analogue is obvious. A logistics agent nearing capacity should not merely fail a shipment request. It should broadcast constraint forecasts, alternate route options, confidence, and cost. Procurement, customer-service, and warehouse agents can then renegotiate responsibilities before the operation breaks. Revolutionary? No. Useful? Annoyingly, yes.
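The difference between after-the-fact status and an intent signal can be shown with the logistics example. This is a sketch only: `IntentSignal`, its fields, and the 10% threshold are assumptions chosen to match the prose, not a message format the paper specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentSignal:
    agent_id: str
    objective: str
    remaining_capacity: float    # fraction of capacity free now, 0.0-1.0
    forecast_capacity: float     # expected free capacity at the planning horizon
    alternatives: tuple          # e.g. alternate routes or substitute agents
    confidence: float
    cost: float


def triage(signal: IntentSignal, threshold: float = 0.1) -> str:
    if signal.remaining_capacity < threshold:
        return "blocked"         # the after-the-fact status most workflows report
    if signal.forecast_capacity < threshold:
        return "renegotiate"     # early warning: peers can still reassign work
    return "ok"


signal = IntentSignal("logistics-7", "ship order", 0.35, 0.05,
                      ("route-B", "carrier-2"), 0.8, 120.0)
print(triage(signal))            # renegotiate
```

The "renegotiate" state only exists because the agent publishes a forecast alongside its current state, which is the practical content of intent-centric communication.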
The paper’s figures and tables are framework evidence, not benchmark proof
Because this paper is a systems framework, its figures and tables function differently from experimental plots in a machine-learning paper. They should not be read as quantitative results. They are conceptual scaffolding.
| Source object | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Figure 1: single-agent architecture | Background mechanism | Agentic AI involves memory, tools, environment interaction, and feedback loops | That any specific agent is reliable in production |
| Figure 2: IoAI vision | Conceptual extension | Isolated agents can be reframed as networked participants | That global agent ecosystems already exist |
| Figure 3: distributed deployment | Architecture sketch | Agents span cloud, edge, device, enterprise, and regional platforms with discovery services | That a mature universal discovery layer is available |
| Figure 4: collaborative drug discovery | Illustrative workflow | Cross-organizational discovery, coalition formation, execution, validation, and feedback | That drug discovery performance improves quantitatively |
| Table 1: IoAI benefits | Taxonomy | Core value categories: collective intelligence, mission execution, scalability, adaptability, resilience, controlled emergence | Magnitude of those benefits |
| Table 2: deployment models | Design trade-off map | Cloud, edge, device, federated, elastic, and economic agents have distinct constraints | Which architecture is optimal for a given enterprise |
| Table 3: interoperability layers | Mechanism framework | Interoperability requires connectivity, identity, capability, task semantics, governance, and incentives | That current standards cover all layers |
| Table 4: threat taxonomy | Risk framework | Agent ecosystems expand security risk across identity, communication, workflow, incentives, availability, and supply chain | Probability or severity of each threat in a given deployment |
| Figure 5: manufacturing case | Exploratory application | Coalition formation and acceptance constraints can translate IoAI into industrial coordination | Empirical factory performance gains |
| Figure 6: distributed operational coordination | Exploratory stress scenario | Intent sharing and local adaptation support resilience under contested conditions | Real-world autonomous deployment readiness |
This distinction is important because the paper’s value is not in claiming that IoAI currently outperforms centralized systems by a measured margin. Its value is in defining the architecture that would make such comparisons meaningful.
A benchmark can tell you whether a model completed a task. It cannot tell you whether an agent ecosystem has trustworthy identity, semantic interoperability, safe delegation, auditable provenance, efficient resource placement, incentive-compatible participation, and bounded emergence. Those are system properties. They need system tests.
The paper itself points toward future evaluation metrics: workflow completion time, coordination latency, communication efficiency, resource utilization, energy consumption, scalability under workload variation, resilience to node failure, robustness against compromised agents, and maintenance of alignment with human intent. That is the measurement agenda. Not another single-number “agent score.” We have suffered enough of those.
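To make that agenda concrete, here is a minimal sketch of how three of those system-level metrics could be derived from a workflow trace. The event schema (`TraceEvent`, its `kind` values, the agent and task names) is an assumption invented for illustration; the paper names the metrics but does not prescribe a log format or formulas.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TraceEvent:
    """One logged step in a multi-agent workflow (hypothetical schema)."""
    workflow_id: str
    agent_id: str
    kind: str       # "request", "accept", "result", or "failure"
    task_id: str
    t: float        # seconds since the workflow started


def completion_time(events):
    """Wall-clock span from the first to the last logged event."""
    if not events:
        return 0.0
    times = [e.t for e in events]
    return max(times) - min(times)


def coordination_latency(events):
    """Mean delay between a task being requested and being accepted."""
    requested = {e.task_id: e.t for e in events if e.kind == "request"}
    delays = [
        e.t - requested[e.task_id]
        for e in events
        if e.kind == "accept" and e.task_id in requested
    ]
    return mean(delays) if delays else None


def failure_rate(events):
    """Share of finished tasks (result or failure) that ended in failure."""
    finished = [e for e in events if e.kind in ("result", "failure")]
    if not finished:
        return None
    return sum(e.kind == "failure" for e in finished) / len(finished)


# Illustrative trace; all values are made up for the example.
trace = [
    TraceEvent("wf-1", "planner", "request", "t1", 0.0),
    TraceEvent("wf-1", "miller", "accept", "t1", 0.5),
    TraceEvent("wf-1", "planner", "request", "t2", 1.0),
    TraceEvent("wf-1", "inspector", "accept", "t2", 2.5),
    TraceEvent("wf-1", "miller", "result", "t1", 4.0),
    TraceEvent("wf-1", "inspector", "failure", "t2", 6.0),
]

print(completion_time(trace), coordination_latency(trace), failure_rate(trace))
```

The point of the sketch is the unit of measurement: every function takes the whole trace, not a single agent's output. Resilience, robustness, and alignment metrics would need richer logs than this one.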
What Cognaptus would infer for enterprise architecture
The paper directly shows a conceptual architecture and mechanism taxonomy. Cognaptus can infer a practical enterprise roadmap, but the inference should stay disciplined.
The direct claim: future agent capability will depend not only on individual model intelligence but also on architectures that let agents discover one another, exchange information, negotiate responsibilities, and execute workflows collectively.
The business inference: enterprises should treat agent deployment as a network design problem. That means the minimum viable architecture for serious multi-agent operations includes a registry, an identity layer, a task-contract layer, a policy layer, a communication layer, observability, resource orchestration, and human escalation.
A useful enterprise stack would look like this:
| Enterprise layer | Practical implementation question |
|---|---|
| Agent registry | Where are approved agents listed, and what metadata do they publish? |
| Capability schema | How are skills, tools, limits, cost, data access, and evidence obligations described? |
| Identity and credentials | How does each agent prove identity, authority, provenance, and delegation context? |
| Task contract | What structured object defines request, output, deadline, permission, evidence, and cancellation? |
| Policy enforcement | Which actions require approval, denial, sandboxing, or escalation? |
| Communication substrate | Which interactions are request-response, event-driven, streaming, or peer-to-peer? |
| Monitoring | How are status, health, tool calls, intermediate artifacts, and failures logged? |
| Resource scheduler | How are model calls, retrieval, tools, bandwidth, and compute placed across cloud, edge, and device? |
| Trust and revocation | How are compromised, stale, or underperforming agents removed from circulation? |
| Incident reconstruction | Can the enterprise replay who delegated what to whom, with which evidence and policy state? |
This is not bureaucracy for its own sake. It is how agentic workflows become governable.
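As a rough illustration of the registry, capability schema, and task contract rows above, the sketch below matches a structured task against published agent metadata. Every name here (`Capability`, `TaskContract`, `AgentRegistry`, the field names, the sample agents) is hypothetical; the paper does not define a schema, and a real deployment would sit on an actual discovery service and credential system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Capability:
    """Metadata an approved agent publishes to the registry (hypothetical schema)."""
    agent_id: str
    skills: frozenset        # task types the agent claims to handle
    data_access: frozenset   # data domains the agent is cleared to read
    cost_per_task: float     # illustrative flat cost, in arbitrary units


@dataclass(frozen=True)
class TaskContract:
    """Structured task: request, output, deadline, permission, evidence, cancellation."""
    task_id: str
    request: str
    required_skill: str
    expected_output: str
    deadline: datetime
    required_data: frozenset    # data domains the delegate must be cleared for
    evidence_required: tuple    # artifacts the delegate must return
    budget: float
    cancellable: bool = True


class AgentRegistry:
    """In-memory stand-in for a discovery service."""

    def __init__(self):
        self._agents = {}

    def register(self, capability):
        self._agents[capability.agent_id] = capability

    def candidates(self, contract):
        """Return ids of agents whose published metadata satisfies the contract."""
        return sorted(
            cap.agent_id
            for cap in self._agents.values()
            if contract.required_skill in cap.skills
            and contract.required_data <= cap.data_access
            and cap.cost_per_task <= contract.budget
        )


registry = AgentRegistry()
registry.register(Capability(
    "qc-agent", frozenset({"quality_control"}), frozenset({"sensor_logs"}), 2.0))
registry.register(Capability(
    "mill-agent", frozenset({"milling"}), frozenset({"cad_models"}), 5.0))
registry.register(Capability(
    "qc-premium", frozenset({"quality_control"}),
    frozenset({"sensor_logs", "cad_models"}), 9.0))

contract = TaskContract(
    task_id="task-001",
    request="Inspect the latest batch for surface defects",
    required_skill="quality_control",
    expected_output="pass/fail report",
    deadline=datetime(2030, 1, 1, tzinfo=timezone.utc),
    required_data=frozenset({"sensor_logs"}),
    evidence_required=("inspection_log",),
    budget=3.0,
)

print(registry.candidates(contract))  # ['qc-agent']
```

Here `qc-premium` has the skill and the data clearance but exceeds the budget, so only `qc-agent` qualifies. The filter is deliberately dumb: the useful part is that selection runs on declared, structured metadata instead of a prompt asking agents who feels qualified.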
The first deployment target should not be the most autonomous workflow. It should be the workflow with clear task boundaries, measurable outputs, low ambiguity, explicit policy constraints, and high coordination cost under the current process. Good candidates include cross-functional ticket triage, regulated document review, supply-chain exception handling, quality-control routing, infrastructure incident response, and internal research workflows. Bad candidates include anything where no one can define ownership, evidence, success criteria, or escalation paths. Which, regrettably, includes a surprising number of “strategic AI transformation” initiatives.
The ROI pathway is also more subtle than labor replacement. IoAI value appears in:
- lower coordination latency across functions;
- faster recovery from local failures;
- better routing of specialized work;
- reduced dependence on a central orchestrator;
- improved resilience under demand spikes;
- stronger auditability of delegated actions;
- more efficient placement of compute and communication;
- safer adaptation because local autonomy is bounded by policy.
The most important word there is bounded. Unbounded agent autonomy is not innovation. It is unmanaged delegation.
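What "bounded" could mean in practice is easier to see in code. The sketch below is one possible policy gate that returns an allow, sandbox, escalate, or deny decision for a proposed agent action. The rule set, thresholds, and names (`Action`, `Policy`, `evaluate`, the tool names) are assumptions made up for the example, not a mechanism specified in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    SANDBOX = "sandbox"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    """An action an agent proposes to take (hypothetical shape)."""
    agent_id: str
    tool: str
    spend: float
    touches_physical_system: bool = False


@dataclass(frozen=True)
class Policy:
    """Illustrative policy: tool allowlists plus a spend ceiling."""
    allowed_tools: frozenset
    sandboxed_tools: frozenset
    spend_limit: float


def evaluate(action, policy):
    """Return the policy decision for one proposed action."""
    known_tools = policy.allowed_tools | policy.sandboxed_tools
    if action.tool not in known_tools:
        return Decision.DENY        # unknown tools are never run
    if action.touches_physical_system or action.spend > policy.spend_limit:
        return Decision.ESCALATE    # a human approves high-stakes steps
    if action.tool in policy.sandboxed_tools:
        return Decision.SANDBOX     # run, but in isolation
    return Decision.ALLOW


policy = Policy(
    allowed_tools=frozenset({"search_docs", "create_ticket"}),
    sandboxed_tools=frozenset({"run_script"}),
    spend_limit=100.0,
)

print(evaluate(Action("triage-agent", "create_ticket", 0.0), policy))
print(evaluate(Action("triage-agent", "run_script", 5.0), policy))
print(evaluate(Action("buyer-agent", "create_ticket", 250.0), policy))
print(evaluate(Action("rogue-agent", "drop_database", 0.0), policy))
```

The ordering of checks is a design choice: denial comes first so an unknown tool can never be escalated into approval by accident, and escalation outranks sandboxing so that high-stakes actions always reach a human.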
Where the paper is deliberately unfinished
The paper is broad by design, and that breadth creates boundaries.
First, it is not an empirical benchmark paper. The manufacturing and distributed-coordination cases are illustrative: they show how mechanisms might fit together, and they do not report measured deployment outcomes. The cited parallel-execution result is supporting evidence from related infrastructure work, not the central experiment of this paper.
Second, the standards landscape is early. MCP, A2A, ANP, ACP, Agora-style meta-protocols, decentralized identifiers, verifiable credentials, and agent naming concepts are promising ingredients. They are not yet a settled Internet stack for agents. Enterprises should expect versioning, vendor extensions, adapters, incomplete semantics, and awkward governance gaps.
Third, formal guarantees remain underdeveloped. The paper repeatedly points to the need for rigorous theories of agent communication, semantic coordination, information exchange, collective intelligence, emergent behavior, communication complexity, and alignment. Today’s agent frameworks are engineering systems. They are not yet grounded in the equivalent of information theory for autonomous communication.
Fourth, collective alignment is harder than individual alignment. A single model may be aligned with an instruction. A network of agents may still produce behavior that diverges from organizational objectives because of delegation chains, local incentives, conflicting constraints, hidden dependencies, or feedback loops. The paper correctly treats this as a separate research frontier.
Fifth, cyber-physical deployment raises the stakes. In software-only workflows, bad outputs may be contained by review and rollback. In manufacturing, logistics, infrastructure, healthcare, and defense simulation environments, agent decisions may affect physical systems, safety constraints, latency budgets, and human operators. Assurance requirements rise accordingly.
Finally, governance and liability remain unresolved. If one agent delegates to another, which invokes a tool, which relies on a stale credential, which acts on poisoned context, which causes a compliance failure, who is responsible? The architecture can support provenance and auditability. It does not automatically settle accountability.
That is the correct boundary for the paper: it gives operators a map of the system that must exist before IoAI becomes reliable. It does not claim that the roads are paved.
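The delegation chain in the liability question above is a good example of what provenance can and cannot do. The sketch below is a minimal append-only log that reconstructs who delegated what to whom, with which credential and policy version. The record fields and class names are hypothetical, and nothing here is tamper-evident or legally meaningful; it only shows the data an enterprise would need to have captured before the accountability argument can even start.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DelegationRecord:
    """One hop in a delegation chain (hypothetical schema)."""
    record_id: str
    parent_id: Optional[str]   # None for the originating request
    delegator: str
    delegate: str
    action: str
    credential_id: str
    policy_version: str


class ProvenanceLog:
    """Append-only log; records are never updated or deleted."""

    def __init__(self):
        self._records = {}

    def append(self, record):
        if record.record_id in self._records:
            raise ValueError(f"duplicate record: {record.record_id}")
        if record.parent_id is not None and record.parent_id not in self._records:
            raise ValueError(f"unknown parent: {record.parent_id}")
        self._records[record.record_id] = record

    def chain(self, record_id):
        """Walk parent links back to the root; return records root-first."""
        hops = []
        current = record_id
        while current is not None:
            record = self._records[current]
            hops.append(record)
            current = record.parent_id
        return list(reversed(hops))


log = ProvenanceLog()
log.append(DelegationRecord(
    "r1", None, "ops-lead", "planner-agent", "plan_restock", "cred-7", "policy-v3"))
log.append(DelegationRecord(
    "r2", "r1", "planner-agent", "sourcing-agent", "request_quote", "cred-8", "policy-v3"))
log.append(DelegationRecord(
    "r3", "r2", "sourcing-agent", "vendor-api-tool", "submit_order", "cred-2", "policy-v2"))

for hop in log.chain("r3"):
    print(f"{hop.delegator} -> {hop.delegate}: {hop.action} "
          f"[{hop.credential_id}, {hop.policy_version}]")
```

In this made-up trace the final hop ran under an older policy version than the first two. The log surfaces that mismatch; deciding who answers for it remains a governance question, not a data-structure one.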
The real shift is from assistant design to institution design
The strongest reading of this paper is that agentic AI is leaving the assistant era and entering the institution era.
An assistant helps a user. A multi-agent workflow helps a process. An IoAI ecosystem coordinates institutions, infrastructure, tools, data, policies, and autonomous actors. At that scale, the design problem changes. The important artifacts are not only prompts and tools. They are contracts, credentials, directories, policies, protocols, logs, incentives, and revocation mechanisms.
That is why the Internet analogy works. The Internet did not scale because every endpoint became brilliant. It scaled because shared protocols, naming systems, routing mechanisms, security layers, and governance institutions made heterogeneous participation possible. IoAI will need the same kind of boring machinery, updated for agents that reason, negotiate, delegate, and act.
The paper’s quiet warning is that individual model capability will not save a badly governed agent network. A clever agent inside a fragile coordination system is still fragile. Possibly faster at becoming so.
For enterprise operators, the next question is not “How many agents can we deploy?” It is:
- Can our agents find the right collaborator?
- Can they prove authority?
- Can they exchange task meaning, not just messages?
- Can they adapt under constraints?
- Can we observe the workflow while it changes?
- Can we revoke, audit, and recover?
- Can we prevent local autonomy from becoming system-level nonsense?
That is the business relevance of IoAI. Not a swarm of clever assistants. A governed network of accountable computational actors.
The future of agentic AI may indeed look like an Internet. But before celebrating the bandwidth, someone should probably install the traffic laws.
Cognaptus: Automate the Present, Incubate the Future.
1. Quanyan Zhu, “The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale,” arXiv:2606.12835v1, 11 June 2026, https://arxiv.org/abs/2606.12835.