Agentic Systems Need Architecture, Not Vibes

Agentic AI has a habit of sounding more engineered than it is.

A demo connects an LLM to a search tool, adds a memory store, wraps the whole thing in a planner, and suddenly the slide deck says “autonomous agent.” The system may still forget what it just saw, retrieve the wrong context, misuse tools, loop on bad actions, or politely hallucinate its way into a support ticket. But the diagram has arrows, so morale remains high.

The paper “Agentic Design Patterns: A System-Theoretic Framework” is useful because it pushes against this convenient optimism.¹ Its argument is not that agents need more patterns in the usual shopping-list sense: reflection, tool use, planning, multi-agent collaboration, retrieval, memory, guardrails, and whatever else happens to fit into a conference slide. The stronger claim is that agentic systems need an architecture that explains where these patterns belong and what failure mode each pattern is meant to repair.

That sounds less glamorous than “agentic workflow.” Good. Glamour is not a reliability primitive.

The paper’s real move is from agent features to agent anatomy

A common way to discuss AI agents is to list capabilities. An agent can reason, retrieve, call tools, remember, plan, reflect, collaborate, and learn. This is not wrong, but it is structurally weak. Capabilities do not tell the engineer where state lives, how perception is validated, which component is responsible for feedback, or how a failed tool call should change the next plan.

The paper begins from the opposite direction. It asks: what functional subsystems must exist for an agentic AI system to behave reliably?

Its answer is a five-part architecture:

Subsystem	Role in the agent	Failure class it helps diagnose
Reasoning & World Model	Maintains the agent’s internal representation of the task and environment, then directs decisions	Inconsistent world models, poor planning, weak counterfactual reasoning
Perception & Grounding	Turns raw input into structured percepts the agent can reason over	Bad grounding, unvalidated observations, noisy or misleading inputs
Action Execution	Converts plans into actions and gathers feedback	Tool misuse, brittle execution, weak recovery from errors
Learning & Adaptation	Observes outcomes and updates future strategy or knowledge	No long-term improvement, repeated mistakes, poor adaptation
Inter-Agent Communication	Enables structured interaction with other agents	Coordination breakdown, ambiguous messages, weak joint planning

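This is the mechanism-first core of the paper. The agent is not treated as a big model with accessories attached. It is treated as a system with a cognitive core (Reasoning & World Model), operational interfaces (Perception & Grounding and Action Execution), an adaptive shell (Learning & Adaptation), and an optional social interface (Inter-Agent Communication).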

That distinction matters. A large language model can generate a plan. It does not automatically create a durable world model. A vector database can store documents. It does not automatically decide what context is relevant now. A tool wrapper can call an API. It does not automatically know whether the action failed, partially succeeded, or corrupted the agent’s state. These are subsystem responsibilities, not vibes.

The cognitive cycle is the useful diagram hiding behind the pattern catalogue

The paper’s architecture becomes clearer when read as a loop.

First, Perception & Grounding processes raw inputs into structured percepts. Those percepts enter the Reasoning & World Model subsystem, where the agent updates its internal understanding of the situation. The reasoning core then creates either an action plan for Action Execution or a communication request for Inter-Agent Communication. Whichever path is taken, the resulting action or communication produces feedback. Learning & Adaptation interprets that feedback and sends strategy or knowledge updates back into the reasoning core.

In simplified form:

Raw input
  -> Perception & Grounding
  -> Reasoning & World Model
  -> Action Execution / Inter-Agent Communication
  -> Feedback
  -> Learning & Adaptation
  -> Updated world model and strategy
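
As a rough illustration, the loop above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the paper's implementation: the `WorldModel` fields, the `key = value` input format, and the function names `perceive`, `reason`, `execute`, and `adapt` are all hypothetical, and the inter-agent branch is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Explicit state owned by the reasoning core (hypothetical structure)."""
    facts: dict = field(default_factory=dict)
    strategy: str = "default"


def perceive(raw: str) -> dict:
    """Perception & Grounding: turn raw 'key = value' text into a percept."""
    key, _, value = raw.partition("=")
    return {"key": key.strip(), "value": value.strip()}


def reason(model: WorldModel, percept: dict) -> dict:
    """Reasoning & World Model: update state, then emit an action plan."""
    model.facts[percept["key"]] = percept["value"]
    return {"action": "report", "target": percept["key"]}


def execute(plan: dict, model: WorldModel) -> dict:
    """Action Execution: carry out the plan and return structured feedback."""
    value = model.facts.get(plan["target"], "")
    return {"ok": value != "", "output": f"{plan['target']} is {value}"}


def adapt(model: WorldModel, feedback: dict) -> None:
    """Learning & Adaptation: let the outcome change future strategy."""
    if not feedback["ok"]:
        model.strategy = "ask_for_clarification"


def run_cycle(model: WorldModel, raw: str) -> dict:
    """One pass through the cycle, one subsystem per step."""
    percept = perceive(raw)
    plan = reason(model, percept)
    feedback = execute(plan, model)
    adapt(model, feedback)
    return feedback
```

The point of the sketch is only that each arrow in the diagram has an owner: a failed action changes `model.strategy` because `adapt` is responsible for it, not because a prompt happened to mention it.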

This loop is the paper’s most practical idea. The 12 patterns are easier to understand after this loop is in place. Without the loop, “Retriever,” “Reflector,” “Controller,” and “Tool Use” sound like product features. With the loop, they become repairs to specific information-flow problems.

A Retriever is not “RAG because RAG is popular.” It is a pattern for giving the reasoning subsystem a context-aware interface to memory. A Recorder is not “logging because logs are nice.” It externalises the state of the world model so the agent can recover, inspect, or resume. A Reflector is not “ask the model to think about its mistake.” It is an adaptation mechanism that turns outcomes into causal insight. The difference is subtle on a slide and enormous in production.
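
To make the Recorder idea concrete, here is a minimal sketch that externalises agent state as JSON so a task can be inspected or resumed. The `AgentState` fields and the `checkpoint` and `restore` helpers are assumptions made for illustration; the paper names the pattern and its intent, not this code.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentState:
    """Hypothetical snapshot of what the reasoning core currently believes."""
    goal: str
    completed_steps: list = field(default_factory=list)
    facts: dict = field(default_factory=dict)


def checkpoint(state: AgentState) -> str:
    """Serialise the state so it survives outside the context window."""
    return json.dumps(asdict(state), sort_keys=True)


def restore(snapshot: str) -> AgentState:
    """Rebuild the state from a snapshot to resume or inspect a task."""
    return AgentState(**json.loads(snapshot))
```

A snapshot produced by `checkpoint` is plain text, so it can be stored, diffed between steps, or handed to a human reviewer, which is the difference between recording state and merely logging events.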

The paper is not claiming these ideas are new in isolation. It explicitly notes that reflection, tool use, and skill acquisition have existed across AI subfields. Its contribution is systematisation: placing these ideas inside a coherent architecture and mapping them to recurring classes of failure.

That is a more modest claim than “we invented agent reliability.” It is also more useful.

The 12 patterns are not a checklist; they are failure-specific repairs

The paper groups its 12 Agentic Design Patterns into four categories. Read lazily, this becomes another taxonomy. Read mechanically, it becomes a diagnostic tool.

Pattern group	Patterns	What they repair
Foundational patterns	Integrator, Retriever, Recorder	Input validation, context retrieval, state preservation
Cognitive & decisional patterns	Selector, Planner, Deliberator	Goal prioritisation, decomposition, adaptive action choice
Execution & interaction patterns	Executor, Tool Use, Coordinator	Reliable execution, tool interfaces, multi-agent communication
Adaptive & learning patterns	Reflector, Skill Build, Controller	Causal learning, reusable skills, alignment monitoring

The important word is “repair.” Each pattern corresponds to a structural weakness.

If an agent keeps mixing old assumptions with new facts, adding another planner may not help. The issue may be Perception & Grounding or world-model integration. The relevant pattern may be Integrator, not Planner.

If an agent repeatedly calls tools incorrectly, a longer prompt may not fix the problem. The issue may sit in Action Execution. The relevant patterns may be Tool Use and Executor: standardised tool invocation, systematic feedback collection, and recovery behaviour.
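
One way to read “standardised tool invocation” and “systematic feedback collection” is a single entry point that checks arguments before the call and always returns a structured result. The sketch below assumes hypothetical `ToolSpec`, `ToolResult`, and `invoke` names and a deliberately simple type check; real deployments would likely use a fuller schema validator.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    """Hypothetical tool description: name, expected parameter types, callable."""
    name: str
    params: dict[str, type]
    func: Callable[..., Any]


@dataclass
class ToolResult:
    """Structured feedback the reasoning core can act on."""
    ok: bool
    output: Any = None
    error: str = ""


def invoke(tool: ToolSpec, args: dict[str, Any]) -> ToolResult:
    """Validate arguments, call the tool, and never raise to the caller."""
    missing = [p for p in tool.params if p not in args]
    if missing:
        return ToolResult(False, error=f"missing parameters: {missing}")
    unknown = [a for a in args if a not in tool.params]
    if unknown:
        return ToolResult(False, error=f"unknown parameters: {unknown}")
    for name, expected in tool.params.items():
        if not isinstance(args[name], expected):
            return ToolResult(False, error=f"{name} must be {expected.__name__}")
    try:
        return ToolResult(True, output=tool.func(**args))
    except Exception as exc:  # report the failure instead of crashing the loop
        return ToolResult(False, error=f"{type(exc).__name__}: {exc}")
```

Because every call returns a `ToolResult`, a malformed call and a runtime failure both arrive as data the planner can respond to, rather than as an exception or a silent empty string.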

If an agent makes the same mistake across sessions, “reflection” inside a single prompt is too small. The issue is Learning & Adaptation. The relevant patterns may be Reflector, Skill Build, and Recorder.
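
The cross-session point can be illustrated with a small persistent lesson store. This is a sketch under assumptions: the JSON file format and the `load_lessons` and `record_lesson` helpers are hypothetical, and a real Reflector would also need to decide which lessons are worth keeping.

```python
import json
from pathlib import Path


def load_lessons(path: Path) -> list[str]:
    """Read lessons from earlier sessions; an absent file means none yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text())


def record_lesson(path: Path, lesson: str) -> list[str]:
    """Persist a new lesson once, so later sessions can condition on it."""
    lessons = load_lessons(path)
    if lesson not in lessons:
        lessons.append(lesson)
        path.write_text(json.dumps(lessons, indent=2))
    return lessons
```

A new session would call `load_lessons` before planning. Whether the lessons then reach the model through a prompt, a retrieval step, or a skill library is a separate design decision that this sketch does not settle.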

This is the paper’s quiet but useful correction to the usual agent-building instinct. When something fails, do not immediately add another capability. Locate the broken subsystem first. Then choose a pattern that addresses the interaction problem.

The business version is even simpler: stop buying agent features as if they were Lego bricks. Diagnose the failure path.

What the paper actually shows, and what it does not

The evidence in this paper is mostly conceptual and qualitative. That is not a weakness if read correctly. It becomes a weakness only if someone tries to sell it as benchmarked performance evidence. Let us not do that. The world already has enough benchmark theatre.

The paper contains several types of support:

Paper element	Likely purpose	What it supports	What it does not prove
Literature comparison table	Comparison with prior work	The authors position their framework as more system-theoretic and more closely aligned with Gang of Four (GoF) design patterns than selected earlier taxonomies	It does not prove superiority in implementation outcomes
Five failure classes	Problem map	The framework is motivated by recurring agent reliability issues: world modelling, cognition, execution, learning, collaboration	It does not quantify which failures dominate in real deployments
System architecture figure	Main conceptual contribution	Agents can be decomposed into interacting subsystems with distinct responsibilities	It does not validate that this decomposition is optimal
Cognitive-cycle figure	Mechanism explanation	Reliability depends on information flow across perception, reasoning, execution, feedback, and adaptation	It does not measure runtime efficiency or latency
Sankey diagram linking failures, subsystems, and patterns	Analytical bridge	Patterns are mapped to failure classes and architectural components	Flow widths are qualitative, not empirical weights
Pattern catalogue	Main design artefact	The paper provides reusable pattern names, intents, and associated problem classes	It does not provide detailed implementation templates for every production context
ReAct analysis	Qualitative demonstration	The framework can diagnose and redesign an existing agent loop	It does not show improved benchmark scores after redesign

This matters because the paper’s value is not “we achieved X% higher task success.” There is no such result here. Its value is closer to architecture review: it gives engineering teams a vocabulary for discussing why an agent fails and what structural intervention might help.

That is still valuable. In many business deployments, the expensive part is not discovering that the agent failed. Users will tell you, usually with admirable emotional range. The expensive part is diagnosing whether the failure came from perception, retrieval, reasoning, tool execution, state management, adaptation, or coordination. A pattern catalogue anchored to subsystems can reduce that ambiguity.

ReAct is the case study because it exposes the monolith problem

The paper’s application section analyses ReAct, the familiar loop where an LLM alternates between reasoning traces and actions. The authors use a three-step method: deconstruct, diagnose, prescribe.

In their deconstruction, ReAct’s Reasoning & World Model corresponds to the LLM’s “Thought” generation. But that world model is implicit and transient, held inside the context window. Perception & Grounding is rudimentary: the agent receives unstructured observations from the environment. Action Execution is the “Act” step. Learning & Adaptation is absent. Inter-Agent Communication is also absent because ReAct is framed as a single-agent system.

The diagnosis follows naturally. ReAct is powerful because it interleaves reasoning and action. But architecturally, much of the system remains monolithic. The central LLM absorbs too many responsibilities: interpreting observations, holding state, planning, selecting actions, recovering from errors, and implicitly judging what the latest feedback means. That may work in a clean demo. It becomes fragile when observations are noisy, tools fail, tasks extend across sessions, or the agent must improve over time.

The prescription is to insert patterns around the ReAct loop:

Integrator validates observations before they enter the reasoning core.
Retriever and Recorder improve context retrieval and state management.
Executor and Tool Use make action execution more reliable and standardised.
Reflector processes feedback so failures can inform future strategy.
In cases of critical inconsistency, the system can save the problematic state and initiate a learning cycle rather than simply continuing the loop with corrupted context.
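
The prescription above can be sketched as a thin wrapper around a ReAct-style loop. Everything here is a simplified assumption rather than the authors' design: `think` and `act` stand in for the LLM and the environment, the `integrator` check is a trivial emptiness and length test, and the Reflector is reduced to appending a lesson when an observation is rejected.

```python
from typing import Callable


def integrator(observation: str, max_chars: int = 2000) -> dict:
    """Integrator (sketch): reject empty or oversized observations."""
    text = observation.strip()
    return {"valid": bool(text) and len(text) <= max_chars, "text": text}


def enhanced_react(
    think: Callable[[str, list, list], tuple],
    act: Callable[[str], str],
    task: str,
    max_steps: int = 5,
) -> tuple:
    """Run a ReAct-style loop with explicit validation, recording, reflection."""
    history: list = []  # Recorder: explicit trace instead of context-only state
    lessons: list = []  # Reflector: outcomes turned into guidance for later steps
    for step in range(max_steps):
        thought, action = think(task, history, lessons)
        if action == "finish":
            break
        percept = integrator(act(action))
        if not percept["valid"]:
            # Keep the bad observation out of the trace; record a lesson instead.
            lessons.append(f"step {step}: action {action!r} returned unusable output")
            continue
        history.append(
            {"thought": thought, "action": action, "observation": percept["text"]}
        )
    return history, lessons
```

The design choice worth noticing is that a rejected observation never enters `history`, so the next thought is conditioned on a clean trace plus an explicit lesson rather than on corrupted context.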

This is not a claim that ReAct is obsolete. It is a claim that ReAct, by itself, is not an architecture for durable agent behaviour. It is a useful interaction pattern, not a complete operating model. Calling it an “agent architecture” without additional subsystems is like calling a steering wheel a car. It is an important part. It is not the thing.

The practical business value is cheaper diagnosis, not magic autonomy

For companies building AI agents, the paper is most useful as a design-review framework.

The practical question is not “Should we use all 12 patterns?” That would be taxonomy cosplay. The better question is: where does the agent fail, and which subsystem owns that failure?

A business-facing diagnostic might look like this:

Observed business failure	Likely subsystem issue	Candidate pattern response
Agent answers confidently using stale or irrelevant context	Reasoning & World Model / retrieval interface	Retriever, Recorder
Agent accepts misleading user input or unvalidated external observations	Perception & Grounding	Integrator
Agent calls the wrong tool or passes malformed parameters	Action Execution	Tool Use, Executor
Agent repeats the same failed behaviour across sessions	Learning & Adaptation	Reflector, Skill Build
Agent follows a plan after the goal has changed	Reasoning & World Model	Selector, Deliberator
Multi-agent workflow produces duplicated or contradictory work	Inter-Agent Communication	Coordinator
Agent behaves within task instructions but outside business policy	Learning & Adaptation / governance	Controller
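
A team could encode the table above as a lookup and run tagged incidents through it to see which patterns keep coming up. The incident tags and the `candidate_patterns` helper below are hypothetical; only the subsystem and pattern names come from the table.

```python
from collections import Counter

# Incident tag -> (likely subsystem, candidate patterns), mirroring the table.
TRIAGE = {
    "stale_context": ("Reasoning & World Model / retrieval interface", ["Retriever", "Recorder"]),
    "unvalidated_input": ("Perception & Grounding", ["Integrator"]),
    "malformed_tool_call": ("Action Execution", ["Tool Use", "Executor"]),
    "repeated_failure": ("Learning & Adaptation", ["Reflector", "Skill Build"]),
    "outdated_plan": ("Reasoning & World Model", ["Selector", "Deliberator"]),
    "duplicated_work": ("Inter-Agent Communication", ["Coordinator"]),
    "policy_violation": ("Learning & Adaptation / governance", ["Controller"]),
}


def candidate_patterns(incident_tags: list[str]) -> list[tuple[str, int]]:
    """Count how often each pattern is implicated across tagged incidents."""
    counts: Counter = Counter()
    for tag in incident_tags:
        if tag not in TRIAGE:
            counts["(unclassified)"] += 1
            continue
        _subsystem, patterns = TRIAGE[tag]
        counts.update(patterns)
    return counts.most_common()
```

The counts are a prioritisation aid, not evidence that a pattern will fix anything; they only show where the failure reports cluster.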

This is where the paper becomes operational. A support-ticket triage agent that retrieves the wrong policy document does not primarily need “more autonomy.” It needs better retrieval interfaces, state discipline, and grounding checks. A finance workflow agent that executes actions without auditable feedback does not need a more dramatic persona. It needs an Executor pattern and a Tool Use pattern. A research agent that keeps repeating bad search strategies does not need a motivational prompt about being rigorous. It needs adaptation infrastructure.

This is also where Cognaptus would draw a boundary between what the paper shows and what we infer.

The paper directly shows a conceptual architecture, a problem taxonomy, a design-pattern catalogue, and a qualitative ReAct redesign. It does not show measured ROI, reduced failure rates, lower latency, or improved benchmark performance. The business inference is that a system-theoretic vocabulary can improve architecture reviews, debugging, governance, and roadmap prioritisation. That inference is plausible, but it still needs deployment evidence.

In other words: the paper gives teams a better map. It does not prove the route is cheaper in every terrain.

The framework is strongest when agents become long-running systems

For small one-shot tasks, this architecture may feel heavy. If the “agent” simply summarises a document and takes no external action, then a full architecture with Recorder, Reflector, Controller, Skill Build, and Coordinator may be unnecessary. Not every bicycle needs an aircraft maintenance manual.

The framework becomes more valuable as agentic systems become:

Long-running: They must preserve state, resume tasks, and avoid replaying past errors.
Tool-using: They can affect external systems, so execution must be standardised and auditable.
Multi-step: Failure can compound across chained reasoning and action.
Adaptive: Outcomes should change future behaviour, not merely fill a log file nobody reads.
Collaborative: Multiple agents or humans must coordinate without message chaos.
Governed: Business rules, ethical constraints, and accountability must be monitored continuously.

This explains why the paper’s mechanism-first structure matters. The more an agent resembles a persistent operating system rather than a chat session, the less acceptable it is to leave perception, state, action, feedback, and adaptation as implicit behaviours inside a prompt.

Prompting can guide behaviour. Architecture assigns responsibility.

That sentence should probably be printed on the wall of every “agent platform” war room, right next to the dashboard showing how often the agent silently retried the same broken action.

The limitation is not that the paper is conceptual; the limitation is what follows from being conceptual

The paper is explicit about its own boundaries. The framework is primarily conceptual. It needs quantitative benchmarking to measure whether the patterns actually improve reliability or efficiency against baselines. Sophisticated patterns such as Reflector and Controller can introduce architectural complexity and computational overhead. The paper also does not fully address broader societal issues such as accountability and emergent behaviour in large-scale autonomous systems.

Those limitations should not be treated as ceremonial caution. They affect how the paper should be used.

First, teams should not assume that adding more patterns always improves performance. A Reflector may improve learning, or it may add latency and produce noisy post-hoc rationalisations. A Controller may improve policy compliance, or it may become a brittle rule layer that blocks legitimate work. A Recorder may help state recovery, or it may preserve flawed state too faithfully. Architecture can reduce chaos; it can also industrialise it.

Second, the pattern names are not implementation specifications. “Tool Use” can mean many things: schema validation, permission boundaries, retries, sandboxing, tool selection policies, audit logs, dry-run modes, or human approval gates. The paper provides the architectural slot and intent, not the full production checklist.
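
To show how much room the “Tool Use” slot leaves open, here is a sketch of just three of those concerns: permission boundaries, human approval gates, and dry-run mode. The `ToolPolicy` fields, the `gate` function, and the four decision strings are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Hypothetical policy: which tools may run, and under what conditions."""
    allowed: set
    needs_approval: set = field(default_factory=set)
    dry_run: bool = False


def gate(policy: ToolPolicy, tool_name: str, approved: bool = False) -> str:
    """Decide what should happen to a requested tool call before it runs."""
    if tool_name not in policy.allowed:
        return "deny"  # permission boundary
    if tool_name in policy.needs_approval and not approved:
        return "await_approval"  # human approval gate
    if policy.dry_run:
        return "simulate"  # dry-run mode: log the call, skip the side effect
    return "execute"
```

Schema validation, retries, sandboxing, and audit logging would each need their own decisions on top of this, which is why the pattern name alone cannot serve as a specification.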

Third, ReAct is a qualitative demonstration, not a benchmarked redesign. The proposed enhanced loop is persuasive as architecture, but the paper does not run the redesigned ReAct across task suites to show measured gains. That is acceptable for a framework paper. It is not acceptable as procurement evidence.

For business readers, the correct conclusion is disciplined adoption: use the framework to structure diagnosis and design, then test the resulting architecture under your own latency, cost, reliability, and compliance constraints.

From agent demos to agent engineering

The best part of this paper is that it makes agentic AI less mystical.

An agent failure is not always “the model is bad.” Sometimes the input was not grounded. Sometimes the world model was never externalised. Sometimes the tool layer had no recovery path. Sometimes feedback was observed but not converted into learning. Sometimes two agents were “collaborating” in the same way a group chat collaborates: loudly, asynchronously, and with no adult supervision.

The paper’s five subsystems and 12 patterns give teams a way to name these problems. Naming is not solving, but it is often the first moment when the problem stops being a fog bank and starts being engineering work.

The industry does not need fewer agent demos. Demos are useful. They reveal possibilities. But it does need fewer systems whose architecture is an inspirational prompt plus a vector store plus hope. Hope is not deprecated yet, but it should not be on the critical path.

Agentic systems need architecture because autonomy increases the cost of ambiguity. The more a system perceives, reasons, acts, learns, and communicates, the more each responsibility needs a home. That is the central lesson of the paper: patterns matter only when they are attached to mechanisms.

The next generation of agent systems will not be judged by how many agentic buzzwords appear in their architecture diagram. They will be judged by whether teams can explain where failures originate, how state is preserved, how actions are validated, how feedback changes behaviour, and how governance is enforced.

Less vibes. More anatomy.

Cognaptus: Automate the Present, Incubate the Future.

¹ Minh-Dung Dao, Quy Minh Le, Hoang Thanh Lam, Duc-Trong Le, Quoc-Viet Pham, Barry O’Sullivan, and Hoang D. Nguyen, “Agentic Design Patterns: A System-Theoretic Framework,” arXiv:2601.19752, 2026. https://arxiv.org/abs/2601.19752
