Agents have become the new office intern, software engineer, analyst, compliance assistant, and occasional disaster rehearsal all in one. Give one a goal, some tools, a memory store, and permission to act, and it begins to look less like a chatbot and more like a small operating unit.
That is the sales pitch. The engineering reality is less tidy.
The useful point in “Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions” is not that agentic AI is growing quickly. Everybody with a slide deck and a budget request has discovered that. The paper’s sharper contribution is conceptual hygiene: it argues that the field is currently mixing two very different kinds of “agency” under one label, then wondering why its evaluations, risk controls, and architecture diagrams keep talking past one another.[1]
The two lineages are:
- Symbolic/Classical agentic AI, built around explicit rules, plans, state, logic, and verifiable decision procedures.
- Neural/Generative agentic AI, built around large language models, stochastic generation, prompt-driven orchestration, tool use, retrieval, and multi-agent conversation.
The paper’s central warning is that calling both of these “agents” is fine only if we remember they are not agents in the same way. One is closer to a disciplined rule-following process with explicit state. The other is closer to a generative coordinator improvising through context, tools, and probabilistic language. Same badge, different nervous system.
This matters because businesses are now making architecture, procurement, compliance, and liability decisions using vocabulary that is often one paradigm out of date. Charming. Dangerous. Very enterprise.
The mistake is treating LLM agents as classical agents with better manners
A common explanation of LLM agents borrows the old language of Belief–Desire–Intention systems or perceive–plan–act–reflect loops. The paper calls this conceptual retrofitting: applying symbolic-agent concepts to modern LLM systems whose mechanics are fundamentally different.
The distinction is not academic hair-splitting. In a symbolic system, a “belief” may correspond to an explicit state representation. A “plan” may be a structured sequence produced by a planner. A “rule” may be audited. A failure may be traced to a faulty condition, missing edge case, bad transition model, or flawed goal specification.
In an LLM-based agent, the same words often become metaphors. “Memory” may mean retrieved text or conversation state. “Planning” may mean a model generating a plausible sequence of steps. “Reflection” may mean another prompt asking the model to critique itself. “Coordination” may mean multiple model instances exchanging messages until the workflow reaches something that looks like progress.
That does not make neural agents fake. It makes them different.
The paper’s comparison is useful because it forces each common belief to be replaced with a more accurate one:
| Reader belief | Correction | Business consequence |
|---|---|---|
| “Agentic AI is one technology category.” | It is a family divided among symbolic, neural, and hybrid mechanisms. | Vendor evaluation must ask what kind of agency is being purchased. |
| “LLM agents implement planning like classical agents.” | They often generate plans through stochastic orchestration rather than formal planning. | Planning reliability must be tested empirically, not assumed from the diagram. |
| “More autonomy means more intelligence.” | Autonomy only matters when paired with reliable control, state management, and evaluation. | Delegated workflows need governance proportional to action rights, not demo quality. |
| “Hybrid means safer by default.” | Hybrid systems inherit the failure modes of both paradigms. | Hybrid design needs dual audit trails, not a decorative rules layer. |
The paper is at its best when it resists the temptation to turn “agentic AI” into a spiritual category. It treats agency as an architectural and operational property. That is exactly how enterprise teams should treat it too.
Symbolic agents optimise for control; neural agents optimise for adaptability
The symbolic lineage is old, unfashionable in the current LLM carnival, and still annoyingly relevant. It includes explicit decision models such as Markov Decision Processes and Partially Observable Markov Decision Processes, along with cognitive architectures such as BDI and SOAR. These systems model states, goals, transitions, and rules in ways that can often be inspected or verified.
Their strength is not glamour. Their strength is control.
Symbolic systems are attractive when the environment is bounded, the rules are known, failures are expensive, and explainability is not optional. Clinical decision support, industrial control, robotics safety layers, regulatory logic, and formal compliance workflows all benefit from systems whose behaviour can be constrained and audited.
The weakness is equally obvious. Symbolic systems are brittle when the world refuses to stay inside the rulebook. They struggle with messy language, ambiguous context, unstructured data, and novel situations unless those possibilities were anticipated or carefully modelled.
The neural lineage flips the trade-off. LLM-based agents are strong precisely where symbolic systems often become painful: open-ended language, flexible tool use, document synthesis, data-rich analysis, conversational interfaces, and multi-step tasks where the route is not fully known in advance.
Modern frameworks such as LangChain, AutoGen, CrewAI, Semantic Kernel, LlamaIndex, and LangGraph belong to this world. Their agency comes less from explicit cognitive modelling and more from orchestration: prompt chaining, retrieval, tool calls, role-based workflows, multi-agent conversation, and context management.
The paper’s comparison of coordination mechanisms makes the split particularly clear.
| Design dimension | Symbolic/Classical agentic AI | Neural/Generative agentic AI |
|---|---|---|
| Main coordination mechanism | Algorithmic protocols, explicit plans, rule-governed interaction | Structured conversation, prompt orchestration, model-mediated routing |
| State management | Explicit and often inspectable | Often implicit, distributed across context windows, prompts, memory stores, and retrieved documents |
| Decision process | Deterministic or formally probabilistic | Stochastic generation of next action, response, or tool call |
| Flexibility | Lower; strongest in anticipated scenarios | Higher; can adapt to unfamiliar instructions and unstructured inputs |
| Verifiability | Higher; logic and state can often be audited | Lower; behaviour emerges from model outputs and orchestration traces |
| Failure style | Missing rule, flawed rule, edge-case brittleness, literal goal execution | Hallucination, prompt injection, context loss, goal drift, unstable tool use |
This is the article’s central comparison: symbolic agents are easier to trust when the world is narrow; neural agents are more useful when the world is messy. The mistake is asking either to be the other.
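The contrast in the table can be made concrete with a toy sketch. Nothing here comes from the paper: the sensor fields, thresholds, and action names are hypothetical. `symbolic_decide` walks an ordered rule list and reports which rule fired, while `neural_decide` merely stands in for an LLM by sampling an action from weights, so it has no rule name to report and its answer depends on the seed.

```python
import random
from typing import Callable

State = dict[str, float]  # hypothetical sensor readings

# Ordered rules: (name, condition, action). The first matching rule wins.
RULES: list[tuple[str, Callable[[State], bool], str]] = [
    ("overpressure", lambda s: s["pressure_bar"] > 8.0, "open_relief_valve"),
    ("low_level", lambda s: s["level_pct"] < 20.0, "start_pump"),
    ("default_hold", lambda s: True, "hold"),
]


def symbolic_decide(state: State) -> tuple[str, str]:
    """Return (action, name of the rule that fired) as an auditable trace."""
    for name, condition, action in RULES:
        if condition(state):
            return action, name
    raise RuntimeError("no rule matched")  # unreachable while default_hold exists


def neural_decide(state: State, seed: int) -> str:
    """Stand-in for an LLM: sample an action from weights, not from rules.

    A real model would condition on prompt and context. This stub ignores the
    state and only mimics the stochastic part of the decision.
    """
    rng = random.Random(seed)
    actions = ["open_relief_valve", "start_pump", "hold"]
    return rng.choices(actions, weights=[1, 3, 6])[0]


state = {"pressure_bar": 5.0, "level_pct": 12.0}
print(symbolic_decide(state))  # ('start_pump', 'low_level')
print({neural_decide(state, seed) for seed in range(20)})  # may vary by seed
```

The symbolic path gives the same answer for the same state and names the rule responsible. The sampled path can only be characterised by running it many times, which is the practical meaning of "lower verifiability" in the table.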
The paper’s evidence is a map, not a benchmark
The study is a systematic survey, not an experiment. That matters. There are no ablations showing that one architecture beats another by a certain percentage. No benchmark table proves that symbolic agents are “better” in healthcare or neural agents are “better” in finance. The evidence is taxonomic and synthetic.
The authors use a PRISMA-based review process. They report an initial pool of 165 records, reduce this to 120 after deduplication, exclude 42 during title and abstract screening, and identify 78 eligible full-text studies. They then add 12 foundational symbolic papers for historical and theoretical context, producing a final corpus of 90 publications.
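The reported counts can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above. The number of duplicates removed is derived from them rather than stated in the text, so treat it as an inference.

```python
# Counts as quoted above from the survey's PRISMA flow.
initial_records = 165
after_deduplication = 120
excluded_at_screening = 42
eligible_full_text = 78
foundational_added = 12
final_corpus = 90

# Derived here, not quoted: the difference between the first two stages.
duplicates_removed = initial_records - after_deduplication

# Each stage should follow from the previous one.
assert after_deduplication - excluded_at_screening == eligible_full_text
assert eligible_full_text + foundational_added == final_corpus

print(f"duplicates removed: {duplicates_removed}")  # 45
print(f"foundational share of corpus: {foundational_added / final_corpus:.1%}")  # 13.3%
```

The stages are internally consistent: 120 minus 42 gives 78, and 78 plus 12 gives 90.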
That review design does two jobs. First, it gives the taxonomy more discipline than a normal opinion essay wearing a lab coat. Second, it creates boundaries: the paper can map patterns in the literature, but it cannot directly validate production performance across vendors, sectors, or deployments.
The paper’s figures and tables should therefore be read as follows:
| Paper component | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Historical timeline of AI paradigms | Background framing | Agentic AI emerged from multiple overlapping eras, not from one clean lineage | That the timeline alone explains current system performance |
| Dual-lineage conceptual framework | Main analytical claim | Symbolic and neural agentic systems differ in operating mechanics | That every real system fits neatly into one box |
| PRISMA flow diagram | Methodological evidence | The review followed a structured search and screening process | That the final corpus is exhaustive or immune to publication bias |
| Paradigm-aware taxonomy of 90 studies | Main synthesis evidence | Different domains and research areas cluster around different paradigms | That paradigm choice alone determines success |
| Governance tables | Interpretive framework | Risks differ by architecture and level of autonomy | That existing law already knows how to manage these risks |
| Future-direction table | Roadmap and inference | Hybrid neuro-symbolic systems are a plausible strategic direction | That hybrid systems are already mature or automatically safe |
This distinction keeps the paper in its proper lane. Its value is not empirical magnitude. Its value is classification discipline.
For business readers, that is still valuable. Many failed AI programmes do not fail because the model was too weak. They fail because the organisation misunderstood what kind of system it had built.
Domain choice is really constraint choice
One of the paper’s stronger practical claims is that domains do not choose architectures because of fashion. They choose them because of constraints.
Healthcare, legal workflows, finance, education, robotics, and scientific research all have different tolerance levels for uncertainty, delay, opacity, and failure. That shapes which paradigm is appropriate.
Healthcare favours symbolic or tightly constrained hybrid approaches when safety, privacy, and explainability dominate. Neural systems may still help with report generation, information retrieval, summarisation, or workflow assistance, but the paper notes that clinical contexts often confine them inside deterministic pipelines. The point is not that doctors hate flexibility. The point is that a flexible hallucination is still a hallucination, only now with institutional liability attached.
Finance looks more neural in the paper’s taxonomy because the sector often needs complex synthesis across dynamic, data-rich environments: fraud detection, market analysis, risk monitoring, sentiment analysis, and transaction-pattern analysis. Yet even there, the paper does not imply “pure stochastic freedom.” Finance still requires auditability, controls, and regulatory logic. Neural orchestration may perform the analysis; symbolic checks may define the guardrails.
Robotics and manufacturing appear as explicit hybrid cases. A drone, factory robot, or autonomous machine cannot be governed only by a persuasive conversation with itself. Physical safety requires constrained planning and reliable control. Neural components can help with perception, adaptation, and coordination, but symbolic or formal layers remain essential where bad decisions hit walls, people, or expensive equipment.
Education and customer-facing interaction lean more naturally toward neural systems because the value comes from adaptive dialogue, personalisation, and context-sensitive response. Rule-based tutoring can work for bounded exercises. It struggles when the user’s confusion arrives in natural language, as confusion rudely tends to do.
Legal and compliance systems occupy an awkward middle. They need the neural paradigm’s ability to process unstructured language at scale, but they cannot tolerate free-form invention. Hence the role of retrieval-heavy designs: neural systems can analyse and draft, but they must be grounded in verified corpora, jurisdictional context, and traceable sources.
The enterprise lesson is simple: do not start with “Should we use agents?” Start with “Which parts of this workflow require adaptability, and which parts require determinism?”
That question is less exciting than a demo. It is also more likely to keep the company out of court.
The architectural decision rule: split the workflow before choosing the agent
The paper’s dual-paradigm framing can be turned into a practical design rule:
| Workflow condition | Preferred architecture | Reason |
|---|---|---|
| Rules are explicit, stakes are high, failures must be traceable | Symbolic or deterministic module | Auditability and predictable control matter more than creative flexibility |
| Inputs are unstructured, ambiguous, or language-heavy | Neural orchestration | LLMs are useful for synthesis, interpretation, and adaptive interaction |
| The task requires both exploration and enforcement | Hybrid neuro-symbolic design | Neural components explore; symbolic components constrain, validate, or explain |
| The system acts externally through APIs, payments, messages, trades, or controls | Constrained agent with explicit permissions and logs | Agency becomes operational risk once the system can act |
| The workflow must survive regulatory review | Paradigm-specific governance layer | Neural and symbolic failures require different evidence and controls |
This is where the paper becomes operationally useful. It suggests that businesses should stop evaluating agentic AI as a monolithic platform choice. The real decision is workflow decomposition.
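One way to make the decomposition habit stick is to write the table down as a function over workflow steps. This is a sketch, not the paper's method: `WorkflowStep`, its flags, and the reading of "both exploration and enforcement" as "explicit rules plus unstructured input" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowStep:
    """Hypothetical description of one step in a workflow."""
    name: str
    explicit_rules: bool = False      # rules known, failures must be traceable
    unstructured_input: bool = False  # language-heavy or ambiguous inputs
    acts_externally: bool = False     # APIs, payments, messages, trades
    regulated: bool = False           # must survive regulatory review


def recommend(step: WorkflowStep) -> list[str]:
    """Map a step's conditions to the preferred architectures in the table."""
    picks: list[str] = []
    if step.explicit_rules and step.unstructured_input:
        picks.append("hybrid neuro-symbolic design")
    elif step.explicit_rules:
        picks.append("symbolic or deterministic module")
    elif step.unstructured_input:
        picks.append("neural orchestration")
    if step.acts_externally:
        picks.append("constrained agent with explicit permissions and logs")
    if step.regulated:
        picks.append("paradigm-specific governance layer")
    return picks or ["no constraint identified; clarify the step first"]


steps = [
    WorkflowStep("draft reply", unstructured_input=True),
    WorkflowStep("check refund eligibility", explicit_rules=True, regulated=True),
    WorkflowStep("issue refund", explicit_rules=True, acts_externally=True),
]
for step in steps:
    print(step.name, "->", recommend(step))
```

The useful part is not the function but the input: it forces the workflow to be listed step by step before anyone argues about platforms.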
A customer support agent, for example, may use a neural model to interpret customer intent, retrieve relevant policy, draft a response, and identify emotional tone. But refund eligibility, escalation thresholds, contractual exceptions, and regulated disclosures should probably remain symbolic or deterministic. The agent should not “creatively infer” whether a refund is legally required. That is not intelligence. That is an incident report preparing itself.
A financial research agent may use neural orchestration to gather filings, summarise market news, compare scenarios, and generate draft analysis. But portfolio constraints, restricted-list rules, leverage limits, client suitability checks, and trade approval boundaries need explicit logic. The model can assist judgment; it should not quietly become the investment committee.
A clinical documentation agent may summarise notes and structure reports, but diagnosis pathways, dosage constraints, contraindication checks, and escalation rules need auditable mechanisms. The softer the language model, the harder the control layer must be.
The pattern is consistent: use neural systems where interpretation is valuable; use symbolic systems where enforceability is mandatory.
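The customer support case can be sketched in a few lines. The 30-day window, the `Order` fields, and the keyword-based `interpret_intent` are all hypothetical. In particular, `interpret_intent` is only a placeholder for a model call so that the example runs on its own.

```python
from dataclasses import dataclass
from datetime import date

REFUND_WINDOW_DAYS = 30  # hypothetical policy value, not from the paper


@dataclass(frozen=True)
class Order:
    purchased_on: date
    final_sale: bool


def interpret_intent(message: str) -> str:
    """Placeholder for a neural model call that classifies customer intent.

    A keyword match keeps the sketch runnable. A real system would call an
    LLM here and treat the returned label as untrusted input.
    """
    return "refund_request" if "refund" in message.lower() else "other"


def refund_eligible(order: Order, today: date) -> tuple[bool, str]:
    """Deterministic policy check: returns the decision and its reason."""
    if order.final_sale:
        return False, "final-sale items are not refundable"
    age_days = (today - order.purchased_on).days
    if age_days > REFUND_WINDOW_DAYS:
        return False, f"purchase is {age_days} days old; window is {REFUND_WINDOW_DAYS}"
    return True, "within refund window"


def handle(message: str, order: Order, today: date) -> str:
    if interpret_intent(message) != "refund_request":
        return "route to general support"
    eligible, reason = refund_eligible(order, today)
    # A model may draft the wording, but the decision comes from the rule.
    return f"refund {'approved' if eligible else 'declined'}: {reason}"


order = Order(purchased_on=date(2025, 1, 1), final_sale=False)
print(handle("I would like a refund please", order, today=date(2025, 1, 20)))
# refund approved: within refund window
```

The interpretive step can be swapped for any model without touching the eligibility rule, and the rule can be audited without reading a single prompt.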
Governance fails when it ignores the mechanism
The paper’s governance section is important because it rejects generic AI ethics as insufficient for agentic systems. “Transparency,” “fairness,” “safety,” and “accountability” do not mean the same thing across paradigms.
In symbolic systems, accountability often points toward designers, rule authors, state models, or missing edge cases. Transparency can be relatively high because the system may produce a trace of rule firings or logical steps. Bias may live in explicit rules or knowledge bases. Safety failures may emerge from a system executing a badly specified goal too literally.
In neural systems, accountability is messier. A failure may come from training data, prompt context, retrieval failure, tool misuse, model stochasticity, or adversarial input. Transparency is weaker because the model’s apparent reasoning is not the same thing as an inspectable causal chain. Bias is latent, distributed, and often revealed only under particular contexts. Safety failures include prompt injection, goal drift, hallucination, and value misgeneralisation.
So governance cannot simply demand “explainability” from every system in the same way. A symbolic system may support direct logic inspection. A neural system may need context logging, retrieval traces, prompt shielding, adversarial testing, confidence thresholds, and post-hoc explanation—with the uncomfortable caveat that post-hoc explanation is not the same as faithful internal reasoning.
The paper also adds a policy distinction by levels of agency:
| Agency level | Practical meaning | Governance implication |
|---|---|---|
| Assistive | The AI recommends or analyses; humans decide | Audit outputs, disclose limitations, preserve human decision authority |
| Shared | AI and humans jointly influence decisions | Log contributions, allocate roles clearly, define shared liability |
| Delegated | AI executes actions within a defined domain | Require strict boundaries, monitoring, override mechanisms, and stronger accountability |
This is especially relevant for enterprise deployment. A chatbot that drafts an email is one risk profile. An agent that sends the email, updates the CRM, approves a refund, triggers a payment, or changes a production schedule is another. Same interface, different liability universe.
The governance question should not be “Is there a human in the loop?” That phrase has been overworked into decorative compliance furniture. The better question is: What can the agent do without interruption, and what evidence exists after it does it?
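Both halves of that question can be encoded directly. The sketch below is illustrative only: the three levels follow the table above, but the action names, the `REQUIRED_LEVEL` mapping, and the `Agent` class are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class AgencyLevel(IntEnum):
    ASSISTIVE = 1  # AI recommends; humans decide
    SHARED = 2     # AI and humans jointly influence decisions
    DELEGATED = 3  # AI executes within a defined domain


# Hypothetical mapping from action to the minimum level that may run it
# without a human approving first.
REQUIRED_LEVEL = {
    "draft_email": AgencyLevel.ASSISTIVE,
    "update_crm": AgencyLevel.SHARED,
    "approve_refund": AgencyLevel.DELEGATED,
}


@dataclass
class Agent:
    level: AgencyLevel
    audit_log: list[dict] = field(default_factory=list)

    def attempt(self, action: str, approved_by: Optional[str] = None) -> bool:
        """Allow an action only if the level or a named approver permits it."""
        required = REQUIRED_LEVEL.get(action)
        if required is None:
            allowed = False  # unknown actions are denied by default
        else:
            allowed = self.level >= required or approved_by is not None
        # Every attempt is recorded, including refusals.
        self.audit_log.append(
            {"action": action, "allowed": allowed, "approved_by": approved_by}
        )
        return allowed


agent = Agent(level=AgencyLevel.SHARED)
print(agent.attempt("draft_email"))                            # True
print(agent.attempt("approve_refund"))                         # False
print(agent.attempt("approve_refund", approved_by="ops-lead"))  # True
print(len(agent.audit_log))                                    # 3
```

The refused attempt is logged alongside the approved ones. Evidence of what the agent tried and was stopped from doing is often as useful as evidence of what it did.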
Hybrid systems are the destination, not the shortcut
The paper’s future direction is hybrid neuro-symbolic architecture. That does not mean gluing a rules engine to an LLM and declaring victory in the architecture review. Hybrid systems are promising because they divide labour between two forms of intelligence:
- neural components for perception, language, synthesis, adaptation, and interaction;
- symbolic components for constraint checking, formal reasoning, state validation, compliance, and safety boundaries.
This is the right direction, but it is not a magic solvent. Hybrid systems can reduce the weaknesses of each paradigm only if the interface between them is engineered carefully.
A weak hybrid system lets the LLM generate outputs and then asks a symbolic layer to rubber-stamp them. A stronger hybrid system gives the symbolic layer real veto power, structured state, formal constraints, and auditable logs. The difference is not cosmetic. It is the difference between governance and theatre.
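A minimal sketch of the stronger version, using a trade-approval setting. The restricted list, the leverage limit, and every name below are hypothetical, and the proposals are hard-coded where a real system would receive them from the neural component.

```python
from dataclasses import dataclass
from typing import Callable

RESTRICTED_TICKERS = {"ACME"}  # hypothetical restricted list
MAX_LEVERAGE = 2.0             # hypothetical leverage limit


@dataclass(frozen=True)
class Trade:
    ticker: str
    notional: float


@dataclass(frozen=True)
class Portfolio:
    equity: float          # assumed to be positive
    gross_exposure: float


def within_leverage(trade: Trade, portfolio: Portfolio) -> bool:
    """Check leverage after the trade against the limit."""
    after = (portfolio.gross_exposure + trade.notional) / portfolio.equity
    return after <= MAX_LEVERAGE


Constraint = Callable[[Trade, Portfolio], bool]

# Named constraints so that a veto can cite exactly what failed.
CONSTRAINTS: dict[str, Constraint] = {
    "not_restricted": lambda t, p: t.ticker not in RESTRICTED_TICKERS,
    "positive_notional": lambda t, p: t.notional > 0,
    "leverage_limit": within_leverage,
}


def gate(trade: Trade, portfolio: Portfolio, log: list[dict]) -> bool:
    """Symbolic veto: the trade proceeds only if no constraint is violated."""
    failed = [name for name, ok in CONSTRAINTS.items() if not ok(trade, portfolio)]
    log.append({"trade": trade, "vetoed": bool(failed), "failed": failed})
    return not failed


log: list[dict] = []
book = Portfolio(equity=100_000.0, gross_exposure=150_000.0)
print(gate(Trade("XYZ", 40_000.0), book, log))   # True: leverage would be 1.9
print(gate(Trade("XYZ", 60_000.0), book, log))   # False: leverage would be 2.1
print(gate(Trade("ACME", 10_000.0), book, log))  # False: restricted ticker
```

The gate returns a decision the caller must respect, and the log records which named constraint produced each veto. A rubber-stamp layer would have neither property.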
The paper points to several research gaps that matter for practical adoption:
| Gap | Symbolic challenge | Neural challenge | Hybrid opportunity |
|---|---|---|---|
| Evaluation | Testing logical robustness in open environments | Testing hallucination, prompt robustness, memory, and cost | Separate benchmark suites plus integrated workflow evaluation |
| Reasoning | Brittle outside anticipated rules | Pattern matching can imitate reasoning without guaranteeing it | Neural exploration constrained by symbolic validation |
| Memory | Persistent state can be explicit but hard to update flexibly | Context windows create amnesia and unstable continuity | External structured memory with controlled read/write behaviour |
| Interoperability | Hard to connect with messy real-world data | Easy to call tools but prone to misuse | Middleware connecting LLM agents to symbolic validators |
| Governance | Rule audits become complex at scale | Attribution gaps remain severe | Dual audit trails across logic and model behaviour |
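The memory row is the easiest of these to picture in code. The sketch below shows one possible reading of "external structured memory with controlled read/write behaviour": a store with a fixed schema, per-key write permissions, and a write history. The schema, keys, and writer names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical schema: each key maps to (expected type, allowed writers).
SCHEMA: dict[str, tuple[type, set[str]]] = {
    "customer_tier": (str, {"crm_sync"}),
    "open_ticket_count": (int, {"crm_sync", "support_agent"}),
}


@dataclass
class StructuredMemory:
    _data: dict[str, object] = field(default_factory=dict)
    history: list[tuple[str, str, object]] = field(default_factory=list)

    def write(self, writer: str, key: str, value: object) -> None:
        """Store a value only if the key, the writer, and the type all check out."""
        if key not in SCHEMA:
            raise KeyError(f"unknown key: {key}")
        expected_type, allowed_writers = SCHEMA[key]
        if writer not in allowed_writers:
            raise PermissionError(f"{writer} may not write {key}")
        if not isinstance(value, expected_type):
            raise TypeError(f"{key} expects {expected_type.__name__}")
        self._data[key] = value
        self.history.append((writer, key, value))  # who wrote what, in order

    def read(self, key: str) -> object:
        return self._data[key]


memory = StructuredMemory()
memory.write("crm_sync", "customer_tier", "gold")
memory.write("support_agent", "open_ticket_count", 2)
print(memory.read("customer_tier"))  # gold
```

An LLM agent can read from such a store freely, but it cannot overwrite `customer_tier` by talking itself into it. That is the "controlled" part.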
For businesses, the hybrid lesson is less futuristic than it sounds. Most serious enterprise deployments will not be pure anything. They will be layered systems: LLMs for flexible interface and synthesis, retrieval for grounding, deterministic workflows for approvals, policy engines for constraints, human review for high-risk actions, and logging for accountability.
In other words, the future of agentic AI may look less like a synthetic employee and more like a regulated operations stack with a fluent front end. Not as sexy. Much more deployable.
What Cognaptus would infer for enterprise architecture
The paper directly shows a conceptual and literature-based split between symbolic and neural agentic AI, supported by a PRISMA-style review and paradigm-aware taxonomy. It does not directly show which vendor platform is best, which framework is most reliable, or which deployment will produce superior ROI.
The business inference is nevertheless clear.
Enterprises should evaluate agentic systems along three axes:
- Mechanism: Is the agent making decisions through explicit rules, probabilistic planning, LLM generation, retrieval, tool orchestration, or some hybrid?
- Autonomy: Can it recommend, decide jointly, or act independently?
- Evidence: What logs, traces, tests, constraints, and review mechanisms exist for the kind of failure this architecture is likely to produce?
That creates a better procurement and design conversation than “Do we need an AI agent?” The answer to that question is usually “maybe,” which is not a strategy. The better version is:
- Which workflow components require open-ended interpretation?
- Which workflow components require strict determinism?
- Which outputs must be explainable to regulators, customers, or internal risk teams?
- Which actions should never be taken without explicit approval?
- Which failures are acceptable, reversible, or catastrophic?
- Which parts of the agent’s behaviour can be tested before deployment, and which must be monitored continuously?
This is the practical value of the paper. It gives business teams a vocabulary for refusing bad abstractions. “Agent” is not enough. “LLM-powered” is not enough. “Multi-agent” is definitely not enough; sometimes that only means the system now has several ways to be confused.
The boundary: this is a survey, not proof of deployment superiority
The paper’s limitations matter because its taxonomy is useful but not final.
First, the field is moving quickly. The review’s contemporary search window extends to early 2025, and the authors acknowledge that fast-moving neural-agent developments may outrun any fixed literature snapshot.
Second, many state-of-the-art agentic systems are proprietary. That means architecture and performance details may be incomplete, selectively disclosed, or inferred from secondary documentation. Enterprise buyers should remember this when vendors present orchestration diagrams as if they were audited engineering evidence.
Third, the reviewed studies use heterogeneous evaluation methods. This limits direct cross-paradigm benchmarking. A symbolic planner evaluated for logical soundness and a neural agent evaluated for long-horizon task completion are not being measured with the same ruler.
Fourth, hybrid classification is inherently messy. Real systems may contain symbolic retrieval, deterministic workflow controls, neural reasoning, rule-based permissions, and human approval steps. Assigning a single paradigm label can simplify reality. Sometimes simplification clarifies; sometimes it hides the interesting bit.
These boundaries do not undermine the paper’s contribution. They define it. The paper is not a leaderboard. It is a map of operating logics.
The agent era needs less mythology and more architecture
The agentic AI conversation is drifting toward a familiar industry pattern: first anthropomorphise the software, then act surprised when governance becomes difficult. The paper provides a useful corrective. It asks us to stop treating “agency” as a mystical upgrade and start treating it as a design property produced by specific mechanisms.
Symbolic systems give us control, verification, and explicit state, but they can become brittle and narrow. Neural systems give us adaptability, language fluency, and tool orchestration, but they are opaque, stochastic, and vulnerable to context failure. Hybrid systems are the likely destination because real work needs both reliability and flexibility.
The business lesson is not that one mind wins. It is that the machine increasingly needs two: one to improvise, one to say no.
That second mind may be the more valuable one.
Cognaptus: Automate the Present, Incubate the Future.
[1] Mohamad Abou Ali and Fadi Dornaika, “Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions,” arXiv:2510.25445, 2025, https://arxiv.org/abs/2510.25445.