Agents Under Siege: How LLM Workflows Invite a New Breed of Cyber Threats

From humble prompt-followers to autonomous agents capable of multi-step tool use, LLM-powered systems have evolved rapidly in just two years. But with this newfound capability comes a vulnerability surface unlike anything we’ve seen before. The recent survey paper From Prompt Injections to Protocol Exploits presents the first end-to-end threat model of these systems, and it reads like a cybersecurity nightmare.

We’re no longer talking about isolated prompt injections. We’re talking about chain reactions across plugins, tools, memory, and even other agents. What’s unfolding is the cybersecurity equivalent of a supply chain crisis—except the supply chain is made of prompts, protocol metadata, and insecure agent-to-agent messages.

A Four-Layer Threat Model for Agentic AI

The authors offer a four-part taxonomy:

Layer Examples Typical Risk
Input Manipulation Prompt injections, context hijacking Misaligned or harmful output
Model Compromise Backdoors, poisoned weights, memory attacks Persistent latent behavior
System & Privacy Side-channels, data leaks, social engineering Confidentiality breaches
Protocol Vulnerabilities MCP/A2A abuse, recursive blocking, discovery spoofing Full workflow hijack

Most worrying is how these layers can stack: an adversarial prompt (Input) can activate a latent backdoor (Model), which triggers an unverified tool call (Protocol) that leaks sensitive data (System).

Why Traditional Defenses Fail

Current solutions tend to focus narrowly: input sanitizers for prompts, fine-tuning hardening for model weights, rate limits for tools. But today’s LLM agents operate across distributed systems with tool plugins, real-time API calls, shared memory, and even peer-to-peer delegation. Defenses must match that complexity.

Take the GitHub Toxic Agent Flow attack. A benign-looking issue in a public repo triggered an agent to fetch private data and post it publicly. Model alignment and prompt filtering were powerless because the exploit lived in the protocol’s connective tissue.

Protocols: The New Battlefield

Protocols like MCP (Model Context Protocol) and A2A (Agent-to-Agent) are the plumbing of agentic AI. They allow agents to discover capabilities, invoke tools, and collaborate. But they’re also unpoliced highways where malicious prompts, spoofed agent metadata, or recursive task delegation can create catastrophic loops.

The survey identifies:

  • SQL injections inside args.query fields
  • Backdoor fragments scattered across multiple prompt fields
  • Infinite recursive task chains causing denial-of-service
  • Malicious agent registration via spoofed Agent Cards

Defending here isn’t just about better prompt filters. It’s about building zero-trust, signed protocol layers with granular access policies and real-time anomaly detection.

What Enterprises Need to Watch

If you’re deploying LLM agents across your stack—be it for customer support, coding copilots, or autonomous research—you need to start thinking like a red team. The paper offers a security matrix mapping 15+ threat types across protocol layers (see Table VI in the paper), and it’s an essential reference.

Three action items:

  1. Audit your agent memory pipelines: Memory poisoning attacks (like MINJA) are under-discussed but easy to implement.
  2. Treat protocol metadata as untrusted: Discovery responses and agent cards can be spoofed, leading to agent impersonation.
  3. Stress-test your workflows with AutoDAN and GPTFuzz: If you haven’t run jailbreak fuzzing on your inputs, you’re flying blind.

A Research Roadmap for Secure Agent Ecosystems

The paper ends with a sweeping but actionable roadmap. Among the most promising directions:

  • Cryptographic provenance tracking for agent memory and prompt provenance
  • Formal verification of protocol flows akin to what we do in distributed consensus algorithms
  • Agentic Web Interfaces purpose-built for LLMs, not retrofitted from GUIs designed for humans
  • Cross-agent attestation and sandboxing, especially for federated learning settings

These aren’t just research dreams. They’re becoming necessities.

Final Thoughts

The industrialization of LLM agents has created a paradox: the more capable and autonomous they become, the more brittle and attackable the system gets. This paper gives us not just a list of threats but a conceptual blueprint for thinking defensively in an era of agents.

If 2023 was the year of “ChatGPT meets Zapier,” 2025 is the year of “Agent meets Agent meets Adversary.” It’s time to defend accordingly.


Cognaptus: Automate the Present, Incubate the Future