Diagram reviews are where many security problems first become visible. Not in the production logs. Not in the postmortem. Not after a user discovers that a tool-calling agent has confidently pushed private data into the wrong API. The humble architecture diagram is supposed to be the place where adults in the room ask: what can go wrong here?
For ordinary software, that question has a familiar vocabulary. STRIDE asks teams to look for spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege. It is not glamorous, but security frameworks are not supposed to be glamorous. They are supposed to keep expensive disasters boring.
Agentic AI makes boring harder.
A recent paper, ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications, argues that traditional threat modeling misses a new layer of risk: the part where software interprets instructions, carries memory, reasons across context, invokes tools, and collaborates with other agents.[1] Its proposed answer is ASTRIDE: classical STRIDE plus an additional category, A for AI Agent-Specific Attacks, combined with an automated pipeline that reads architecture diagrams using fine-tuned vision-language models and then asks a reasoning LLM to synthesize the final threat model.
The paper’s strongest contribution is not that it claims to “automate security.” That phrase has been printed on enough vendor slides to deserve supervised visitation rights. The useful idea is narrower and better: if agentic systems create attack surfaces that appear in architecture diagrams, then security review should begin earlier, at the diagram stage, before the agent reaches production and discovers its personality as a liability.
STRIDE works until the system starts interpreting instructions
STRIDE was designed for software components, data flows, and trust boundaries. It asks good questions. Can an attacker impersonate something? Can data be modified? Can actions be denied later? Can information leak? Can service be disrupted? Can privileges be escalated?
Those questions still matter for agentic AI. A payment gateway called by an LLM agent can still leak data. A tool execution service can still be abused. A memory store can still become a privileged target. Nobody gets exempt from ordinary security just because a chatbot is standing nearby wearing a tiny orchestration hat.
But agentic systems add failure modes that do not fit cleanly into the old checklist. The paper names prompt injection, context poisoning, model manipulation, unsafe tool invocation, reasoning subversion, memory misuse, and opaque agent-to-agent communication as examples of risks that traditional methods do not natively capture. These are not just new labels for old bugs. They sit at a different layer.
A conventional API endpoint receives parameters. An agent receives instructions and may reinterpret the task. A conventional service reads stored state. An agent may treat stored memory as part of its reasoning context. A conventional workflow calls tools through deterministic logic. An agent may decide which tool to call, when, and with what intermediate assumptions. That decision layer is exactly where the security model becomes slippery.
ASTRIDE’s first move is therefore taxonomic: add AI Agent-Specific Attacks as a first-class category instead of forcing agent-specific risks into STRIDE’s existing six boxes. This matters because categories govern attention. If prompt injection is treated as a weird form of tampering, it may be reviewed only at the input boundary. If it is treated as an agent-specific attack, reviewers are more likely to trace its path through prompt processing, reasoning, memory, tool invocation, and downstream actions.
That is the plus-one in the title. Not a decorative plus-one. More like the additional guest who changes the seating plan because they brought a flamethrower.
The mechanism: from diagram to model predictions to reconciled threat model
ASTRIDE’s architecture has four main parts: a data lake, a consortium of fine-tuned vision-language models, a reasoning LLM, and agent-based orchestration.
The data lake stores architecture artifacts such as data flow diagrams, component diagrams, trust boundaries, threat labels, component descriptions, and mitigations. In the implementation described by the paper, the authors generate approximately 1,200 annotated Mermaid diagram records. Each record contains visual or diagrammatic structure plus labels for STRIDE and AI-agent-specific threats.
The VLM layer is then trained to interpret those diagrams. The paper describes a consortium of models, including Llama-Vision, Pix2Struct, and Qwen2-VL in the implementation section, while some figures and table text also mention Pixtral-Vision. That naming inconsistency is worth noticing, not because it destroys the idea, but because production security tooling lives and dies by documentation discipline. “Mostly clear” is fine for a prototype paper; it is less fine when auditors ask what exactly generated a finding.
The important mechanism is still clear. Each model independently analyzes a submitted diagram and produces structured threat observations. Those observations include threat type, affected component, severity, and mitigation. The outputs are then passed through an orchestration layer to a reasoning LLM, identified in the paper as OpenAI-gpt-oss, which synthesizes the model outputs into a unified threat assessment.
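To make "structured threat observation" concrete, here is a minimal sketch of what one record could look like. The four fields (threat type, affected component, severity, mitigation) come from the paper's description; the class names, the `source_model` field, and the three-level severity scale are assumptions for illustration, not the paper's schema.

```python
from dataclasses import dataclass
from enum import Enum


class ThreatCategory(Enum):
    """STRIDE's six categories plus the added 'A' category."""
    AI_AGENT_SPECIFIC = "A"
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"


@dataclass(frozen=True)
class ThreatObservation:
    source_model: str          # assumption: label for the VLM that produced it
    category: ThreatCategory
    threat: str                # e.g. "prompt injection"
    component: str             # affected component in the diagram
    severity: str              # assumed scale: "low", "medium", "high"
    mitigation: str


# Illustrative record; the values are made up, not taken from the paper.
obs = ThreatObservation(
    source_model="vlm-a",
    category=ThreatCategory.AI_AGENT_SPECIFIC,
    threat="prompt injection",
    component="prompt processor",
    severity="high",
    mitigation="prompt sanitization",
)
```

A fixed record shape like this is what lets several models' outputs be compared and merged later, whatever the real schema turns out to be.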
The paper’s pipeline can be read as three layers of work:
| Layer | What it does | Operational meaning |
|---|---|---|
| Diagram understanding | Reads architecture and data-flow structure | Converts visual system design into machine-readable security context |
| Threat detection | Maps components and flows to STRIDE plus AI-agent-specific risks | Produces candidate risks tied to specific modules |
| Reasoning synthesis | Reconciles multiple VLM outputs into a final threat model | Turns partial, overlapping detections into a prioritized review artifact |
This is why a mechanism-first reading is better than a feature summary. ASTRIDE is not merely “LLM + security.” It is a proposed division of labor. VLMs inspect diagrams. Multiple VLMs reduce dependence on a single model’s blind spots. A reasoning LLM compares and consolidates the candidates. Agents coordinate the process.
That division of labor is the real argument.
The “A” category makes agent risk visible at the right architectural points
The most practical part of ASTRIDE is not the model stack. Model stacks age quickly. The taxonomy may last longer.
Consider a simple agentic workflow: user input enters a prompt processor, moves into an LLM reasoning core, reaches a tool execution module, touches external APIs, and stores information in memory. A traditional STRIDE review can still ask whether the tool module permits privilege escalation or whether API responses disclose information. Useful, yes. Sufficient, no.
The AI-agent-specific category pushes reviewers to ask different questions:
| Agentic component | AI-agent-specific risk | Why STRIDE alone can under-specify it |
|---|---|---|
| Prompt processor or NLU module | Prompt injection | The attack is not only malformed input; it is instruction manipulation inside the model’s control surface |
| Reasoning core | Reasoning subversion | The failure may be a corrupted decision path rather than a corrupted file or request |
| Context or memory store | Context poisoning | Stored text can become future instruction, not merely future data |
| Tool execution module | Unsafe tool invocation | The agent may call legitimate tools for illegitimate inferred goals |
| Multi-agent communication | Inter-agent influence | One agent’s output may become another agent’s trusted context |
This is where ASTRIDE’s framework is useful for business teams. It gives security, product, and engineering people a shared vocabulary for discussing agent risks before they collapse into vague warnings like “LLMs can be unsafe.” Vague warnings are cheap. They are also useless at sprint planning.
A better review question is: “Can a malicious user inject instructions at the prompt processor that survive into the reasoning core and trigger unauthorized tool invocation?” That question is concrete. It has components, flows, controls, and owners.
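That review question is essentially a reachability check on the data-flow diagram. The sketch below models the simple workflow described earlier as a directed graph and asks whether untrusted input can reach the tool execution module. The edge list and the `find_path` helper are hypothetical; a real review would also examine the controls on each hop, which this does not.

```python
from collections import deque

# Hypothetical data-flow edges for the simple agentic workflow.
FLOWS = {
    "user input": ["prompt processor"],
    "prompt processor": ["reasoning core"],
    "reasoning core": ["tool execution", "memory"],
    "memory": ["reasoning core"],
    "tool execution": ["external API"],
}


def find_path(flows, source, target):
    """Breadth-first search; returns the shortest component path or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = find_path(FLOWS, "user input", "tool execution")
print(" -> ".join(path))
# prints: user input -> prompt processor -> reasoning core -> tool execution
```

Each component on the returned path is a place where a reviewer can ask which control stops an injected instruction from moving one hop further.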
The VLMs are detectors, not security experts in a box
The paper’s implementation uses a synthetically generated dataset of roughly 1,200 annotated Mermaid records representing data flow diagrams, component diagrams, and trust-boundary layouts. The dataset is split into two-thirds training, one-sixth validation, and one-sixth testing. Fine-tuning is performed with Unsloth and QLoRA, with the reported training time around 1,627 seconds, or 27.12 minutes. Peak reserved memory is reported at 14.605 GB, with actual training consumption of 5.853 GB.
Those numbers support feasibility. They suggest that the authors can adapt a VLM to a domain-specific diagram interpretation task without needing an absurdly large training operation. This matters because security teams do not usually have infinite GPU budgets lying around next to the incident response pizza.
But feasibility is not the same as deployment-grade validation.
The training curves in Figures 5 and 6 show declining training and validation losses, but the paper also notes that validation loss remains above training loss, with a loss ratio ranging from 1.0 to 3.0. The authors interpret this as a sign of overfitting, especially around spikes, while also noting stabilization as training proceeds. That makes these figures a fine-tuning diagnostic, not a complete proof that ASTRIDE can generalize across messy real enterprise architectures.
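For readers who want to run the same diagnostic on their own fine-tuning logs, here is a small sketch. It assumes the ratio means validation loss divided by training loss at each checkpoint, which fits the reported 1.0 to 3.0 range but is not spelled out here. The 1.5 threshold and the loss values are invented for illustration and are not read off the paper's figures.

```python
def loss_ratios(train_losses, val_losses):
    """Validation/training loss ratio at each checkpoint."""
    if len(train_losses) != len(val_losses):
        raise ValueError("loss histories must have the same length")
    return [v / t for t, v in zip(train_losses, val_losses)]


def flag_overfitting(ratios, threshold=1.5):
    """Indices of checkpoints whose ratio exceeds the assumed threshold."""
    return [i for i, r in enumerate(ratios) if r > threshold]


# Made-up loss histories, for illustration only.
train = [2.0, 1.0, 0.5, 0.4]
val = [2.0, 1.2, 1.5, 0.5]

ratios = loss_ratios(train, val)
flagged = flag_overfitting(ratios)
print(flagged)  # prints: [2]
```

A ratio near 1.0 means the model does about as well on held-out diagrams as on training ones; a spike toward 3.0 is where a skeptical reader should look.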
The qualitative before-and-after examples are more interesting for understanding the mechanism. In Figure 7, before fine-tuning, Llama-Vision reportedly identifies only prompt injection at the prompt processor. After fine-tuning, it identifies prompt injection, context poisoning in the reasoning core, and unsafe tool invocation in the tool execution module, along with mitigations such as prompt sanitization, context integrity checks, and access control for API invocations.
Figure 8 gives a similar story for Pixtral-Vision: before fine-tuning, the model flags a narrower reasoning-subversion risk; after fine-tuning, it expands to prompt injection in the NLU module, context poisoning in contextual memory, and mitigations such as zero-trust prompt filtering, reasoning constraints, and memory hashing with provenance tracking.
That is meaningful, but it is qualitative. The paper does not provide the kind of external benchmark, precision-recall table, production false-positive analysis, or cross-domain stress test that would justify treating the system as an autonomous security authority. The right interpretation is: fine-tuning appears to make VLMs better at producing structured agentic threat hypotheses from diagrams. The wrong interpretation is: the VLMs have become security experts. They have not. They have become better interns with unusually good eyesight.
The reasoning LLM is there to reconcile, not magically certify
ASTRIDE’s final layer asks a reasoning LLM to synthesize the outputs from the VLM consortium. This is sensible. If three models independently inspect a diagram, they may disagree, overlap, or miss different pieces. A reasoning layer can consolidate candidate threats, remove duplicates, rank risks, and connect them to mitigations.
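The consolidation step can be pictured with a deliberately simple, deterministic stand-in. The paper uses a reasoning LLM for this; the sketch below is not that. It only illustrates the bookkeeping: merge duplicate findings, count how many models agree, and rank by severity. The field names, model labels, and severity scale are all assumptions.

```python
from collections import defaultdict

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}  # assumed scale


def reconcile(observations):
    """Merge duplicate (threat, component) findings and rank them."""
    merged = defaultdict(
        lambda: {"models": set(), "severity": "low", "mitigations": set()}
    )
    for o in observations:
        entry = merged[(o["threat"], o["component"])]
        entry["models"].add(o["model"])
        entry["mitigations"].add(o["mitigation"])
        # Keep the highest severity any model assigned.
        if SEVERITY_RANK[o["severity"]] > SEVERITY_RANK[entry["severity"]]:
            entry["severity"] = o["severity"]
    # Rank by severity first, then by how many models agreed.
    ranked = sorted(
        merged.items(),
        key=lambda kv: (SEVERITY_RANK[kv[1]["severity"]], len(kv[1]["models"])),
        reverse=True,
    )
    return [
        {"threat": t, "component": c, "severity": e["severity"],
         "agreement": len(e["models"]), "mitigations": sorted(e["mitigations"])}
        for (t, c), e in ranked
    ]


# Illustrative inputs; model names and severities are invented.
observations = [
    {"model": "vlm-a", "threat": "prompt injection", "component": "prompt processor",
     "severity": "high", "mitigation": "prompt sanitization"},
    {"model": "vlm-b", "threat": "prompt injection", "component": "prompt processor",
     "severity": "medium", "mitigation": "zero-trust prompt filtering"},
    {"model": "vlm-b", "threat": "context poisoning", "component": "memory",
     "severity": "medium", "mitigation": "context integrity checks"},
    {"model": "vlm-c", "threat": "unsafe tool invocation", "component": "tool execution",
     "severity": "high", "mitigation": "access control for API invocations"},
]
report = reconcile(observations)
```

A rule-based merge like this cannot weigh context or resolve contradictions, which is presumably why the paper hands the job to a reasoning model. It does show what "agreement" and "ranking" mean in practice.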
Figure 9 illustrates this role using a multi-agent collaboration architecture. The individual VLM outputs identify different threats: prompt injection affecting a planning agent, context poisoning through agent memory, and unsafe tool invocation by an executor. The reasoning layer then combines them into a fuller threat model covering planner manipulation, memory poisoning, and executor-side unsafe tool invocation.
That figure is best read as qualitative evidence of synthesis. It shows the intended role of the reasoning model: combining partial observations into a coherent threat narrative. It does not prove the final answer is always correct, nor does it measure whether the reasoning layer suppresses false positives or introduces new hallucinated risks.
There is also a small but telling documentation wrinkle: while the surrounding section discusses OpenAI-gpt-oss, the Figure 9 table appears to label the reasoning row as OpenAI-o3. The caption names OpenAI-gpt-oss. Again, this is not fatal. But it is a reminder that the paper is a prototype-oriented contribution. When the subject is automated security analysis, exact model identity is not a footnote detail. It affects reproducibility, governance, and procurement.
Here is a disciplined reading of the evidence:
| Paper element | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Table comparing prior frameworks | Comparison with prior work | ASTRIDE combines diagram-aware VLMs, fine-tuning, reasoning LLM synthesis, and consortium orchestration in one framework | It does not prove superior accuracy against all prior systems |
| Architecture diagrams | Implementation detail | The pipeline has a coherent division of labor across data lake, VLMs, agents, and reasoning LLM | It does not prove robustness under production architecture variation |
| Training and validation loss curves | Fine-tuning diagnostic | The VLM can be adapted to the diagram-threat task; training improves and stabilizes | It does not establish low false-positive or false-negative rates |
| Before/after VLM examples | Main qualitative evidence | Fine-tuning broadens threat coverage and mitigation specificity | It does not quantify performance across independent benchmarks |
| Reasoning LLM synthesis example | Qualitative synthesis test | The reasoning layer can consolidate multiple partial threat predictions | It does not certify final correctness or audit readiness |
That table is less exciting than “AI automates threat modeling.” It is also more useful, which is usually the trade-off.
The business value is shifting security review left
For companies building agentic AI systems, ASTRIDE points toward a practical workflow: scan architecture diagrams early, generate structured threat hypotheses, and use those outputs to guide human review before implementation hardens into production debt.
This has three business implications.
First, ASTRIDE-like tools could make agent security review more scalable. Many organizations are now building internal agents for customer support, finance operations, data retrieval, research workflows, and software engineering. Each agent introduces prompt surfaces, tool permissions, memory stores, API links, and sometimes multi-agent delegation. Manual review does not disappear, but it can be focused. An automated first pass can identify where reviewers should spend attention.
Second, diagram-driven analysis fits naturally into DevSecOps. The input is not a massive log archive or a running system. It is an architecture artifact teams already create, or at least claim to create before building the thing directly in production like civilized raccoons. If diagram review becomes part of design review, security moves closer to architecture decisions instead of arriving after the agent has already been wired into sensitive tools.
Third, the framework creates a governance language for AI-specific controls. Instead of arguing abstractly about “AI safety,” teams can map risks to components: prompt validation at the NLU layer, context integrity checks for memory, access control and allowlists for tool invocation, provenance tracking for shared state, and reasoning constraints for high-risk decisions.
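One of those controls, an allowlist for tool invocation, is easy to sketch. The example below is a hypothetical deny-by-default policy check, not anything taken from the paper; the agent names, tool names, and `is_allowed` helper are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    agent: str
    tool: str


# Hypothetical policy: which tools each agent may invoke.
ALLOWLIST = {
    "support-agent": {"search_kb", "create_ticket"},
    "finance-agent": {"read_invoice"},
}


def is_allowed(call, allowlist=ALLOWLIST):
    """Deny by default: unknown agents and unlisted tools are rejected."""
    return call.tool in allowlist.get(call.agent, set())


print(is_allowed(ToolCall("support-agent", "create_ticket")))  # prints: True
print(is_allowed(ToolCall("support-agent", "issue_refund")))   # prints: False
```

The check sits outside the model, so an agent persuaded by injected text to want a tool still cannot invoke it.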
The paper directly shows a prototype architecture and qualitative evidence that fine-tuning improves diagram-based threat predictions. Cognaptus’ business inference is more cautious: the near-term value is not automated assurance. It is cheaper diagnosis. ASTRIDE-like systems can become a triage layer that helps teams find likely weaknesses earlier and document them more consistently.
That is useful enough. Not every tool needs to arrive wearing a cape.
Where ASTRIDE should sit in an enterprise workflow
The paper describes ASTRIDE as an automated platform, but enterprises should resist the temptation to treat it as a replacement for security engineers. The safer deployment model is a human-in-the-loop review workflow.
A sensible version might look like this:
- Product and engineering teams submit architecture diagrams for agentic workflows.
- The tool identifies STRIDE and AI-agent-specific risks by component and data flow.
- Security reviewers inspect the generated threat model, accepting, revising, or rejecting findings.
- Accepted findings become control requirements, backlog items, or architecture changes.
- The reviewed threat model is stored as governance evidence.
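The steps above can be sketched as a small state model in which only a human sets the decision. Everything here is hypothetical scaffolding (the `Finding` and `Decision` names, the fields, the `backlog` filter); it is meant to show where the human sits in the loop, not how ASTRIDE stores results.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REVISED = "revised"
    REJECTED = "rejected"


@dataclass
class Finding:
    threat: str
    component: str
    decision: Decision = Decision.PENDING
    reviewer: str = ""
    note: str = ""


def review(finding, decision, reviewer, note=""):
    """Record a human decision; generated findings start as PENDING."""
    if decision is Decision.PENDING:
        raise ValueError("a review must reach a decision")
    finding.decision = decision
    finding.reviewer = reviewer
    finding.note = note
    return finding


def backlog(findings):
    """Accepted or revised findings become control requirements."""
    return [f for f in findings
            if f.decision in (Decision.ACCEPTED, Decision.REVISED)]


# Illustrative findings and reviewer; all values are invented.
findings = [
    Finding("prompt injection", "prompt processor"),
    Finding("context poisoning", "memory"),
    Finding("spoofing", "external API"),
]
review(findings[0], Decision.ACCEPTED, "alice")
review(findings[1], Decision.REVISED, "alice", "scope to long-term memory only")
review(findings[2], Decision.REJECTED, "alice", "API uses mutual TLS")
work_items = backlog(findings)
```

Keeping the reviewer and note on each finding is what turns the reviewed threat model into usable governance evidence.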
In this workflow, ASTRIDE is not the judge. It is the analyst that prepares the case file.
This distinction matters because agentic AI security is context-sensitive. A memory store used for restaurant recommendations does not have the same risk profile as a memory store used for insurance claims, trading operations, or healthcare triage. A tool invocation pathway that is acceptable in a sandbox may be disastrous when it reaches payment execution. The diagram can reveal structure, but business risk depends on asset value, regulatory exposure, permissions, and incident impact.
That is where humans still earn their snacks.
The boundaries are clear: promising prototype, not audit-grade proof
The paper’s limitations are not minor, but they are also not embarrassing. They are the normal limitations of an early platform paper.
The dataset is synthetic and relatively small: approximately 1,200 annotated Mermaid records. Synthetic diagrams are useful for controlled fine-tuning, but real enterprise diagrams are inconsistent, incomplete, overloaded, and occasionally drawn by someone who believes arrows are a lifestyle choice. Generalization to those conditions remains uncertain.
The evaluation is qualitative-heavy. The before-and-after examples are persuasive as illustrations, but the paper does not provide enough quantitative evidence to support strong claims about operational accuracy. In security tooling, false negatives are dangerous because they create misplaced confidence; false positives are expensive because they waste expert attention. A production decision would need both measured.
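Measuring both is not exotic. Given a set of findings confirmed by human reviewers, precision and recall fall out of simple set arithmetic. The sketch below uses the standard definitions; the finding labels are invented, and nothing here reflects numbers from the paper, which does not report them.

```python
def precision_recall(predicted, actual):
    """Compare tool findings against a reviewer-confirmed set."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # real threats the tool found
    fp = len(predicted - actual)   # findings reviewers rejected
    fn = len(actual - predicted)   # real threats the tool missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if actual else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Invented example: (threat, component) pairs.
predicted = [
    ("prompt injection", "prompt processor"),
    ("context poisoning", "memory"),
    ("unsafe tool invocation", "tool execution"),
    ("spoofing", "external API"),
]
actual = [
    ("prompt injection", "prompt processor"),
    ("context poisoning", "memory"),
    ("unsafe tool invocation", "tool execution"),
    ("inter-agent influence", "planner"),
]
metrics = precision_recall(predicted, actual)
print(metrics["precision"], metrics["recall"])  # prints: 0.75 0.75
```

Low precision costs reviewer hours; low recall costs incidents. A production evaluation would report both, per threat category.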
The training diagnostics also deserve a sober reading. Validation loss exceeding training loss and loss ratios up to 3.0 suggest overfitting risk. That does not invalidate the fine-tuning result, but it limits how strongly one should claim generalization.
The model descriptions are not perfectly consistent across the paper. The implementation section names Llama-Vision, Pix2Struct, and Qwen2-VL; the qualitative examples discuss Llama-Vision and Pixtral-Vision; Figure 9 appears to mix OpenAI-gpt-oss and OpenAI-o3 labels. For a research prototype, this is survivable. For enterprise procurement, it would need cleanup.
So the correct conclusion is not “ASTRIDE solves agentic AI security.” It is this: ASTRIDE proposes a useful architecture for moving agent-specific threat modeling earlier in the lifecycle, and the prototype evidence supports feasibility. The remaining work is to prove reliability across real diagrams, varied domains, noisy architecture artifacts, and adversarially designed systems.
The real lesson: agent security needs architecture-native tools
ASTRIDE is valuable because it treats agentic AI as an architectural security problem, not merely a model behavior problem. That is the right direction.
Too much AI security discussion gets trapped at the prompt level: jailbreaks, refusals, filters, red-team examples, benchmark scores. Those matter, but enterprise agents fail across systems. They fail when prompts become plans, plans become tool calls, tool calls touch APIs, APIs change state, and memory stores preserve the wrong lesson for tomorrow. The threat is not just “bad text in, bad text out.” The threat is bad instruction traveling through a working business process.
That is why diagram-driven threat modeling is attractive. Architecture diagrams show where interpretation, memory, tools, and trust boundaries meet. ASTRIDE’s core idea is to teach multimodal models to read that structure and flag the new failure modes introduced by agency.
For business leaders, the takeaway is not to buy the first tool that claims to automate threat modeling. Please do not. Procurement teams have suffered enough.
The takeaway is to update the security review playbook. Agentic systems need threat categories that explicitly cover prompt injection, context poisoning, reasoning subversion, unsafe tool invocation, and inter-agent influence. They need diagram-level review before deployment. They need automated triage that assists experts instead of pretending experts are an inefficient legacy dependency.
STRIDE taught teams how to ask structured questions about software threats. ASTRIDE asks a more uncomfortable question: what happens when the software can interpret, remember, decide, and act?
That question will not go away. It will only get attached to more APIs.
Cognaptus: Automate the Present, Incubate the Future.
[1] Eranga Bandara et al., “ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications,” arXiv:2512.04785, 2025.