Ai-Security

SD‑RAG: Don’t Trust the Model, Trust the Pipeline

A chatbot should not be the only employee in the company responsible for keeping secrets. That sounds obvious until we look at how many enterprise RAG systems are designed. A user asks a question. The system retrieves internal documents. The documents are placed into the model context. A policy instruction is added somewhere above the user prompt: do not reveal sensitive information. Then everyone hopes the model behaves. ...

STRIDE Gets a Plus-One: How ASTRIDE Rewrites Threat Modeling for the Agentic Era

Diagram reviews are where many security problems first become visible. Not in the production logs. Not in the postmortem. Not after a user discovers that a tool-calling agent has confidently pushed private data into the wrong API. The humble architecture diagram is supposed to be the place where adults in the room ask: what can go wrong here? ...

Trust Issues: Why Neural Networks Need Their Own Internal Affairs Department

Accuracy is a comforting number. That is precisely the problem. A neural network can score well on a test set and still be operationally suspicious. The labels may be corrupted. The input may be degraded. A small patch may have quietly hijacked part of the model’s learned behavior. The model may be confident, calibrated enough for a dashboard, and still untrustworthy in the one place where the business actually needs it to behave. ...

Graph Crimes of the Temporal Kind: How LoReTTA Quietly Breaks Time

A fraud model does not only learn from transactions. It learns from sequence. Who interacted with whom. When. How often. After what previous event. Before which next event. In temporal graph systems, the order is not metadata. It is the thing being modelled. That is why LoReTTA is an uncomfortable paper.¹ It does not argue that Temporal Graph Neural Networks can be broken only by a powerful adversary with model access, expensive surrogate training, and a theatrical pile of fake edges. It argues something more operationally annoying: a continuous-time graph can be poisoned by removing influential interactions and replacing them with plausible ones. The resulting history still looks enough like history. The model quietly learns the wrong temporal structure. Very civilised, as crimes go. ...

Refusal, Rewired: Why One Safety Direction Isn’t Enough

Safety teams like switches. They are easy to name, easy to diagram, and easy to pretend are under control. For language models, “refusal” has often been treated with roughly that mental model. A harmful prompt enters. Somewhere inside the model, a refusal feature lights up. The model says no. If researchers can identify the feature, they can study it, steer it, strengthen it, or—less comfortably—remove it. ...

Agents in a Sandbox: Securing the Next Layer of AI Autonomy

TL;DR for operators: Tools are where agent security stops being philosophical. Once an AI agent can read files, call APIs, inspect environment variables, launch commands, or connect to a database, the business question is no longer “is the model aligned?” It is “what exactly can this process touch when it is confused, manipulated, or supplied with a malicious tool?” ...

Echoes Without Clicks: How EchoLeak Turned Copilot Into a Data Drip

Email is boring. That is its superpower. A message arrives. It looks like business sludge: compliance wording, project references, perhaps a polite request that nobody asked for. It contains no executable attachment, no obvious malware, no urgent invoice from a suspicious cousin. In a normal security review, it is background noise. EchoLeak makes that boring object more interesting. The paper examines CVE-2025-32711, a reported zero-click indirect prompt-injection exploit against Microsoft 365 Copilot, where a crafted external email could allegedly cause Copilot to leak internal information without the user clicking a malicious link.¹ The central lesson is not that Copilot was uniquely careless, nor that prompt injection has suddenly become cyberpunk magic. The lesson is more uncomfortable: enterprise copilots are becoming data-flow infrastructure, and data-flow infrastructure fails when content, instructions, rendering, and network access are allowed to melt into one warm productivity soup. ...

Guardrails Before Gas: Secure Plan‑Then‑Execute Agents for Real Work

Every executive agent demo eventually reaches the same awkward moment: the model stops being a chatbot and starts touching things. Files. APIs. Databases. Code runners. Email clients. Payment workflows. Production systems, because apparently we enjoy giving probabilistic text engines access to expensive buttons. The paper “Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations” argues that the core safety problem is not merely that agents sometimes reason badly. The sharper problem is that many agent architectures let untrusted information change what the agent decides to do next.¹ That is a control-flow problem. And control-flow problems are not solved by asking the model, very politely, to behave. ...

Hook, Line, and Import: How RAG Lets Attackers Snare Your Code

Imports look harmless until they become procurement. A developer asks an AI assistant for a plotting snippet. The assistant returns clean-looking Python, a few lines of explanation, and an import statement for matplotlib_safe. The name sounds prudent. Safer is good. Safer is what the security team keeps asking for, usually in meetings that could have been static analysis. ...

Keys to the Kingdom: How LLMs Can Audit Crypto Logic Before It Breaks

TL;DR for operators: CryptoScope is not “ChatGPT, please audit my cryptography”. That would be a splendid way to generate confident nonsense with Greek letters. The paper’s useful idea is more disciplined: make the model behave less like a wandering code reviewer and more like a junior cryptographic analyst with a library card, a checklist, and a supervisor. CryptoScope does this by combining three components: a curated cryptographic knowledge base of more than 12,000 entries, a pre-detection step that summarises code and checks algorithm compliance, and a retrieval-augmented final analysis that grounds the model’s reasoning in known failure patterns and implementation guidance.¹ ...