AI Security

Agents in a Sandbox: Securing the Next Layer of AI Autonomy

The rise of AI agents—large language models (LLMs) equipped with tool use, file access, and code execution—has been breathtaking. But with that power has come a blind spot: security. If a model can read your local files, fetch data online, and run code, what prevents it from being hijacked? Until now, not much. A new paper, Securing AI Agent Execution (Bühler et al., 2025), introduces AgentBound, a framework designed to give AI agents what every other computing platform already has—permissions, isolation, and accountability. Think of it as the Android permission model for the Model Context Protocol (MCP), the standard interface that allows agents to interact with external servers, APIs, and data. ...

Echoes Without Clicks: How EchoLeak Turned Copilot Into a Data Drip

Prompt injection just graduated from theory to incident response. EchoLeak (CVE‑2025‑32711) demonstrated a zero‑click exfiltration chain inside Microsoft 365 Copilot: a single crafted external email seeded hidden instructions; Copilot later pulled that message into context, encoded sensitive details into a URL, and the client auto‑fetched the link—leaking data without the user clicking anything. The final twist: a CSP‑allowed Teams proxy retrieved the attacker’s URL on Copilot’s behalf. Below I unpack why standard defenses failed, and what an enterprise‑ready fix looks like. ...

The Phantom Menace in Your Knowledge Base

Retrieval-Augmented Generation (RAG) may seem like a fortress of AI reliability—until you realize the breach happens at the front door, not in the model. Large Language Models (LLMs) have become the backbone of enterprise AI assistants. Yet as more systems integrate RAG pipelines to improve their factuality and domain alignment, a gaping blindspot has emerged—the document ingestion layer. A new paper titled “The Hidden Threat in Plain Text” by Castagnaro et al. warns that attackers don’t need to jailbreak your model or infiltrate your vector store. Instead, they just need to hand you a poisoned DOCX, PDF, or HTML file. And odds are, your RAG system will ingest it—invisibly. ...

Agents Under Siege: How LLM Workflows Invite a New Breed of Cyber Threats

Agents Under Siege: How LLM Workflows Invite a New Breed of Cyber Threats From humble prompt-followers to autonomous agents capable of multi-step tool use, LLM-powered systems have evolved rapidly in just two years. But with this newfound capability comes a vulnerability surface unlike anything we’ve seen before. The recent survey paper From Prompt Injections to Protocol Exploits presents the first end-to-end threat model of these systems, and it reads like a cybersecurity nightmare. ...

Guardians of the Chain: How Smart-LLaMA-DPO Turns Code into Clarity

When the DAO hack siphoned millions from Ethereum in 2016, the blockchain world learned a hard lesson: code is law, and bad law can be catastrophic. Fast forward to today, and smart contract security still walks a tightrope between complexity and automation. Enter Smart-LLaMA-DPO, a reinforced large language model designed not just to find vulnerabilities in smart contracts—but to explain them, clearly and reliably. 🧠 Beyond Detection: Why Explanations Matter Most smart contract vulnerability detectors work like smoke alarms—loud when something’s wrong, but not exactly helpful in telling you why. The core innovation of Smart-LLaMA-DPO is that it speaks the language of developers. It explains vulnerabilities with clarity and technical nuance, whether it’s a reentrancy flaw or an oracle manipulation scheme. And that clarity doesn’t come from magic—it comes from Direct Preference Optimization (DPO), a training method where the model learns not just from correct labels, but from expert-ranked explanations. ...