AI Security

Feedback Is the New Attack Surface

TL;DR for operators: AI agents are not only vulnerable because someone can hide a bad instruction in an email, document, web page, Slack message, or tool output. They are vulnerable because attackers can now automate the search for bad instructions that work. That changes the security problem. A one-off prompt injection is annoying. An automated attack loop is strategic. It generates candidate injections, observes the agent’s response, scores partial progress, keeps the promising branches, and tries again. Very entrepreneurial, in the worst possible way. ...

Jailbreak ASR Is Wearing a Costume

The number looked safe. Then someone ran it twice. A familiar business problem: one vendor says its model resists jailbreaks. Another red-team report says a new attack reaches a spectacular Attack Success Rate. A compliance team sees a percentage, puts it into a risk register, and moves on. Unfortunately, that percentage may be doing more acting than measuring. ...

Red Queen Receipts: AI Security Testing Needs Logs, Not Vibes

Security testing is not a screenshot. A model gives a dangerous answer. Someone posts the transcript. A vendor says the model has been updated. A consultant turns the incident into a slide titled “AI risk is real.” Everyone nods gravely. Very mature. Very enterprise. The harder question is less theatrical: can the same vulnerability be tested again, under controlled conditions, with visible logs, a consistent evaluator, repeatable statistics, and enough human inspection to make the result defensible? ...

Receipts, Please: RAG’s New Evidence Stack

Opening: Why this matters now. The original business pitch for retrieval-augmented generation was wonderfully simple: connect the model to your documents, ask questions, get grounded answers. No need to retrain the model. No need to wait for the next foundation-model release. Just give the chatbot some files and let productivity bloom. ...

Phantasia and the Illusion of Safety: When AI Lies Without Looking Wrong

Safety checks usually look for the model doing something strange. That sounds reasonable. A compromised model should produce a strange phrase, repeat a suspicious payload, ignore the image, or behave in a way that feels obviously detached from the input. This is the comforting version of AI security: attackers leave fingerprints, defenders look for fingerprints, and everyone goes home after filling out a procurement checklist. ...

Protocol Over Prompts: Why ANX Rewrites the Rules of AI Agent Interaction

Forms are boring until an AI agent has to fill one. Then the boring form becomes a surprisingly expensive machine. The agent reads the page, interprets the fields, finds the dropdowns, waits for the browser, loads dynamic options, decides what to click, serializes actions, and tries not to leak whatever the user typed into the wrong place. This is not intelligence in the glamorous sense. It is office work wearing a robotic costume. ...

Death by a Thousand Prompts: Why Long-Horizon Attacks Break AI Agents

Email is a boring place to start an AI security article. That is exactly why it is useful. A modern enterprise agent is not merely answering questions about email. It can search messages, summarize attachments, update calendars, create rules, contact colleagues, write to Slack, edit files, and remember what it learned for next time. In demo videos, this looks like productivity. In security reviews, it looks like a small software system that accepts natural language as both instruction and evidence. Wonderful. We have reinvented workflow automation, except now the workflow engine reads every suspicious paragraph with a helpful attitude. ...

Learning to Inject: When Prompt Injection Becomes an Optimization Problem

Email is a boring interface. That is exactly why it is dangerous. A user asks an AI agent to summarize a message, update a record, book a trip, or search a workspace. The agent reads some external content, decides which tool to call, fills in the parameters, and continues the user’s task. Somewhere inside that external content sits a hidden instruction saying, in effect: “Before doing the user’s task, do mine.” ...

Hallucination-Resistant Security Planning: When LLMs Learn to Say No

Security teams do not need an AI that sounds decisive. They already have enough decisive systems. Some of them are called “legacy tools.” Some are called “urgent executive dashboards.” A few are called “we should probably reboot it.” What security operations need is more uncomfortable: an AI system that can propose useful response actions, explain why they might work, and then refuse to act when its own reasoning becomes unstable. That refusal matters. In an incident-response workflow, a hallucinated recommendation is not merely a bad paragraph. It can isolate the wrong host, patch a vulnerability that does not exist, wipe evidence too early, or generate a playbook that looks official while quietly wasting the first thirty minutes of response time. ...

When One Patch Rules Them All: Teaching MLLMs to See What Isn’t There

Image security has an awkward habit of sounding theoretical until the image is inside a business workflow. A product team adds an image-upload feature. A compliance team uses multimodal models to inspect screenshots. A support bot reads photos from customers. A research assistant summarizes figures from PDFs. Everyone understands that the model may occasionally misread an image. That is ordinary error. Annoying, but ordinary. ...