Automation Risk

Context Is the New Attack Surface

A benchmark score is easy to quote. It is harder to know what broke. In "Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models," Pavlos Ntais reports an 81.0% attack success rate against GPT-OSS-20B on a held-out 200-item test set.[1] That number is attention-grabbing. It is also not the main lesson. ...

Jailbreak and Enter: Why LLM Security Needs a Cube, Not a Scoreboard

Opening: Why this matters now

The AI industry has spent the last two years teaching executives a strangely comforting phrase: “the model refused.” That phrase is now dangerously inadequate. A refusal is not a security architecture. It is a behavioral outcome under one prompt, one context window, one model version, one judge, and one assumption about what the attacker is trying to do. Change any of those variables and the safety story can change. Sometimes gently. Sometimes like a glass door discovering what gravity does. ...

Memory, Bias, and the Mind of Machines: How Agentic LLMs Mislearn

TL;DR for operators

Memory is becoming the fashionable upgrade for AI agents: let the system remember past tasks, extract lessons, and improve without retraining the model. Sensible. Also slightly dangerous, in the same way giving a junior analyst a notebook is useful until they start rewriting the notebook after every meeting. The important result is not that memory sometimes contains bad facts. Everyone who has used software, people, or software made by people already knew that. The sharper point is that useful experience can become faulty during the act of consolidation. When an LLM agent compresses raw trajectories into reusable textual lessons, it may strip away conditions, merge unlike cases, or turn a narrow success into a general rule. The memory then looks cleaner while becoming less true. Very enterprise. ...