Cover image

Red Queen Receipts: AI Security Testing Needs Logs, Not Vibes

Security testing is not a screenshot. A model gives a dangerous answer. Someone posts the transcript. A vendor says the model has been updated. A consultant turns the incident into a slide titled “AI risk is real.” Everyone nods gravely. Very mature. Very enterprise. The harder question is less theatrical: can the same vulnerability be tested again, under controlled conditions, with visible logs, a consistent evaluator, repeatable statistics, and enough human inspection to make the result defensible? ...

May 22, 2026 · 14 min · Zelina
Cover image

Context Is the New Attack Surface

A benchmark score is easy to quote. It is harder to know what broke. In Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models, Pavlos Ntais reports an 81.0% attack success rate against GPT-OSS-20B on a held-out 200-item test set.1 That number is attention-grabbing. It is also not the main lesson. ...

May 16, 2026 · 13 min · Zelina
Cover image

Jailbreak and Enter: Why LLM Security Needs a Cube, Not a Scoreboard

Opening — Why this matters now The AI industry has spent the last two years teaching executives a strangely comforting phrase: “the model refused.” That phrase is now dangerously inadequate. A refusal is not a security architecture. It is a behavioral outcome under one prompt, one context window, one model version, one judge, and one assumption about what the attacker is trying to do. Change any of those variables and the safety story can change. Sometimes gently. Sometimes like a glass door discovering what gravity does. ...

May 7, 2026 · 15 min · Zelina
Cover image

Swiss Cheese for Superintelligence: How STACK Reveals the Fragility of LLM Safeguards

In the race to secure frontier large language models (LLMs), defense-in-depth has become the go-to doctrine. Inspired by aviation safety and nuclear containment, developers like Anthropic and Google DeepMind are building multilayered safeguard pipelines to prevent catastrophic misuse. But what if these pipelines are riddled with conceptual holes? What if their apparent robustness is more security theater than security architecture? The new paper STACK: Adversarial Attacks on LLM Safeguard Pipelines delivers a striking answer: defense-in-depth can be systematically unraveled, one stage at a time. The researchers not only show that existing safeguard models are surprisingly brittle, but also introduce a novel staged attack—aptly named STACK—that defeats even strong pipelines designed to reject dangerous outputs like how to build chemical weapons. ...

July 1, 2025 · 3 min · Zelina