Governance

Context Is the New Attack Surface

A policy can block a sentence. It has a harder time blocking a story. That is the uncomfortable lesson from Jailbreak Mimicry, a recent arXiv paper by Pavlos Ntais on automated discovery of narrative-based jailbreaks for large language models [1]. The paper trains a compact attacker model to transform harmful goals into plausible narrative or functional contexts, then tests whether larger models still produce harmful output. The headline number is easy to quote: the trained attacker reaches 81.0% attack success against GPT-OSS-20B on a held-out 200-item test set. The business lesson is less flashy and more useful: safety failures may not live in the forbidden content alone. They often live in the surrounding work story that makes the request look legitimate. ...

The Reward Is in the Room: Why AI Automation Needs Better Judgment, Not Just Bigger Models

Opening: Why this matters now. AI adoption has entered its second, less glamorous phase. The first phase was easy to explain: make the model generate things. Emails, reports, code, dashboards, summaries, customer replies, compliance drafts, market notes, training content. Give the machine a prompt, admire the fluent output, and pretend the future has arrived because the paragraphs are well-spaced. ...

AI Access Control, Logging, and Retention Policies

How to design access controls, prompt/output logging, and retention rules for AI systems so governance remains practical, auditable, and proportional to risk.

AI Evaluation, Monitoring, and Incident Response for Production Systems

How to evaluate, monitor, and respond to failures in production AI systems so quality, safety, and governance remain active after launch.

AI Vendor Risk Assessment and Procurement Checklist

How to evaluate AI vendors before rollout, using a practical checklist for data handling, governance, contract risk, security posture, and operational fit.

How to Design Human Review for AI Systems

How to build a risk-tiered human review model so oversight is meaningful, efficient, and matched to business impact rather than added as a vague slogan.

When Not to Send Data to a Public LLM

How to decide when a business workflow should avoid public LLM endpoints, based on data sensitivity, contractual exposure, and safer design alternatives.

When Alignment Is Not Enough: Reading Between the Lines of Modern LLM Safety

Opening: Why this matters now. In the past two years, alignment has quietly shifted from an academic concern to a commercial liability. The paper discussed here (arXiv:2601.16589) sits squarely in this transition period: post-RLHF optimism, pre-regulatory realism. It asks a deceptively simple question (do current alignment techniques actually constrain model behavior in the ways we think they do?) and then proceeds to make that question uncomfortable. ...

Judging the Judges: When AI Evaluation Becomes a Fingerprint

Opening: Why this matters now. LLM-as-judge has quietly become infrastructure. It ranks models, filters outputs, trains reward models, and increasingly decides what ships. The industry treats these judges as interchangeable instruments, like different thermometers measuring the same temperature. This paper suggests that assumption is not just wrong, but dangerously so. Across thousands of evaluations, LLM judges show near-zero agreement with each other, yet striking consistency with themselves. They are not noisy sensors of a shared truth. They are stable, opinionated evaluators, each enforcing its own private theory of quality. ...

Thinking Without Understanding: When AI Learns to Reason Anyway

Opening: Why this matters now. For years, debates about large language models (LLMs) have circled the same tired question: Do they really understand what they’re saying? The answer, still no, has been treated as a conversation stopper. But recent “reasoning models” have made that question increasingly irrelevant. A new generation of AI systems can now reason through problems step by step, critique their own intermediate outputs, and iteratively refine solutions. They do this without grounding, common sense, or symbolic understanding, yet they still solve tasks previously reserved for humans. That contradiction is not a bug in our theory of AI. It is a flaw in our theory of reasoning. ...