Assurance

Death by a Thousand Prompts: Why Long-Horizon Attacks Break AI Agents

Opening: Why This Matters Now

AI agents are no longer chatty interns. They book meetings, move money, browse the web, read inboxes, modify codebases, and increasingly act on behalf of humans in real systems. And that’s precisely the problem. While most safety research has focused on one-shot jailbreaks and prompt injections, real-world agents operate across time. They remember. They plan. They call tools. They update state. They accumulate context. ...

From Static Models to Living Systems: When AI Stops Predicting and Starts Adapting

Opening: Why This Matters Now

The age of static AI is quietly ending. For years, we trained models once, deployed them, and hoped the world would behave. It rarely did. Markets shift. User behavior drifts. Regulations mutate. Data pipelines degrade. Yet most production AI systems still operate under a frozen-training assumption: a snapshot model navigating a moving world. ...

Lost in the Links: When World Knowledge Isn’t Enough

Opening: Why this matters now

We are officially in the era of “agentic AI.” Models write code, browse the web, manage workflows, and increasingly promise autonomous decision-making. The marketing narrative suggests we are inches away from general-purpose digital operators. And yet, a deceptively simple game, navigating Wikipedia links from one page to another, exposes something uncomfortable. ...

Lost in Translation: When Safety Contracts Collapse Across 2.1 Billion Voices

Opening: Why this matters now

If you evaluate AI safety only in English, under tightly structured output contracts, you may conclude that everything is under control. Indic Jailbreak Robustness (IJR) politely disagrees. The paper introduces a judge-free benchmark across 12 Indic and South Asian languages (representing more than 2.1 billion speakers) and evaluates 45,216 prompts under both contract-bound (JSON) and free-form (FREE) conditions. The conclusion is uncomfortable but precise: ...

Mind the Drift: Why Stateful AI Guardrails Beat Bigger Models

Opening: Why This Matters Now

Multi-turn jailbreaks are no longer edge cases. They are the norm. As enterprises deploy LLMs into agentic workflows (customer support, RAG systems, tool-using copilots), the attack surface has shifted from blunt prompt injection to slow, deliberate intent grooming. No single turn looks dangerous. The danger is cumulative. This is the emerging Safety Gap: most guardrails remain stateless. They evaluate prompts in isolation. Attackers do not. ...

When Fine-Tuning Bites Back: The Hidden Safety Drift in Vision-Language Agents

Opening: Why this matters now

Post-training is the new deployment phase. Foundation models are no longer static artifacts. They are continuously fine-tuned, adapted, domain-specialized, instruction-aligned, and re-aligned. In enterprise settings, this is framed as “customization.” In safety research, it is increasingly framed as something else: drift. A recent study demonstrates a disquieting result: fine-tuning a vision-language model on a narrow harmful dataset can induce broad, cross-domain misalignment, even on unrelated tasks. Worse, multimodal evaluation reveals substantially higher safety degradation than text-only benchmarks. ...

Diffusing the Periodic Table: How Hierarchy Fixes Molecular AI

Opening: Why This Matters Now

Drug discovery is not suffering from a shortage of molecules. It is suffering from a shortage of valid ones. Generative AI has flooded chemistry with candidate structures, yet the quiet bottleneck remains chemical validity. One misplaced proton. One impossible valence. One aromatic nitrogen that refuses to be what the model thinks it is. The molecule collapses. ...

From PDE to Pipeline: When LLMs Become Numerical Architects

Opening: Why This Matters Now

Scientific computing has a quiet gatekeeping problem. Partial Differential Equations (PDEs) power everything from climate modeling to semiconductor design. Yet building a reliable numerical solver still demands deep expertise in discretization, stability analysis, and debugging arcane implementation details. Neural approaches (PINNs, neural operators, foundation surrogates) promised liberation. Instead, they often delivered opacity. ...

Ready Player None: Why AI Still Can’t Beat the Human Game Multiverse

Opening: Why This Matters Now

Every few months, a new model release arrives wrapped in confident headlines: human-level reasoning, expert-level coding, AGI within reach. Benchmarks light up. Leaderboards shift. Twitter celebrates. And yet, when these same models are asked to play a casual mobile game for two minutes (the kind designed for bored commuters), they collapse into hesitation, confusion, or paralysis. ...

Steer by Equation: When LLM Alignment Learns to Drive with ODEs

Opening: Why This Matters Now

Activation steering has become the quiet workhorse of LLM alignment. No retraining. No RLHF reruns. Just a subtle nudge inside the model’s hidden states at inference time. Efficient? Yes. Principled? Not quite. Most steering methods rely on one-step activation addition: compute a direction vector, add it once, hope the model behaves. It works, until it doesn’t. Complex behaviors like truthfulness, helpfulness, and toxicity mitigation rarely live on clean linear boundaries. ...