AI Governance

STRIDE Gets a Plus-One: How ASTRIDE Rewrites Threat Modeling for the Agentic Era

Diagram reviews are where many security problems first become visible. Not in the production logs. Not in the postmortem. Not after a user discovers that a tool-calling agent has confidently pushed private data into the wrong API. The humble architecture diagram is supposed to be the place where adults in the room ask: what can go wrong here? ...

Grounded or Just Confident? What the AI Consumer Index Reveals About Frontier Models

Shopping is where AI confidence goes to embarrass itself. Ask a frontier model for a gift, a replacement part, a budget-friendly product, or a game recommendation, and the answer often looks excellent. It is neatly formatted. It gives reasons. It may even include links and prices, because apparently nothing says “trust me” like a fabricated discount on a product page that no longer exists. ...

Scale Fail: How Downsampling Becomes an Adversarial Backdoor for VLMs

Resize. It is one of those engineering verbs that sounds too boring to threaten anyone. A user uploads a screenshot, invoice, inspection photo, interface capture, medical form, or product image. The system resizes it. The model reads it. The workflow moves on. ...

Shift Happens: Detecting Behavioral Drift in Multi‑Agent Systems

Updates are boring until they are not. A retrieval index changes. A tool permission is adjusted. A base model is silently upgraded. A memory module starts carrying yesterday’s weird interaction into today’s customer support workflow. Nobody sees smoke. The dashboard still says “healthy.” The agent still answers. Then, three weeks later, someone notices that one group of agents has become strangely aggressive, risk-averse, evasive, or just less aligned with the behavior the product team thought it had shipped. ...

Flame Tamed: Can LLMs Put Out the Internet’s Worst Fires?

A comment thread rarely explodes in one clean motion. It starts with a correction. Then someone reads the correction as condescension. Then another person adds a historical grievance, a screenshot, three exclamation marks, and the kind of moral certainty normally reserved for courtrooms and family dinners. By the time a moderator arrives, the thread is no longer a conversation. It is archaeology with insults. ...

Prompting on Life Support: How Invasive Context Engineering Fights Long-Context Drift

The prompt was clear. Then the conversation kept going. A familiar enterprise AI story starts politely enough. The legal assistant is told to be conservative. The medical triage bot is told not to diagnose. The procurement agent is told never to approve a vendor without documented checks. Everyone nods. The system prompt is immaculate. Compliance is laminated. ...

Stuck on Repeat: Why LLMs Reinforce Their Own Bad Ideas

Meetings have a familiar failure mode. Someone states an early opinion, then spends the next thirty minutes “thinking through the issue” in a way that somehow makes the original opinion look increasingly inevitable. Evidence enters the room. Counterarguments are acknowledged. The conclusion remains suspiciously loyal to the opening bid. Apparently, large language models have been attending the same meetings. ...

Rules of Attraction: How LLMs Learn to Judge Better Than We Do

Rubrics are supposed to make judgment boring. That is their charm. A good rubric tells a teacher why one essay deserves a 5 instead of a 3, tells a compliance reviewer why one response is acceptable and another is risky, and tells an internal QA team why a generated summary is useful rather than merely confident. In business, boring judgment is valuable. It scales. It can be audited. It survives employee turnover. It does not wake up one morning and decide that “clarity” now means “vibes with a semicolon.” ...

Anchors Aweigh? Why Small LLMs Refuse to Flip Their Own Semantics

A label looks harmless until you ask it to lie. Tell a model that a glowing movie review should be labeled POS, and few-shot prompting behaves like a useful intern: it studies the examples, picks up the pattern, and usually gets better. Tell the same model that a glowing review should now be labeled NEG, and the intern becomes less useful. It does not smoothly learn your private code. It does not politely invert its semantic universe. It mostly produces a muddle. ...

Trace Elements: Why Multimodal Reasoning Needs Its Own Safety Net

An answer can look safe and still leave fingerprints. That is the uncomfortable point behind GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision.1 The paper is not merely saying that multimodal models can be unsafe. We knew that. Congratulations, the fire is hot. Its sharper claim is architectural: once a model reasons over both images and text, the safety problem no longer lives only at the input or the final answer. It also lives in the middle. ...