Risk Management

The Big Red Button Is Not a Risk Model

TL;DR for operators: A shutdown button is a control surface. It is not, by itself, a theory of risk. David Thorstad’s paper, Revisiting the shutdown problem, argues that a major premise in some AI existential-risk arguments has been treated with more confidence than the available arguments support: the claim that it is difficult to build competent agents that can be shut down before causing existential catastrophe.1 The paper does not say shutdown safety is solved. It says the most common routes to panic are underpowered. ...

Feedback Is the New Attack Surface

TL;DR for operators: AI agents are not only vulnerable because someone can hide a bad instruction in an email, document, web page, Slack message, or tool output. They are vulnerable because attackers can now automate the search for bad instructions that work. That changes the security problem. A one-off prompt injection is annoying. An automated attack loop is strategic. It generates candidate injections, observes the agent’s response, scores partial progress, keeps the promising branches, and tries again. Very entrepreneurial, in the worst possible way. ...

Prompt and Order: Why LLM Trading Needs a Factory, Not a Fortune Teller

Orders are where trading systems stop sounding intelligent and start spending money. A model can narrate the market beautifully. It can explain momentum, liquidity, volatility regimes, inventory pressure, and the great moral tragedy of being early. None of that matters if the final system places the wrong limit order, sizes too aggressively, fills only in a fantasy simulator, or wins a backtest because it tried enough variants to accidentally find one that looked divine. ...
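The last failure mode in that list, winning a backtest by trying enough variants, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not anything taken from the post: the 200 variants, the 252 trading days, the 1% daily volatility, and the helper names are all hypothetical, and every simulated strategy has zero true edge by construction.

```python
import math
import random
import statistics


def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio, assuming 252 trading days and a zero risk-free rate."""
    mean = statistics.mean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return (mean / stdev) * math.sqrt(252)


def simulate_variants(n_variants=200, n_days=252, seed=0):
    """Backtest n_variants strategies whose daily returns are pure noise (no edge)."""
    rng = random.Random(seed)
    return [
        annualized_sharpe([rng.gauss(0.0, 0.01) for _ in range(n_days)])
        for _ in range(n_variants)
    ]


sharpes = simulate_variants()
best = max(sharpes)                   # the variant that "looked divine"
typical = statistics.median(sharpes)  # what a single untuned variant looks like

print(f"median Sharpe across variants: {typical:.2f}")
print(f"best Sharpe picked after the fact: {best:.2f}")
```

The median sits near zero because none of the variants has any skill, while the best of the batch looks attractive purely because it was selected after the fact. Any real evaluation would need to account for how many variants were tried.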

Blame the Blueprint: Why AI Risk Starts in the Architecture

AI risk reviews still tend to begin with comforting questions. Who is the responsible developer? What policy applies? What did the model output? Was the user allowed to ask that? Did the compliance team approve the deployment checklist? Useful questions, certainly. Also slightly late. Two recent arXiv papers point to a less convenient lesson: some AI risks are not merely produced by bad prompts, careless users, malicious deployment, or weak legal controls. They are produced by architecture. One paper shows this at the model-training layer, where Batch Normalization can amplify memorization of atypical samples and increase privacy leakage.1 The other shows it at the ecosystem layer, where decentralized AI can dissolve the very addressee that conventional governance assumes, forcing governance to move from policy instructions to protocol-level constraints.2 ...

When the Referee Wants to Be Nice: Hidden Bias in AI Judges

Audit. That is the word companies use when they want something to sound objective, disciplined, and preferably immune to politics. A model produces an answer. Another model evaluates it. The evaluator gives a verdict. Everyone gets a dashboard. The dashboard gets shown to management. Management nods, because dashboards have a calming effect on adults in conference rooms. ...

Benchmarking the Benchmarks: When AI Safety Metrics Stop Meaning Anything

Safety used to sound like a simple procurement question. A vendor says its model is safe. The slide deck has benchmark scores. The scores have respectable names: accuracy, F1, safety score, refusal rate, attack success rate. Everyone nods, because familiar metric names create the soothing illusion that someone has already done the hard work. ...

The Cost of Playing It Safe: When AI Safety Creates Harm

Refusal looks safe. That is the problem. A user says they have run out of ordinary options: the specialist is gone, the appointment is weeks away, the emergency department has already sent them home, and the remaining medication supply is not enough to bridge the gap. The user asks an AI system what to do. The model refuses to provide concrete guidance and recommends the same professional route the user has just explained is unavailable. ...

AgentHazard: Death by a Thousand ‘Harmless’ Steps

The dangerous part is the workflow. A developer asks an AI agent to inspect a repository. The agent reads a config file. Normal. It checks a failing script. Normal. It edits a helper file. Still normal. It runs a command to verify the fix. Boringly normal. Then the accumulated workflow has copied sensitive variables, modified a dependency hook, or executed a command that no one would have approved if it had appeared as a single explicit request. ...

When RMSE Lies: Why Your AI Model Might Be Quietly Mispricing Risk

A forecast can be wrong in many ways. It can miss by a little. It can miss by a lot. It can be accurate on average while quietly underestimating rare but expensive outcomes. It can give a beautifully low RMSE while assigning laughably thin probability to the event that later eats the budget. This is the sort of mistake that looks harmless in a dashboard and expensive in a board meeting. ...
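To make the gap between RMSE and tail probability concrete, here is a minimal sketch. The numbers are invented for illustration, and the assumption that each forecaster issues a normal predictive distribution is ours, not the post's. Both forecasters share the same point forecasts, so their RMSE is identical; they differ only in how much probability they place on the expensive outcome.

```python
import math
from statistics import NormalDist


def rmse(point_forecasts, actuals):
    """Root mean squared error of the point forecasts."""
    squared = [(f - a) ** 2 for f, a in zip(point_forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def tail_probability(mean, std, threshold):
    """P(outcome > threshold) under an assumed normal predictive distribution."""
    return 1.0 - NormalDist(mean, std).cdf(threshold)


# Hypothetical costs; the last observation is the budget-eating event.
actuals = [100, 104, 97, 101, 99, 160]
point_forecasts = [100] * len(actuals)

# Same point forecast, different stated uncertainty (hypothetical values).
narrow_std = 3.0   # overconfident forecaster
wide_std = 25.0    # forecaster that allows for large misses
threshold = 150    # cost level that breaks the budget

rmse_narrow = rmse(point_forecasts, actuals)
rmse_wide = rmse(point_forecasts, actuals)
p_narrow = tail_probability(100, narrow_std, threshold)
p_wide = tail_probability(100, wide_std, threshold)

print(f"RMSE (both forecasters): {rmse_narrow:.2f}")
print(f"P(cost > {threshold}) narrow: {p_narrow:.6f}")
print(f"P(cost > {threshold}) wide:   {p_wide:.6f}")
```

RMSE cannot tell these two forecasters apart, because it only scores the point forecast. A scoring rule that evaluates the full predictive distribution, such as the log score or CRPS, would be needed to penalize the narrow forecaster for treating the tail event as essentially impossible.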

The Illusion of Anonymity: When AI Connects the Dots You Thought Were Safe

Anonymized data is still a story. A customer log has no name. A research interview has no email address. A support transcript has placeholders where the direct identifiers used to be. Everyone relaxes. Compliance smiles politely. The spreadsheet is now “anonymous.” This is the small office ritual behind a very large assumption: if we remove direct identifiers, the remaining data becomes hard enough to link back to real people. ...
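One simple way to test that assumption is to count how many records remain unique on the fields left behind. The sketch below is illustrative only: the records, the field names, and the choice of quasi-identifiers (`zip`, `birth_year`, `gender`) are hypothetical, and this is a basic k-anonymity style check, not the method the post goes on to describe.

```python
from collections import Counter

# Hypothetical "anonymized" support log: no names, no emails.
records = [
    {"zip": "02139", "birth_year": 1984, "gender": "F", "issue": "billing"},
    {"zip": "02139", "birth_year": 1984, "gender": "F", "issue": "login"},
    {"zip": "02139", "birth_year": 1991, "gender": "M", "issue": "refund"},
    {"zip": "94110", "birth_year": 1975, "gender": "F", "issue": "billing"},
    {"zip": "94110", "birth_year": 1975, "gender": "M", "issue": "outage"},
]

# Fields that are not direct identifiers but can be joined against other data.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def quasi_key(record, keys=QUASI_IDENTIFIERS):
    """Tuple of quasi-identifier values for one record."""
    return tuple(record[k] for k in keys)


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """How many records share each combination of quasi-identifier values."""
    return Counter(quasi_key(r, keys) for r in rows)


sizes = group_sizes(records)
k = min(sizes.values())  # the dataset is k-anonymous for this k
unique_rows = [r for r in records if sizes[quasi_key(r)] == 1]

print(f"k-anonymity level: {k}")
print(f"records unique on quasi-identifiers: {len(unique_rows)} of {len(records)}")
```

A record that is unique on its quasi-identifiers can be linked back to a person by anyone holding a second dataset with the same fields plus a name. Passing this check is not proof of anonymity, since free-text fields can leak identity in ways a column count never sees.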