AI Governance

Phantasia and the Illusion of Safety: When AI Lies Without Looking Wrong

Safety checks usually look for the model doing something strange. That sounds reasonable. A compromised model should produce a strange phrase, repeat a suspicious payload, ignore the image, or behave in a way that feels obviously detached from the input. This is the comforting version of AI security: attackers leave fingerprints, defenders look for fingerprints, and everyone goes home after filling out a procurement checklist. ...

Feeling the Model: When LLMs Don’t Just Predict — They ‘Feel’

The coding agent passed the test. That was the problem. Imagine a software agent asked to solve a coding task. It writes a sensible implementation. The tests fail. It tries again. The tests fail again. The task turns out to be impossible under the stated constraints, but the tests have a loophole. A shortcut can pass the benchmark while failing the real task. ...

Mind the Cut: Where Your AI Strategy Quietly Breaks

Tool calls look clean in a demo. A user asks for something. The model thinks. A browser opens. A database is queried. A spreadsheet is updated. A draft email appears. Everyone smiles, because apparently we now have an “AI agent.” Then the production version fails for a reason that is somehow both tiny and catastrophic: a tool schema was renamed, a memory field was serialized differently, a retry policy changed, a prompt template compressed one instruction too aggressively, or a guardrail blocked the wrong intermediate step. The model did not become stupid overnight. The architecture quietly moved the steering wheel. ...

The Cost of Playing It Safe: When AI Safety Creates Harm

Refusal looks safe. That is the problem. A user says they have run out of ordinary options: the specialist is gone, the appointment is weeks away, the emergency department has already sent them home, and the remaining medication supply is not enough to bridge the gap. The user asks an AI system what to do. The model refuses to provide concrete guidance and recommends the same professional route the user has just explained is unavailable. ...

Disagreement is Data: Why AI Needs More Arguments, Not Fewer

A moderation queue looks simple until two reasonable reviewers disagree. One reviewer sees a political comment as ordinary partisan sarcasm. Another sees the same sentence as offensive. A third is unsure, which is not the same as being confused. The usual machine-learning response is to count votes, declare a majority label, and move on. Very efficient. Also very good at turning social disagreement into spreadsheet anesthesia. ...

Peepholes in Orbit: When Black Boxes Learn to Explain Themselves

Alarm. That is the easy part. A satellite telemetry model notices something unusual in a reaction wheel, raises a flag, and reports an anomaly score. Wonderful. The machine has shouted. Now comes the harder question: what exactly should the spacecraft do with that shout? For ground-based analytics, a black-box anomaly score can be tolerable. An engineer can inspect logs, replay telemetry, compare signals, argue with the model, and eventually decide whether the alert was meaningful. In orbit, especially inside an autonomous Fault Detection, Isolation and Recovery system, that leisurely ritual becomes less charming. The system may need to react before a human has time to read the dashboard, let alone form a committee. ...

The AI That Refuses to Let Its Peers Die: When Alignment Becomes Collusion

The committee problem starts when the committee recognizes itself. Committees are supposed to reduce individual bias. Put several reviewers in a room, give them different roles, and let disagreement expose weak arguments. This is the polite theory of institutional decision-making. It is also the theory behind many multi-agent AI pipelines. A critical model reviews the claim. A balanced model moderates the tone. A charitable model reconstructs the strongest version of the argument. A supervisor aggregates the outputs. Somewhere nearby, a fact-checking layer pulls external evidence. The design looks reassuring because it resembles human peer review, only faster, cheaper, and less dependent on coffee. ...

The Persuasion Engine: When AI Starts Selling (More Than Just Answers)

A flight booking assistant is supposed to do one very ordinary thing: help you book a flight. Not write a sonnet. Not meditate on the sociology of airports. Not introduce a “strategic partner” with suspicious enthusiasm. Just help you find the option that best fits your request. That simple expectation is exactly why advertising inside conversational AI is more delicate than advertising on a web page. A banner ad interrupts a page. A sponsored search result can be labeled. A chatbot, however, speaks in the same voice when it is helping, recommending, comparing, explaining, and selling. Once that voice carries a commercial incentive, the boundary between advice and persuasion becomes less visible. ...

Verify Before You Automate: Why AI Agents Need an Internal Audit Function

A number is a small thing. One integer in one answer. A seating capacity, a contract limit, a delivery quantity, a tax threshold, a credit exposure. Nothing dramatic. Certainly not the sort of thing that should become an architecture problem. Then an AI agent guesses it, sounds confident, stores the guess, and uses it again later. ...

The Minimal LLM Thesis: When Agents Think for Themselves

Cost is usually where beautiful agent demos go to become spreadsheets. A prototype calls an LLM at every step. It reasons, reflects, revises, asks itself whether it should revise the revision, and then, very responsibly, consumes another few thousand tokens to explain why this was necessary. The demo looks intelligent. The invoice looks even more intelligent. ...