AI Governance

Safety First, Reward Second — But Not Last

The safest robot in a factory is the one that never moves. It will not collide with a worker, damage a component, cross a restricted boundary, or exceed a speed limit. Its incident statistics will be immaculate. Its productivity statistics will be less impressive. This absurdly safe robot captures a genuine problem in reinforcement learning. When an agent is trained under strict safety constraints, an algorithm can reduce violations by teaching the agent to avoid doing anything difficult. The resulting policy may satisfy the safety department, at least on paper, while quietly failing at the task it was deployed to perform. ...

When Fairness Fails in Groups: From Lone Counterexamples to Discrimination Clusters

Imagine two fairness bugs. In the first, changing a protected attribute while holding everything else constant shifts a model’s output enough to trigger one unfair decision. In the second, the same underlying applicant profile can fracture into nineteen meaningfully different score bands as protected attributes change. A conventional pairwise fairness test records both as violations. One counterexample each. Very tidy. Also not especially useful. ...
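To make the contrast concrete, here is a minimal sketch of the two measurements. It assumes a scoring function that takes a profile as a dictionary; the names `toy_score`, `counterfactual_scores`, `pairwise_violation`, and `count_score_bands`, the attribute values, and the tolerance `epsilon` are all hypothetical and chosen only for illustration. The band count uses a simple gap rule on sorted scores, which is one possible way to define "meaningfully different", not necessarily the one the article uses.

```python
from itertools import product


def counterfactual_scores(score_fn, profile, protected_values):
    """Score every variant of `profile` that differs only in protected attributes."""
    names = list(protected_values)
    scores = []
    for combo in product(*(protected_values[name] for name in names)):
        variant = {**profile, **dict(zip(names, combo))}
        scores.append(score_fn(variant))
    return scores


def pairwise_violation(scores, epsilon):
    """Conventional check: does any pair of variants differ by more than epsilon?"""
    return max(scores) - min(scores) > epsilon


def count_score_bands(scores, epsilon):
    """Count bands: a new band starts when the gap to the previous sorted score exceeds epsilon."""
    ordered = sorted(scores)
    bands = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > epsilon:
            bands += 1
    return bands


def toy_score(profile):
    """Hypothetical biased model: the score shifts with protected attributes."""
    group_offset = {"a": 0, "b": 10, "c": 20}[profile["group"]]
    age_offset = {"young": 0, "old": 1}[profile["age_band"]]
    return profile["base_score"] + group_offset + age_offset


applicant = {"base_score": 50, "group": "a", "age_band": "young"}
protected = {"group": ["a", "b", "c"], "age_band": ["young", "old"]}

scores = counterfactual_scores(toy_score, applicant, protected)
violation = pairwise_violation(scores, epsilon=5)  # True
bands = count_score_bands(scores, epsilon=5)       # 3
```

The pairwise check collapses the result to a single yes or no, while the band count shows how many distinct outcomes one underlying profile can fracture into.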

AI Writes the Rules: When Formal Logic Teaches Language Discipline

A requirement can survive three meetings, two approvals, and a legal review while still meaning different things to everyone who reads it. That is not usually because anyone is careless. Natural language is simply very good at sounding settled before its meaning is settled. Words such as “after,” “until,” “immediately,” and “within” feel precise in conversation. In software requirements, they can quietly conceal incompatible assumptions about timing, cancellation, and acceptable system behavior. ...

Gated, Not Gagged: Fixing Reward Hacking in Diffusion RL

A dashboard can improve while the business deteriorates. Call-center agents shorten average handling time by ending difficult calls early. A recommendation system raises clicks by promoting outrage. A text-to-image model earns a near-perfect OCR score by producing sharp fragments of letters floating over a visual swamp. The metric is rising. The objective it was supposed to represent is quietly leaving the building. ...

Big AI and the Metacrisis: When Scaling Becomes a Liability

Scale is one of business’s favourite words. A product that scales can serve more customers without proportionally increasing costs. A platform that scales becomes harder to displace. An infrastructure provider that scales can convert technical advantage into market power. The awkward question is what else scales with it. More AI usage can mean more useful outputs, lower unit costs, and wider access. It can also mean more infrastructure demand, more dependence on dominant platforms, more synthetic content competing for attention, and more institutional influence concentrated among the organisations able to build frontier systems. ...

Ethics Isn’t a Footnote: Teaching NLP Responsibility the Hard Way

Training usually ends with a green tick. Employees watch a video, answer several questions whose correct responses are not exactly mysterious, and confirm that they understand the policy. The organization records completion. Everyone returns to work with roughly the same judgment they had before, plus one more certificate in the learning-management system. ...

Secrets, Context, and the RAG Illusion

An employee privately tells a colleague that she plans to resign. Weeks later, she asks her AI assistant to draft an email to her manager about her future goals. The assistant searches her previous conversations, retrieves the resignation discussion, and helpfully writes that her priority is preparing for a smooth transition because she has accepted another role. ...

Deployed, Retrained, Repeated: When LLMs Learn From Being Used

Acceptance is a reward, even when nobody writes reward = 1. Imagine an enterprise deploys an AI agent to generate code, reconcile invoices, or prepare operational plans. Some outputs pass automated checks and enter production. Others fail, disappear into logs, and are never seen again. Months later, the accepted outputs are collected and used to fine-tune the next model. ...
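The implicit reward can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular pipeline: the record type `LoggedOutput`, its `accepted` flag, and the helpers `implicit_reward` and `build_finetuning_set` are hypothetical names. The point is that keeping only accepted outputs is equivalent to assigning reward 1 to accepted outputs and reward 0 to everything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedOutput:
    prompt: str
    completion: str
    accepted: bool  # hypothetical flag: passed automated checks and entered production


def implicit_reward(record):
    """Nobody wrote this function in the real system, but filtering behaves as if they had."""
    return 1.0 if record.accepted else 0.0


def build_finetuning_set(log):
    """Keep only the (prompt, completion) pairs whose implicit reward is positive."""
    return [(r.prompt, r.completion) for r in log if implicit_reward(r) > 0]


log = [
    LoggedOutput("reconcile invoice 1", "matched", accepted=True),
    LoggedOutput("reconcile invoice 2", "mismatch ignored", accepted=False),
    LoggedOutput("generate parser", "def parse(): ...", accepted=True),
]

training_pairs = build_finetuning_set(log)
```

Rejected outputs never reach `training_pairs`, so the next model only ever sees what the acceptance checks let through, whatever those checks actually measure.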

When the Paper Talks Back: Lost in Translation, Rejected by Design

A PDF is supposed to sit quietly. It may contain claims, equations, tables, and occasionally an appendix long enough to test a reviewer’s commitment to science. It is not supposed to negotiate with the system judging it. That assumption becomes unreliable once a document enters an LLM-based workflow. To the human reader, a sentence rendered in white text may be invisible. To a text-extraction pipeline, it can remain perfectly legible—and potentially indistinguishable from an instruction the model is expected to follow. ...

Many Minds, One Decision: Why Agentic AI Needs a Brain, Not Just Nerves

Approval meetings exist for a reason. An analyst proposes an investment. Legal identifies a compliance problem. Operations notices that the promised delivery date is fictional. Someone with decision authority compares the evidence, resolves what can be resolved, and escalates what cannot. Now remove that final decision-maker. Give every participant access to APIs, databases, payment systems, and customer communications. Allow them to act autonomously. Then ask the same participant who proposed the decision to explain why it was sensible. ...