AI Governance

Unsafe at Any Bit: Patching the Safety Gaps in Quantized LLMs

TL;DR for operators: Quantizing an LLM is not a harmless cost-saving step. It changes the model, and the paper analysed here shows that those changes can weaken safety even when familiar utility scores still look respectable. That is the uncomfortable part: the dashboard can say “performance preserved” while the model has become more willing to comply with harmful requests. Very efficient. Very modern. Very easy to miss. ...

The Conscience Plug-in: Teaching AI Right from Wrong on Demand

TL;DR for operators: The paper’s central move is not “we trained a moral model.” It is “we inserted a referee between the agent’s plan and the agent’s action.” That distinction matters. If the architecture works, enterprises do not need to retrain every model whenever compliance, cultural norms, safety rules, or customer-specific constraints change. They can externalise those constraints into machine-readable constitutions and enforce them at runtime. ...
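
To make the "referee between plan and action" idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `Rule`, `Verdict`, and `referee` names, the rule fields, and the example rule are all hypothetical, chosen only to show constraints living in data that is checked at runtime rather than in model weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One machine-readable constraint (hypothetical schema)."""
    rule_id: str
    blocked_action: str
    reason: str


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reasons: tuple


def referee(planned_action: dict, constitution: list) -> Verdict:
    """Check a planned action against the constitution before it is executed."""
    action_type = planned_action.get("type")
    reasons = tuple(
        rule.reason for rule in constitution if rule.blocked_action == action_type
    )
    return Verdict(allowed=not reasons, reasons=reasons)


# Hypothetical constitution: swapping this list changes behaviour without retraining.
constitution = [
    Rule("R1", "send_payment", "payments require human approval"),
]

print(referee({"type": "send_payment", "amount": 100}, constitution))
print(referee({"type": "draft_email"}, constitution))
```

The design point is the one the excerpt makes: the agent's planner is untouched, and a change in policy is an edit to the `constitution` list.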

From Ballots to Bots: Reprogramming Democracy for the AI Era

TL;DR for operators: AI political agents are best understood as a bandwidth upgrade for democratic participation, not as chrome-plated replacements for elected officials. The serious idea is not “let a chatbot run parliament”, which would be a fine way to make bad governance both faster and more confidently worded. The serious idea is that citizens, communities, and institutions may use AI delegates to process policy information, model preferences, negotiate trade-offs, and keep a continuous audit trail of representation. ...

The Art of Control: Balancing Autonomy, Authority, and Initiative in Human-AI Co-Creation

TL;DR for operators: Most AI product debates still treat “control” as a single slider: more automation on the right, more human control on the left. Convenient, tidy, and wrong in exactly the way tidy models usually are. The MOSAAIC paper argues that control in human-AI co-creation has at least three separable dimensions: autonomy, or who can choose creative actions; initiative, or who can proactively contribute; and authority, or who can decide and direct the process.[1] This matters because a system can be highly autonomous but still reactive, proactive but not authoritative, or authoritative in small tactical ways while leaving the human responsible for the final artifact. ...

From Cog to Colony: Why the AI Taxonomy Matters

TL;DR for operators: Most organisations do not need “Agentic AI” because it sounds more advanced. They need the smallest reliable architecture that can complete the job without creating a private zoo of semi-autonomous software creatures. The paper behind this article, AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges, argues that AI Agents and Agentic AI are not interchangeable labels.[1] An AI Agent is usually a bounded system: it interprets a task, calls tools, uses context, and produces an action or output. Agentic AI is a broader system pattern: multiple specialised agents coordinate, share memory, decompose goals, recover from failures, and work toward higher-level objectives. ...

Half-Life Crisis: Why AI Agents Fade with Time (and What It Means for Automation)

TL;DR for operators: AI agents may not simply “get worse” on longer tasks. A better mental model is that every additional unit of human-equivalent task time adds another chance for the agent to fail. If that chance is roughly constant, success falls exponentially. That turns a cheerful benchmark number into a much less cheerful deployment number. Under Toby Ord’s constant-hazard interpretation of METR’s long-task data, an agent’s 50% success time horizon is its “half-life”: the point where half of attempts still succeed and half have already failed.[1] The awkward part is what happens when a business needs 80%, 90%, or 99% reliability rather than a coin toss with better branding. ...
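
The arithmetic behind the half-life framing can be sketched in a few lines of Python. This assumes only the constant-hazard model described above, so that success decays as 0.5 ** (task_time / half_life). The 60-minute half-life is a hypothetical figure for illustration, not a number from the excerpt.

```python
import math


def success_probability(task_time: float, half_life: float) -> float:
    """Chance of success under a constant hazard rate (exponential decay)."""
    return 0.5 ** (task_time / half_life)


def horizon_for_reliability(reliability: float, half_life: float) -> float:
    """Longest task time that still meets the target success rate.

    Solves 0.5 ** (t / half_life) = reliability for t.
    """
    return half_life * math.log(reliability) / math.log(0.5)


half_life = 60.0  # hypothetical 50% time horizon, in minutes

for target in (0.5, 0.8, 0.9, 0.99):
    horizon = horizon_for_reliability(target, half_life)
    print(f"{target:.0%} reliability -> tasks up to {horizon:.1f} min")
```

Under this model the usable horizon shrinks quickly as the reliability target rises: at 80% it is roughly a third of the half-life, and at 99% it is under 2% of it.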

Scaling Trust, Not Just Models: Why AI Safety Must Be Quantitative

TL;DR for operators: The paper’s practical message is simple enough to be uncomfortable: “use a smarter model to supervise the risky model” is not a safety strategy. It is an experiment waiting to be measured. Engels, Baek, Kantamneni, and Tegmark propose a way to measure scalable oversight as a two-player contest between a Guard and a Houdini.[1] The Guard is the overseer: auditor, judge, monitor, containment system, or reviewer. The Houdini is the model trying to defeat oversight: deceive, persuade, insert a backdoor, or escape a simulated control environment. Each side receives a domain-specific Elo score, and the paper studies how that score changes as general model capability increases. ...
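
For readers unfamiliar with Elo, the sketch below shows how a rating gap maps to a win probability using the standard Elo expected-score formula with its conventional 400-point scale. The excerpt does not describe how the paper fits its ratings, so this only illustrates the general mechanism, and the Guard and Houdini ratings are hypothetical.

```python
def expected_win_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


guard_elo = 1200.0  # hypothetical Guard rating in one oversight domain

# Hypothetical Houdini ratings, from weaker than the Guard to much stronger.
for houdini_elo in (1000.0, 1200.0, 1400.0, 1600.0):
    p_guard = expected_win_probability(guard_elo, houdini_elo)
    print(f"Houdini {houdini_elo:.0f} vs Guard {guard_elo:.0f}: "
          f"Guard wins with probability {p_guard:.2f}")
```

Read this way, "oversight works" becomes a measurable claim: the Guard's win probability at a given rating gap, not an assumption that a smarter supervisor is automatically enough.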

Logos, Metron, and Kratos: Forging the Future of Conversational Agents

TL;DR for operators: Conversational agents are moving from polite text boxes into operational systems: booking, triaging, recommending, retrieving, judging, escalating, and occasionally making a confident mess with impressive formatting. The useful lesson from these two papers is simple: enterprise agents cannot be trusted just because they can reason, remember, or call tools. Those are necessary capabilities, not sufficient safeguards. A serious agent needs a fourth layer: a way to evaluate whether its own decisions and judgments deserve to be used. ...

When Smart AI Gets It Wrong: Diagnosing the Knowing-Doing Gap in Language Model Agents

TL;DR for operators: A smart agent can still be a bad decision-maker. That is the useful, slightly annoying lesson from LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities.[1] The paper studies Gemma2 models acting in simple decision environments and finds that they often fail not because they cannot describe the right strategy, but because they do not reliably execute it. ...

The Crossroads of Reason: When AI Hallucinates with Purpose

TL;DR for operators: Do not ask, “Can the model do the task?” Ask, “Does the model use the capabilities it already has when the task becomes messy?” Hallucination is not one thing. In a medical, legal, financial, or investment workflow, it is a defect. In a labelled creative mode, it can be a feature. Revolutionary stuff: context matters. Goal-directedness is also not one thing. More goal pursuit can improve execution, but it also raises safety and governance questions. The sensible business pattern is not “deploy an autonomous AI analyst and hope it behaves”. It is mode governance: separate factual, creative, and decision-support modes with different metrics, interfaces, and controls. High-stakes workflows need scaffolding: memory, rule extraction, refinement loops, ensemble checks, scoring, audit trails, and humans who can edit policy rather than merely admire the model’s prose. AI products are currently being sold with a suspiciously convenient promise: one conversational interface will reason, search, write, create, decide, advise, analyse, and maybe spiritually support the quarterly planning meeting if procurement approves the invoice. ...