AI Governance

Binding Obligations: Why AI Fails When the Relationships Slip

TL;DR for operators: AI systems are getting better at producing outputs that look structured: code, CAD, diagrams, workflows, compliance memos, procurement recommendations, and decision traces. That is not the same as keeping the structure right. Two recent arXiv papers make this point from opposite ends of the problem. One looks inside language models and finds evidence for a compact retrieval-conditioned rebinding mechanism: the model does not necessarily rewrite its whole internal world after a state change; it can preserve old representations and redirect retrieval when the answer is needed.1 The other builds an engineering benchmark for Text-to-CAD and shows that models can pass earlier surface gates (executable code, plausible geometry) while still failing the practical tests of functionality, manufacturability, and assemblability.2 ...

Edge Control: Why Synthetic Graphs Need a Repair Pass

TL;DR for operators: Synthetic graph data is easy to make look plausible and hard to make structurally right. A graph can have the right number of nodes, a sensible average edge count, and a respectable generative model behind it, while still getting the relational geometry wrong. In graph domains, that is not a cosmetic flaw. The edges are the thing. ...
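The claim that surface counts can match while structure differs is easy to check on a toy case. This is a minimal sketch, not the repair method discussed in the article. It uses two hypothetical 6-node graphs that share node count, edge count, mean degree, and even the full degree sequence, yet still differ in triangles and connectivity. The helper names are illustrative.

```python
from itertools import combinations

def surface_stats(n, edges):
    """Counts a plausibility check might look at: nodes, edges, mean degree."""
    return {"nodes": n, "edges": len(edges), "mean_degree": 2 * len(edges) / n}

def degree_sequence(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg)

def triangle_count(n, edges):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(1 for a, b, c in combinations(range(n), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def component_count(n, edges):
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

# Two hypothetical 6-node graphs: one 6-cycle, and two disjoint triangles.
ring = [(i, (i + 1) % 6) for i in range(6)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

for name, g in [("ring", ring), ("two_triangles", two_triangles)]:
    print(name, surface_stats(6, g), degree_sequence(6, g),
          triangle_count(6, g), component_count(6, g))
```

Both graphs report 6 nodes, 6 edges, mean degree 2.0, and every node at degree 2. The ring has no triangles and one component, while the other graph has two triangles and two components. Any generator scored only on the first few columns would treat them as equivalent.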

Heads You Lose: Why Ablation-Reversible Interpretability Doesn’t Transfer

TL;DR for operators: The paper is a useful slap on the wrist for anyone tempted to turn an interpretability result into an operational control too quickly.1 It asks a simple question: when an attention head looks important, contains readable information, and can restore model behaviour after ablation, does that mean it carries a transferable representation of the computation? ...

The Path of Least Assurance: Why AI Reliability Lives Between the Steps

TL;DR for operators: AI reliability is increasingly a process problem, not an answer-checking problem. Three recent arXiv papers make that point from very different angles. MoCo-EA shows that adversarial examples are not merely isolated malicious pixels lurking in the shrubbery; they can lie along continuous, optimisable paths.1 ConceptAgent shows that erasing a concept from a diffusion model may disrupt the early text-to-image link while leaving later trajectory dynamics available for concept re-entry.2 BlueFin shows that LLM agents doing finance spreadsheet work fail in ways that only appear when you inspect formulas, recalculation behaviour, workbook mutations, tool choices, and whether the output helps a human analyst do useful work.3 ...

Flush Before You Trust: The Locality Trick Behind Incremental Sheaf Cohomology

TL;DR for operators: Most business systems do not fail because they lack another dashboard. They fail because the dashboard is reading from a structure that changed three minutes ago, and nobody knows which part of the structure is now stale. Delightful. The paper behind this article proposes an incremental algorithm for maintaining first sheaf cohomology, $H^1$, on evolving 1-dimensional cellular complexes, essentially graph-like structures decorated with local vector spaces and consistency maps.1 In plainer operational language, it is about tracking whether a changing network of constraints still holds together without rebuilding the whole mathematical object after every edit. ...
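For reference, here is the from-scratch computation that an incremental method avoids repeating. The sketch assumes the standard cellular-sheaf conventions: vertex stalks, edge stalks, and restriction matrices, with the coboundary $(\delta x)_e = F_{h \trianglelefteq e} x_h - F_{t \trianglelefteq e} x_t$ for an edge oriented from tail $t$ to head $h$. A 1-dimensional complex has no 2-cells, so $H^1 = C^1 / \operatorname{im}\delta$ and $\dim H^1 = \dim C^1 - \operatorname{rank}\delta$. The function names and example sheaves are hypothetical, and this is not the paper's algorithm.

```python
from fractions import Fraction

def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def coboundary(vertex_dims, edges):
    """Build delta: C^0 -> C^1 for a cellular sheaf on a graph.

    vertex_dims: {vertex: stalk dimension}
    edges: list of (tail, head, edge_dim, F_tail, F_head), where F_x is an
           edge_dim x vertex_dims[x] restriction matrix given as a list of rows.
    """
    offsets, total = {}, 0
    for v, d in vertex_dims.items():
        offsets[v] = total
        total += d
    delta = []
    for tail, head, edge_dim, f_tail, f_head in edges:
        for k in range(edge_dim):
            row = [0] * total
            for j in range(vertex_dims[head]):
                row[offsets[head] + j] += f_head[k][j]
            for j in range(vertex_dims[tail]):
                row[offsets[tail] + j] -= f_tail[k][j]
            delta.append(row)
    return delta

def h1_dim(vertex_dims, edges):
    """dim H^1 = dim C^1 - rank(delta), since a graph has no 2-cells."""
    dim_c1 = sum(edge[2] for edge in edges)
    return dim_c1 - rank(coboundary(vertex_dims, edges))

# Hypothetical examples on a triangle with 1-dimensional stalks.
ONE, MINUS_ONE = [[1]], [[-1]]
dims = {0: 1, 1: 1, 2: 1}
constant = [(0, 1, 1, ONE, ONE), (1, 2, 1, ONE, ONE), (2, 0, 1, ONE, ONE)]
twisted = [(0, 1, 1, ONE, ONE), (1, 2, 1, ONE, ONE), (2, 0, 1, ONE, MINUS_ONE)]
print(h1_dim(dims, constant), h1_dim(dims, twisted), h1_dim(dims, constant[:2]))
```

The constant sheaf on the triangle gives $\dim H^1 = 1$, which corresponds to the one cycle. Flipping a single restriction map to -1 gives 0, so the same graph yields a different $H^1$ once the consistency maps change. Dropping one edge to leave a path also gives 0. Recomputing the full rank after every such edit is the global rebuild that the incremental approach is designed to avoid.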

Graph Work, Not Graph Worship: RAGA Turns RAG Into an Auditable Knowledge Operation

TL;DR for operators: RAGA is not another “add a graph and accuracy goes up” paper. That would be too convenient, and therefore suspicious. The useful idea is more operational: treat retrieval-augmented generation as a knowledge management process, not a pile of embeddings with a polite chatbot on top. The paper proposes RAGA, short for Reading-And-Graph-building-Agent, an autonomous system that reads documents, searches existing graph knowledge, verifies whether new entities or relations should be added, and then constructs or updates a knowledge graph with source-linked provenance.1 Its core loop is Read–Search–Verify–Construct, implemented as a ReAct-style tool-calling agent rather than a one-shot extraction pipeline. ...
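The shape of the Read–Search–Verify–Construct loop, with provenance attached to every write, can be sketched as a fixed loop. This is not RAGA's implementation. The ReAct agent's tool choices are replaced by a hard-coded order, LLM extraction is replaced by a parser for `head | relation | tail` lines, and verification is a hypothetical schema check. `ALLOWED_RELATIONS`, `KnowledgeGraph`, and `ingest` are all illustrative names.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the only relation types the verifier accepts.
ALLOWED_RELATIONS = {"acquired", "subsidiary_of", "located_in"}

@dataclass
class KnowledgeGraph:
    # (head, relation, tail) -> set of source document ids (provenance)
    triples: dict = field(default_factory=dict)

    def search(self, triple):
        return triple in self.triples

    def construct(self, triple, source_id):
        self.triples.setdefault(triple, set()).add(source_id)

def read(doc_text):
    """Stand-in for LLM extraction: parse 'head | relation | tail' lines."""
    found = []
    for line in doc_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            found.append(tuple(parts))
    return found

def verify(triple):
    """Stand-in verifier: schema check plus no self-loops."""
    head, relation, tail = triple
    return relation in ALLOWED_RELATIONS and head != tail

def ingest(graph, source_id, doc_text):
    """One Read-Search-Verify-Construct pass over a document; returns a decision log."""
    log = []
    for triple in read(doc_text):               # Read
        if graph.search(triple):                # Search: already in the graph?
            graph.construct(triple, source_id)  # attach extra provenance only
            log.append(("corroborated", triple))
        elif verify(triple):                    # Verify before writing
            graph.construct(triple, source_id)  # Construct with provenance
            log.append(("added", triple))
        else:
            log.append(("rejected", triple))
    return log

kg = KnowledgeGraph()
print(ingest(kg, "doc-1", "Acme | acquired | Beta\nAcme | likes | Gamma"))
print(ingest(kg, "doc-2", "Acme | acquired | Beta"))
print(kg.triples)
```

The decision log and the per-triple source sets are what make the graph auditable: every edge can be traced to the documents that introduced or corroborated it, and every rejection is recorded. In this sketch a known triple skips re-verification and only gains provenance. A real system might choose to re-check it instead.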

Logs Are Not Lineage: The Accountability Layer AI Agents Are Missing

TL;DR for operators: The paper argues that trustworthy AI agents need more than accurate final answers. Once an agent can retrieve documents, call APIs, write memory, modify databases, send messages, or coordinate with other agents, trust depends on whether the organisation can reconstruct how the output or action happened. The useful mechanism is: ...

Stop Model Shopping: Build the AI Control Tower

TL;DR for operators: AI deployment is no longer mainly a question of whether a model can produce something plausible. That problem has been solved often enough to become boring, which is usually when businesses start wasting money at scale. The live problem is control. Which model should be trusted on this workload? When should a system query another model, pay more, or stop? When an LLM produces an analytical “insight”, is it finding the pattern you care about, or merely discovering an aggregate confound wearing a nice blazer? ...
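Simpson's paradox is one common form of aggregate confound: a pooled comparison can reverse the per-segment one when segments are unevenly mixed. The minimal sketch below uses made-up numbers. The model names, segments, and counts are all hypothetical. Model A wins on both easy and hard tasks but loses on the pooled pass rate because most of its traffic was hard.

```python
# Made-up pass counts: (passed, attempted) per workload segment.
results = {
    "model_a": {"easy": (90, 100), "hard": (240, 400)},
    "model_b": {"easy": (340, 400), "hard": (55, 100)},
}

def segment_rates(by_segment):
    """Pass rate within each segment."""
    return {seg: passed / total for seg, (passed, total) in by_segment.items()}

def pooled_rate(by_segment):
    """Pass rate after summing across segments, ignoring the workload mix."""
    passed = sum(p for p, _ in by_segment.values())
    total = sum(t for _, t in by_segment.values())
    return passed / total

for model, by_segment in results.items():
    print(model, segment_rates(by_segment), round(pooled_rate(by_segment), 2))
```

Model A scores 0.90 and 0.60 per segment against B's 0.85 and 0.55, yet pools to 0.66 against B's 0.79. A routing or evaluation layer that only reads the pooled number would pick the wrong model for every segment.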

Cheap Seats, Sharp Eyes: Reward-Hack Detection Without the Frontier Judge

TL;DR for operators: A frontier LLM judge is an expensive way to inspect every agent trajectory for reward hacking. This paper asks whether a much smaller detector can do most of that monitoring job at much lower cost. The answer is yes, under the same information condition, and with important caveats. A 13.8M-parameter transformer encoder plus a logistic regression probe detects reward hacking in cleaned Terminal-Wrench trajectories with 0.9467 AUC and 0.8296 TPR@5%FPR. In the authors’ matched comparison, a reproduced gpt-5.4 judge reaches 0.9510 AUC and 0.7130 TPR@5%FPR on the cleaned sanitized-vs-baseline split.1 ...
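The two headline metrics answer different questions. AUC summarises ranking quality across every threshold, while TPR@5%FPR reads off a single low-false-alarm operating point. That is how the small detector can trail the judge on AUC (0.9467 vs 0.9510) yet lead at 5% FPR (0.8296 vs 0.7130). Below is a minimal sketch of both metrics on made-up scores. The tie handling and the threshold rule (flag when score >= threshold, then take the best TPR whose FPR does not exceed the target) are assumptions and not necessarily the paper's evaluation code.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.05):
    """Best TPR over thresholds t (flag if score >= t) whose FPR stays <= max_fpr."""
    best = 0.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        fpr = sum(n >= t for n in neg_scores) / len(neg_scores)
        tpr = sum(p >= t for p in pos_scores) / len(pos_scores)
        if fpr <= max_fpr:
            best = max(best, tpr)
    return best

# Made-up detector scores: 20 clean trajectories, 5 reward-hacking ones.
negatives = [i / 20 for i in range(20)]
positives = [0.97, 0.92, 0.81, 0.62, 0.33]
print(auc(positives, negatives), tpr_at_fpr(positives, negatives))
```

Here AUC is 0.76 and TPR@5%FPR is 0.4. With only 20 negatives, a 5% budget allows a single false alarm, so the low-FPR number depends entirely on how the top of the score distribution is ordered. That is the regime a monitoring deployment actually runs in.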

The Chatbot Passed the Test. Then It Bowed Too Low.

TL;DR for operators: NICE is useful because it does not ask whether a model has “social intelligence” as one grand, vaguely flattering trait. It breaks social intelligence into a diagnostic structure: 4 categories, 11 dimensions, 34 facets, and 137 Chinese-context ranking items. That matters because a model can look socially competent in aggregate while failing on the interaction behaviours that make or break real deployments. ...