Model Governance

Lost in Translation: When Multilingual LLMs Miss the Medical Plot

Accuracy is a seductive number. It is tidy, executive-friendly, and easy to put in a slide deck. A model gets 82% accuracy, someone says “good enough,” and suddenly a clinical workflow is being “transformed.” Healthcare, as usual, has a way of punishing this kind of optimism. Not loudly at first. Quietly. Through false negatives, silent majority-class prediction, and a dashboard that looks reassuring until someone asks the rude question: what exactly did the model miss? ...

Counterfactuals, Concepts, and Causality: XAI Finally Gets Its Act Together

Explanations should answer the question people actually ask. Audit meeting. A model has made a decision. Someone projects a heatmap. The highlighted pixels are around a chin, an eye, a forehead, or some other facial region that looks important because the model says it is important. Everyone nods carefully. Nobody is much wiser. The model has technically been “explained,” in the same way a smoke alarm explains fire by making noise. ...

Prototypes, Not Guesswork: Rethinking Trust in Multi‑View Classification

Pizza. The image says pizza. The text description says baklava. A human sees the contradiction immediately. A multi-view classifier may not. It may average the views, let one noisy modality dominate, or produce a confident answer from evidence that should have triggered suspicion. Very impressive, in the same way a committee can be impressive while approving the wrong invoice. ...

Counterfactuals Unchained: How Causality Escapes Its Own Models

A loan is rejected. Now explain why. A borrower is rejected by an automated lending system. The compliance team asks a simple question: What caused the rejection? A naïve answer points to a variable: low income, high debt ratio, thin credit history, missing documentation, or some equally respectable-looking field in the model. A better answer asks what would have happened if that variable had changed. A still better answer asks which surrounding facts must be held fixed while we imagine that change. ...

Trust Issues: Why Neural Networks Need Their Own Internal Affairs Department

Accuracy is a comforting number. That is precisely the problem. A neural network can score well on a test set and still be operationally suspicious. The labels may be corrupted. The input may be degraded. A small patch may have quietly hijacked part of the model’s learned behavior. The model may be confident, calibrated enough for a dashboard, and still untrustworthy in the one place where the business actually needs it to behave. ...

Value Collision Course: When LLM Alignment Plays Favorites

A support chatbot does not wake up one morning with a worldview. It gets one, slowly, through the dull machinery of product decisions: who labels the data, how many options they can choose from, whether disagreement is kept or ironed flat, and which optimization method gets the privilege of turning messy human judgement into model behaviour. ...

Refusal, Rewired: Why One Safety Direction Isn’t Enough

Safety teams like switches. They are easy to name, easy to diagram, and easy to pretend are under control. For language models, “refusal” has often been treated with roughly that mental model. A harmful prompt enters. Somewhere inside the model, a refusal feature lights up. The model says no. If researchers can identify the feature, they can study it, steer it, strengthen it, or—less comfortably—remove it. ...

Safety in Numbers: Why Consensus Sampling Might Be the Most Underrated AI Safety Tool Yet

A model generates an image. It looks ordinary. A horse in a meadow, a lighthouse in a storm, a bowl of oranges. Nothing dramatic. No obvious watermark, no visible glitch, no suspicious artefact screaming “please call the security team”. That is precisely the problem. Some AI failures are meant to be seen. Toxic text, obvious hallucinations, broken code, bizarre images with eight fingers and a cursed wrist. Those are the easy cases, relatively speaking. The harder cases are outputs that look fine while carrying something unsafe: a hidden message, a planted vulnerability, a backdoor trigger, or another payload that cannot be reliably detected by staring harder at the finished product. ...

Unpacking the Explicit Mind: How ExplicitLM Redefines AI Memory

Memory is useful until nobody can find where it lives. That, in miniature, is the operational problem with today’s language models. They can answer questions, imitate expertise, retrieve fragments of the past, and produce very confident nonsense with the composure of a senior consultant who has just discovered bullet points. But when a model gives a wrong factual answer, the organisation deploying it faces an awkward question: where, exactly, is that wrong fact stored? ...

Graphing the Invisible: How Community Detection Makes AI Explanations Human-Scale

Auditors like lists. Models, inconveniently, do not behave like lists. A credit model may tell you that income mattered, education mattered, job type mattered, age mattered, and postcode-adjacent variables mattered. A fraud model may produce the same kind of feature ranking, only with device fingerprints and transaction timings instead of employment history. The dashboard looks satisfyingly crisp: bars, scores, explanations, probably a tasteful shade of corporate blue. Then the real question arrives: which of these variables are acting together? ...