Model Governance

Benchmarks That Fight Back: Adaptive Testing for LMs

A benchmark is supposed to be a measuring instrument. In practice, many AI benchmarks behave more like a tired clipboard. Every model gets the same questions. Every question receives the same accounting treatment. The final score is usually a mean accuracy number, neat enough for a leaderboard and blunt enough to hide the messy truth underneath. Some items are too easy to tell strong models apart. Some are too hard to tell weak models apart. Some are mislabeled. Some have stopped mattering because everyone competent now solves them. Yet the ritual continues: run the suite, average the answers, update the chart, pretend the thermometer is not melting. ...

Mirror, Signal, Manoeuvre: Why Privileged Self‑Access (Not Vibes) Defines AI Introspection

TL;DR for operators: Dashboard lights are useful because they are wired into the machine. A sticker saying “probably fine” is less useful, even if the sticker was generated in a reassuring font. That is the practical distinction in this paper. Song, Lederman, Hu, and Mahowald argue that AI introspection should not mean “the model says something plausible about itself.” It should mean the model has privileged self-access: it can report an internal state more reliably than an outside evaluator using the same visible evidence at equal or lower computational cost.[1] ...

Quants With a Plan: Agentic Workflows That Outtrade AutoML

TL;DR for operators: A quant team does not need a chatbot that “has ideas” about markets. It needs a workflow that can select a sensible model, change one thing at a time, run the experiment, keep the better version, reject the worse one, and leave a paper trail that a human can inspect without requiring divination. ...

From Genes to Memes: The Evolutionary Biology of Hugging Face's 2 Million Models

TL;DR for operators: Open-model adoption is usually treated as procurement with a nicer download button: find a model, check the licence, skim the model card, run a few benchmarks, and move on. This paper makes that habit look under-specified. Laufer, Oderinwale, and Kleinberg analyse 1.86 million models on Hugging Face and reconstruct family trees linking models to fine-tuned, adapted, quantised, or merged descendants.[1] The useful result is not merely that Hugging Face is large. We knew that. The useful result is that model lineages mutate their public governance signals as they spread. ...

When AI Knows It Doesn’t Know: Turning Uncertainty into Strategic Advantage

TL;DR for operators: A model that says “I don’t know” is not automatically trustworthy. It may be cautious. It may be badly calibrated. It may be uncertain for the wrong reasons. It may also be using uncertainty as a very elegant trapdoor. Polite refusal, unfortunately, is still refusal. Stephan Rabanser’s thesis, Uncertainty-Driven Reliability: Selective Prediction and Trustworthy Deployment in Modern Machine Learning, is useful because it treats uncertainty not as a philosophical mood, but as an operational control layer.[1] The key question is not whether a model can emit a confidence score. Most models can emit something confidence-shaped. The harder question is whether that score can decide which cases should be automated, deferred, reviewed, rejected, routed to a larger model, or audited. ...
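As a rough illustration of confidence acting as a control layer, the sketch below routes cases by a confidence score and measures the trade-off between coverage (the share of cases automated) and accuracy on the automated cases. It is a minimal sketch, not the thesis's method: the function names, the thresholds `accept_at` and `review_at`, and the sample records are all hypothetical.

```python
from typing import List, Tuple

# Each record is (confidence score, whether the model's answer was correct).
Record = Tuple[float, bool]


def route(confidence: float, accept_at: float = 0.9, review_at: float = 0.6) -> str:
    """Map a confidence score to an action. Thresholds are illustrative assumptions."""
    if confidence >= accept_at:
        return "automate"
    if confidence >= review_at:
        return "review"
    return "defer"


def coverage_and_accuracy(records: List[Record], threshold: float) -> Tuple[float, float]:
    """Return (coverage, selective accuracy) when only cases at or above
    the threshold are automated."""
    accepted = [correct for conf, correct in records if conf >= threshold]
    coverage = len(accepted) / len(records) if records else 0.0
    accuracy = sum(accepted) / len(accepted) if accepted else 0.0
    return coverage, accuracy


# Hypothetical evaluation data, invented for illustration only.
records = [(0.95, True), (0.90, True), (0.70, False), (0.40, False)]

for threshold in (0.0, 0.9):
    cov, acc = coverage_and_accuracy(records, threshold)
    print(f"threshold={threshold}: coverage={cov:.2f}, accuracy={acc:.2f}")
```

A confidence score is operationally useful only if raising the threshold buys accuracy at an acceptable loss of coverage; a score that fails this check cannot safely decide which cases to automate.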

Open-Source, Open Risk? Testing the Limits of Malicious Fine-Tuning

TL;DR for operators: Open-weight model safety is not just a question of what the released model refuses to answer. Once weights are public, the more relevant question is what a capable actor can make the model do after post-training. That is the problem this paper tackles. The paper introduces malicious fine-tuning as a release-evaluation method: take the model, assume a sophisticated adversary with serious reinforcement-learning infrastructure, and try to elicit the maximum dangerous capability in high-risk domains. The authors apply this to gpt-oss-120b, focusing on biology and cybersecurity rather than self-improvement. ...

Merge Without Mayhem: How Orthogonal Deltas Could Revolutionize Model Composition

TL;DR for operators: Model composition usually sounds harmless until someone asks the obvious production question: “Can we remove that client-specific update without retraining the whole thing?” At that point, many elegant AI stacks quietly become sedimentary rock. The MDM-OC paper proposes a cleaner model lifecycle: keep a shared base model, express every fine-tuned specialist as a task delta, orthogonalize those deltas so they interfere less, merge them with tuned coefficients, and subtract a selected delta later when a capability, customer, or data source needs to be removed.[1] The important claim is not “we found another averaging recipe.” The claim is that model updates can be treated as separable components in parameter space. ...
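The lifecycle described above (base model, task deltas, orthogonalization, weighted merge, later subtraction) can be sketched on toy parameter vectors. This is a minimal sketch under stated assumptions, not the MDM-OC implementation: it uses plain Gram-Schmidt as a stand-in for whatever orthogonalization the paper applies, and the function names, coefficients, and vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(deltas: List[Vector]) -> List[Vector]:
    """Gram-Schmidt without normalisation: each delta loses its projection
    onto the deltas processed before it. An assumed stand-in technique."""
    result: List[Vector] = []
    for delta in deltas:
        v = list(delta)
        for u in result:
            uu = dot(u, u)
            if uu > 0:
                scale = dot(v, u) / uu
                v = [vi - scale * ui for vi, ui in zip(v, u)]
        result.append(v)
    return result


def merge(base: Vector, deltas: List[Vector], coeffs: List[float]) -> Vector:
    """Add each delta to the base, weighted by its coefficient."""
    merged = list(base)
    for coeff, delta in zip(coeffs, deltas):
        merged = [m + coeff * d for m, d in zip(merged, delta)]
    return merged


def unmerge(merged: Vector, delta: Vector, coeff: float) -> Vector:
    """Remove one previously merged delta by subtracting it."""
    return [m - coeff * d for m, d in zip(merged, delta)]


# Toy parameters: a 3-dimensional "model" and two overlapping task deltas.
base = [1.0, 1.0, 1.0]
raw_deltas = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
coeffs = [0.5, 0.5]  # illustrative; the paper tunes these

deltas = orthogonalize(raw_deltas)
merged = merge(base, deltas, coeffs)
without_second = unmerge(merged, deltas[1], coeffs[1])
print(merged, without_second)
```

Subtraction works here only because the exact delta and coefficient used at merge time are kept, so a real deployment would need to store them per specialist. Orthogonalization does not make subtraction possible (any linear sum can be undone); its role in this sketch is to stop the second delta from altering the direction the first one already occupies.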

Forgetting by Remembering: A Smarter Path to Machine Unlearning

TL;DR for operators: Deletion sounds simple until the deleted record has already shaped millions of model parameters. The clean answer is to retrain the model without that record. The operational answer is usually less glamorous: nobody wants to burn a full training cycle every time a user, regulator, data-quality team, or security analyst says, “Remove this.” ...

When Your AI Disagrees with Your Portfolio

TL;DR for operators: An AI investment assistant does not enter every portfolio discussion as a blank analyst. The paper behind this article shows that large language models can carry latent investment preferences: for certain sectors, for larger companies, and for contrarian rather than momentum arguments.[1] The important mechanism is simple and uncomfortable. When buy and sell evidence are balanced, the model’s internal prior can break the tie. When counter-evidence later becomes stronger, that prior does not necessarily disappear. In mixed-evidence settings, the model may latch onto the fragment of evidence that supports its original inclination and discount the stronger opposing side. Splendid. Your “neutral” analyst has discovered confirmation bias and brought it to the investment committee. ...

Structure Matters: Externalities and the Hidden Logic of GNN Decisions

TL;DR for operators: GraphEXT is not another attempt to colour a few nodes and declare the model “interpretable”. It makes a sharper claim: in graph neural networks, a node’s importance is partly created by the structure around it. The same node may matter differently when its neighbours, subgraphs, and coalition boundaries change. ...