Model Risk

Graph Crimes of the Temporal Kind: How LoReTTA Quietly Breaks Time

A fraud model does not only learn from transactions. It learns from sequence. Who interacted with whom. When. How often. After what previous event. Before which next event. In temporal graph systems, the order is not metadata. It is the thing being modelled. That is why LoReTTA is an uncomfortable paper.[1] It does not argue that Temporal Graph Neural Networks can be broken only by a powerful adversary with model access, expensive surrogate training, and a theatrical pile of fake edges. It argues something more operationally annoying: a continuous-time graph can be poisoned by removing influential interactions and replacing them with plausible ones. The resulting history still looks enough like history. The model quietly learns the wrong temporal structure. Very civilised, as crimes go. ...

Better Wrong Than Certain: How AI Learns to Know When It Doesn’t Know

A credit model approves the familiar applicant. A diagnostic model reads the common scan. A pricing model values the house in a neighbourhood it has seen a thousand times before. Everyone relaxes. The model is “confident”. Then a strange case arrives. The applicant has an unusual income pattern. The scan comes from an underrepresented patient group. The house sits outside the areas covered by historic transactions. The model still produces an answer, because that is what models are trained to do. Press button, receive number. Very efficient. Occasionally ridiculous. ...

FAITH in Numbers: Stress-Testing LLMs Against Financial Hallucinations

TL;DR for operators: FAITH is useful because it changes the hallucination question from “Does the model sound right?” to “Can the model reconstruct a known financial number from the exact tables and surrounding text that justify it?”[1] That sounds modest. It is not. In finance, modest is usually where the damage hides. ...

Curvature in the Jump: Geometrizing Financial Lévy Models

TL;DR for operators: Jaehyung Choi’s paper does not offer a new trading strategy, volatility forecast, or backtest that makes the Sharpe ratio stand up and sing.[1] Its contribution is more structural: it builds an information-geometric framework for Lévy processes, the family of stochastic processes often used when financial returns refuse to behave like polite Gaussian increments. ...

Signed, Sealed, Delivered: A Rough Path to Better Volatility Models

TL;DR for operators: Options calibration has a familiar operational problem: the model that is fast enough to run every day is usually the model that assumes the market is behaving politely. The market, naturally, has other hobbies. This paper compares two ways of calibrating implied volatility surfaces. The first is the classical route: use model-specific analytical approximations for Heston and rough Bergomi. The second is the rough-path route: represent volatility as a linear functional of the truncated signature of a primary stochastic process.[1] ...
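
For readers who have not met signatures before, here is a minimal sketch of what "a linear functional of the truncated signature" can look like. It is not the paper's implementation. The excerpt does not say which primary process or truncation level is used, so the sketch assumes a time-augmented Brownian path and truncation at level 2. The function names and the weight vector are hypothetical placeholders, not calibrated values.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path has shape (n_points, d). Level 1 is the total increment;
    level 2 holds the iterated integrals, built segment by segment
    with Chen's identity.
    """
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Appending one linear segment with increment delta.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

def linear_functional(s1, s2, weights):
    """Weighted sum of the signature terms (level 0 is the constant 1)."""
    features = np.concatenate(([1.0], s1, s2.ravel()))
    return float(weights @ features)

# Assumption: the primary process is Brownian motion, augmented with time.
rng = np.random.default_rng(0)
n_steps = 250
t = np.linspace(0.0, 1.0, n_steps + 1)
w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n_steps), n_steps))))
path = np.column_stack([t, w])

s1, s2 = signature_level2(path)

# Hypothetical weights; in a calibration these would be fitted to option prices.
weights = np.array([0.2, 0.0, 0.1, 0.0, 0.05, 0.05, 0.0])
vol_proxy = linear_functional(s1, s2, weights)
```

The appeal of the design is that the model is linear in the weights, so fitting them is a regression-style problem rather than a model-specific derivation.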

OneShield Against the Storm: A Smarter Firewall for LLM Risks

TL;DR for operators: Enterprise LLM safety is often discussed as if the main question is whether the model has been trained to “behave”. That is the comforting version of the story. It is also too small. IBM’s OneShield paper argues for a different operating model: treat safety as a separate, model-agnostic guardrail layer that sits around the LLM, runs multiple specialised detectors in parallel, and then applies explicit policy decisions through a separate policy manager.[1] In plain business terms, OneShield is less like teaching the model good manners and more like installing a configurable safety-control plane around every AI interaction. Glamorous? Not especially. Operationally useful? Very much so. ...

From Sobol to Sinkhorn: A Transport Revolution in Sensitivity Analysis

TL;DR for operators: Models rarely fail because nobody ran a sensitivity analysis. They fail because the sensitivity analysis answered the convenient question instead of the relevant one. The paper behind gsaot introduces an R package for Optimal Transport-based global sensitivity analysis.[1] Its practical value is not that it makes Sobol’ indices obsolete. It does not. The useful shift is narrower and more interesting: gsaot estimates how much the entire output distribution changes when an input is known, rather than asking only how much of the output variance can be attributed to that input. ...
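
To make the distributional idea concrete, here is a rough Python sketch for a scalar output. It is not the gsaot API, which is an R package. It bins an input, measures the squared 2-Wasserstein distance between the output distribution inside each bin and the overall output distribution, and averages the result. Dividing by twice the output variance is an assumption about the normalisation, chosen so the index falls between 0 and 1 under a squared-Euclidean cost. The function name, toy model, and bin counts are all hypothetical.

```python
import numpy as np

def ot_sensitivity_1d(x, y, n_bins=20, n_quantiles=200):
    """Binned estimate of an OT-based sensitivity index for scalar y.

    In one dimension the squared 2-Wasserstein distance is the mean
    squared difference between the two quantile functions.
    """
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    q_all = np.quantile(y, levels)

    # Equal-frequency bins on the input.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    bin_ids = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

    total = 0.0
    for b in range(n_bins):
        y_b = y[bin_ids == b]
        if y_b.size == 0:
            continue
        q_b = np.quantile(y_b, levels)
        w2_sq = np.mean((q_b - q_all) ** 2)
        total += (y_b.size / y.size) * w2_sq

    # Assumed normalisation: twice the output variance.
    return total / (2.0 * np.var(y))

# Toy model (an assumption for illustration): x1 dominates, x2 barely matters.
rng = np.random.default_rng(42)
n = 20_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = x1 + 0.1 * x2

idx_x1 = ot_sensitivity_1d(x1, y)
idx_x2 = ot_sensitivity_1d(x2, y)
```

In this toy case knowing `x1` shifts the whole output distribution, so its index should be far larger than the index for `x2`. A variance-based index would rank the two inputs the same way here; the two approaches part ways when an input changes the shape of the output distribution without moving its conditional mean much.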

Raising the Bar: Why AI Competitions Are the New Benchmark Battleground

TL;DR for operators: A model score is not a certificate. It is a timestamp. That is the operational message of D. Sculley and co-authors’ position paper on GenAI evaluation.[1] Their argument is not that every static benchmark is useless, nor that competitions are magical truth machines with leaderboards attached. The argument is sharper: GenAI has broken the old bargain behind machine-learning evaluation. ...

Branching Out, Beating Down: Why Trees Still Outgrow Deep Roots in Quant AI

TL;DR for operators: QuantBench is not another paper asking investors to believe that the newest neural architecture will finally decode markets because it has more layers and a nicer diagram. Mercifully. It is a benchmark platform for quantitative investment that tries to evaluate AI methods across the full quant workflow: factor mining, modelling, end-to-end position generation, portfolio optimisation, and order execution.[1] ...