Model Evaluation

Bias on Demand: When Synthetic Data Exposes the Moral Logic of AI Fairness

In the field of machine learning, fairness is often treated as a technical constraint — a line of code to be added, a metric to be optimized. But behind every fairness metric lies a moral stance: what should be equalized, for whom, and at what cost? The paper “Bias on Demand: A Modelling Framework that Generates Synthetic Data with Bias” (Baumann et al., FAccT 2023) breaks this technical illusion by offering a framework that can manufacture bias in data — deliberately, transparently, and with philosophical intent. ...

Mirror, Signal, Manoeuvre: Why Privileged Self‑Access (Not Vibes) Defines AI Introspection

TL;DR: Most demos of “LLM introspection” are actually vibe checks on outputs, not privileged access to internal state. If a third party with the same budget can do as well as the model “looking inward,” that’s not introspection—it’s ordinary evaluation. Two quick experiments show that temperature self‑reports flip with trivial prompt changes and offer no edge over cross‑model prediction. The bar for introspection should be higher, and business users should demand it. ...

Inside Out: How LLMs Are Learning to Feel (and Misfeel) Like Us

When Pixar’s Inside Out dramatized the mind as a control room of core emotions, it didn’t imagine that language models might soon build a similar architecture—on their own. But that’s exactly what a provocative new study suggests: large language models (LLMs), without explicit supervision, develop hierarchical structures of emotions that mirror human psychological models like Shaver’s emotion wheel. And the larger the model, the more nuanced its emotional understanding becomes. ...

Branching Out, Beating Down: Why Trees Still Outgrow Deep Roots in Quant AI

In the age of Transformers and neural nets that write poetry, it’s tempting to assume deep learning dominates every corner of AI. But in quantitative investing, the roots tell a different story. A recent paper—QuantBench: Benchmarking AI Methods for Quantitative Investment—delivers a grounded reminder: tree-based models still outperform deep learning (DL) methods across key financial prediction tasks. ...