Machine Learning

From Genes to Memes: The Evolutionary Biology of Hugging Face's 2 Million Models

When biologists talk about ecosystems, they speak of inheritance, mutation, adaptation, and drift. In the open-source AI world, the same vocabulary fits surprisingly well. A new empirical study of 1.86 million Hugging Face models maps the family trees of machine learning (ML) development and finds that AI evolution follows its own rules, with implications for openness, specialization, and sustainability.

The Ecosystem as a Living Organism

Hugging Face isn’t just a repository; it’s a breeding ground for derivative models. Pretrained models are fine-tuned, quantized, adapted, and sometimes merged, producing sprawling “phylogenies” that resemble biological family trees. The authors’ dataset connects models to their parents, letting them trace “genetic” similarity via metadata and model cards. The result: sibling models often share more traits than parent–child pairs, a sign that fine-tuning mutations are fast, non-random, and directionally biased. ...

Noise-Canceling Finance: How the Information Bottleneck Tames Overfitting in Asset Pricing

Deep learning has revolutionized many domains of finance, but when it comes to asset pricing, its power is often undercut by a familiar enemy: noise. Financial datasets are notoriously riddled with weak signals and irrelevant patterns, which easily mislead even the most sophisticated models. The result? Overfitting, poor generalization, and ultimately, bad bets. A recent paper by Che Sun proposes an elegant fix by drawing inspiration from information theory. Titled An Information Bottleneck Asset Pricing Model, the paper integrates information bottleneck (IB) regularization into an autoencoder-based asset pricing framework. The goal is simple yet profound: compress away the noise, and preserve only what matters for predicting asset returns. ...

From Molecule to Mock Human: Why Programmable Virtual Humans Could Rewrite Drug Discovery

The AI hype in pharma has mostly yielded faster failures. Despite generative models for molecules and AlphaFold for protein folding, the fundamental chasm remains: what works in silico or in vitro still too often flops in vivo. A new proposal, Programmable Virtual Humans (PVHs), may finally aim high enough: modeling the entire cascade of drug action across human biology, not just optimizing isolated steps.

🧬 The Translational Gap Isn’t Just a Data Problem

Most AI models in drug discovery focus on digitizing existing methods. Target-based models optimize binding affinity; phenotype-based approaches predict morphology changes in cell lines. But both ignore the reality that molecular behavior in humans is emergent, shaped by multiscale interactions between genes, proteins, tissues, and organs. ...

From Sobol to Sinkhorn: A Transport Revolution in Sensitivity Analysis

In a world where climate models span continents and economic simulators evolve across decades, it’s no longer enough to ask which variable affects the output the most. We must now ask: how does each input reshape the entire output distribution? The R package gsaot brings a mathematically rigorous answer, harnessing the power of Optimal Transport (OT) to provide a fresh take on sensitivity analysis. ...
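To make the idea of an input "reshaping the entire output distribution" concrete, here is a minimal sketch in plain Python. It is not the gsaot API and not the package's estimator. It assumes a toy model, a simple quantile-binning scheme, and an unnormalized index defined as the average 1D Wasserstein-1 distance between the output distribution conditional on an input bin and the unconditional output distribution. The function names `wasserstein_1d` and `ot_sensitivity` are hypothetical.

```python
import random


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two 1D empirical samples.

    Computed as the integral of |F_a(t) - F_b(t)| over t, where F_a and
    F_b are the empirical CDFs of the two samples.
    """
    a, b = sorted(a), sorted(b)
    grid = sorted(a + b)
    total = 0.0
    i = j = 0
    for k in range(len(grid) - 1):
        t = grid[k]
        # Count how many points of each sample are <= t.
        while i < len(a) and a[i] <= t:
            i += 1
        while j < len(b) and b[j] <= t:
            j += 1
        total += abs(i / len(a) - j / len(b)) * (grid[k + 1] - t)
    return total


def ot_sensitivity(x, y, n_bins=10):
    """Illustrative (unnormalized) transport-based sensitivity index.

    Splits the samples into quantile bins of the input x and averages the
    distance between each conditional output sample and the full output
    sample, weighted by the share of points in the bin.
    """
    order = sorted(range(len(x)), key=lambda idx: x[idx])
    size = len(order) // n_bins
    total = 0.0
    for k in range(n_bins):
        start = k * size
        stop = (k + 1) * size if k < n_bins - 1 else len(order)
        y_cond = [y[idx] for idx in order[start:stop]]
        total += len(y_cond) / len(x) * wasserstein_1d(y_cond, y)
    return total


# Toy model (an assumption for illustration): y depends on x1 but not on x2.
rng = random.Random(0)
n = 2000
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
y = [3.0 * v + rng.gauss(0.0, 0.1) for v in x1]

s1 = ot_sensitivity(x1, y)
s2 = ot_sensitivity(x2, y)
print(f"index for x1: {s1:.3f}, index for x2: {s2:.3f}")
```

In this toy setup the index for `x1` comes out well above the one for `x2`, because conditioning on `x1` shifts the whole output distribution while conditioning on `x2` leaves it essentially unchanged. A variance-based index would also separate these two inputs, but a transport-based one compares full distributions rather than a single moment.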

Price Shock Therapy: Causal ML Reveals True Impact of Electricity Market Liberalization

When electricity markets were deregulated across many U.S. states in the 1990s, economists and policymakers hoped competition would lower consumer prices. But for decades, the results remained ambiguous—until now. A new paper, Causality analysis of electricity market liberalization on electricity price using novel Machine Learning methods, offers the most precise evaluation yet. Using cutting-edge causal machine learning models, the authors demonstrate that liberalization led to a 7% decrease in residential electricity prices in the short term—a finding with major implications for regulatory policy and infrastructure reform. ...

Trading on Memory: Why Markov Models Miss the Signal

Classic finance assumes that the past doesn’t matter — only the present state of the market matters for decisions. But in a new paper from researchers at Imperial College and Oxford, a kernel-based framework for trading strategy design exposes how this assumption leads to suboptimal choices. Their insight: memory matters, and modern tools can finally make use of it. ...