
RxnBench: Reading Chemistry Like a Human (Turns Out That’s Hard)

Opening — Why this matters now: Multimodal Large Language Models (MLLMs) have become impressively fluent readers of the world. They can caption images, parse charts, and answer questions about documents that would once have required a human analyst and a strong coffee. Naturally, chemistry was next. But chemistry does not speak in sentences. It speaks in arrows, wedges, dashed bonds, cryptic tables, and reaction schemes buried three pages away from their explanations. If we want autonomous “AI chemists,” the real test is not trivia or SMILES strings — it is whether models can read actual chemical papers. ...

December 31, 2025 · 4 min · Zelina

The Invariance Trap: Why Matching Distributions Can Break Your Model

Opening — Why this matters now: Distribution shift is no longer a corner case; it is the default condition of deployed AI. Models trained on pristine datasets routinely face degraded sensors, partial observability, noisy pipelines, or institutional drift once they leave the lab. The industry response has been almost reflexive: enforce invariance. Align source and target representations, minimize divergence, and hope the problem disappears. ...

December 31, 2025 · 4 min · Zelina

When Models Forget on Purpose: Why Data Selection Matters More Than Data Volume

Opening — Why this matters now: The AI industry has spent the last three years chanting a single mantra: more data, bigger models. It worked—until it didn’t. Performance gains are slowing, training costs are ballooning, and regulators are starting to ask uncomfortable questions about memorization, leakage, and data provenance. The paper at hand steps directly into this tension and makes a slightly heretical claim: what we remove from training data may matter more than what we add. ...

December 31, 2025 · 3 min · Zelina

When the Paper Talks Back: Lost in Translation, Rejected by Design

Opening — Why this matters now: Academic peer review is buckling under scale. ICML alone now processes close to ten thousand submissions a year. In response, the temptation to insert LLMs somewhere into the review pipeline—screening, triage, or scoring—is understandable. Efficiency, after all, is a persuasive argument. Unfortunately, efficiency is also how subtle failures scale. This paper asks an uncomfortable but necessary question: what happens when the paper being reviewed quietly talks back to the model reviewing it? Not loudly. Not visibly. Just enough to tip the scales. ...

December 31, 2025 · 4 min · Zelina

When the Tutor Is a Model: Learning Gains, Guardrails, and the Quiet Rise of AI Co‑Tutors

Opening — Why this matters now: One‑to‑one tutoring is education’s gold standard—and its most stubborn bottleneck. Everyone agrees it works. Almost no one can afford it at scale. Into this gap steps generative AI, loudly promising democratized personalization and quietly raising fears about hallucinations, dependency, and cognitive atrophy. Most debates about AI tutors stall at ideology. This paper does something rarer: it runs an in‑classroom randomized controlled trial and reports what actually happened. No synthetic benchmarks. No speculative productivity math. Just UK teenagers, real maths problems, and an AI model forced to earn its keep under human supervision. ...

December 31, 2025 · 4 min · Zelina

MIRAGE-VC: Teaching LLMs to Think Like VCs (Without Drowning in Graphs)

Opening — Why this matters now: Venture capital has always been a strange mix of narrative craft and network math. Partners talk about vision, conviction, and pattern recognition, but behind the scenes, outcomes are brutally skewed: most startups fail quietly, a few dominate returns, and almost everything depends on who backs whom, and in what order. ...

December 30, 2025 · 4 min · Zelina

NeuroSPICE: When Circuits Stop Ticking and Start Thinking

Opening — Why this matters now: Circuit simulation has always been an exercise in controlled compromise. We discretize time, linearize nonlinearity, and hope the numerical solver behaves. SPICE has done this extraordinarily well for decades—but it was built for an era where devices were mostly electrical, mostly local, and mostly cooperative. That era is ending. Ferroelectrics, photonics, thermal coupling in 3D ICs, and other strongly nonlinear or multiphysics effects are turning compact modeling into a brittle art. Against this backdrop, NeuroSPICE proposes something mildly heretical: stop stepping through time altogether. ...

December 30, 2025 · 3 min · Zelina

Regrets, Graphs, and the Price of Privacy: Federated Causal Discovery Grows Up

Opening — Why this matters now: Federated learning promised a simple trade: keep data local, share intelligence globally. In practice, causal discovery in federated environments has been living off a polite fiction — that all clients live in the same causal universe. Hospitals, labs, or business units, we are told, differ only in sample size, not in how reality behaves. ...

December 30, 2025 · 4 min · Zelina

Replay the Losses, Win the Game: When Failed Instructions Become Your Best Training Data

Opening — Why this matters now: Reinforcement learning for large language models has a dirty secret: most of the time, nothing happens. When tasks demand perfect instruction adherence—formatting, style, length, logical constraints—the model either nails everything or gets a zero. Binary rewards feel principled, but in practice they starve learning. Aggregated rewards try to help, but they blur causality: different mistakes, same score, same gradient. The result is slow, noisy, and often misdirected optimization. ...

December 30, 2025 · 4 min · Zelina

The Web, Reimagined as a World Model

Opening — Why this matters now: Language agents are no longer satisfied with short conversations and disposable prompts. They want places—environments where actions have consequences, memory persists, and the world does not politely forget everything after the next API call. Unfortunately, today’s tooling offers an awkward choice: either rigid web applications backed by databases, or fully generative world models that hallucinate their own physics and promptly lose the plot. ...

December 30, 2025 · 4 min · Zelina