
Rationales Before Results: Teaching Multimodal LLMs to Actually Reason About Time Series

Opening — Why this matters now

Multimodal LLMs are increasingly being asked to reason about time series: markets, traffic, power grids, pollution. Charts are rendered. Prompts are polished. The answers sound confident. And yet—too often—they’re wrong for the most boring reason imaginable: the model never actually reasons. Instead, it pattern-matches. This paper dissects that failure mode with unusual clarity. The authors argue that the bottleneck is not model scale, data access, or even modality alignment. It’s the absence of explicit reasoning priors that connect observed temporal patterns to downstream outcomes. Without those priors, multimodal LLMs hallucinate explanations after the fact, mistaking surface similarity for causality. ...

January 7, 2026 · 4 min · Zelina

Privacy by Proximity: How Nearest Neighbors Made In-Context Learning Differentially Private

Opening — Why this matters now

As large language models (LLMs) weave themselves into every enterprise workflow, a quieter issue looms: the privacy of the data used to prompt them. In‑context learning (ICL) — the art of teaching a model through examples in its prompt — is fast, flexible, and dangerously leaky. Each query could expose confidential examples from private datasets. Enter differential privacy (DP), the mathematical armor for sensitive data — except that, until now, DP methods for ICL have been clumsy and utility‑poor. ...
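The teaser cuts off before the method itself, so as general background only, here is a minimal sketch of one standard way to make exemplar selection differentially private: score private examples by similarity to the query (the "nearest neighbor" intuition), then sample with the exponential mechanism so no single private record can dominate the choice. This is a generic DP illustration with hypothetical names (dp_select_exemplars is mine), not the paper's mechanism.

```python
import numpy as np

def dp_select_exemplars(scores, epsilon, k, sensitivity=1.0, rng=None):
    """Pick k in-context examples via the exponential mechanism.

    scores: per-example utility for the current query (e.g., negative
            embedding distance to the query), shape (n,).
    epsilon: privacy budget spent per draw; k sequential draws compose
             to roughly k * epsilon under basic composition.
    """
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    available = np.arange(len(scores))
    chosen = []
    for _ in range(k):
        # Exponential mechanism: P(i) proportional to exp(eps * u_i / (2 * sensitivity))
        logits = epsilon * scores[available] / (2.0 * sensitivity)
        logits -= logits.max()  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        pick = rng.choice(len(available), p=probs)
        chosen.append(int(available[pick]))
        available = np.delete(available, pick)  # sample without replacement
    return chosen

# Example: similarity scores for six private examples; choose two exemplars.
print(dp_select_exemplars([0.9, 0.2, 0.8, 0.1, 0.7, 0.3], epsilon=1.0, k=2))
```

The design point: higher-utility (nearer) examples are more likely to be chosen, but the probability gap is capped by epsilon, which is what bounds leakage about any one private record.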

November 8, 2025 · 4 min · Zelina

Razor Burn: Why LLMs Nick Themselves on Induction and Abduction

TL;DR

A new synthetic benchmark (INABHYD) tests inductive and abductive reasoning under Occam’s Razor. LLMs handle toy cases but falter as ontologies deepen or when multiple hypotheses are needed. Even when models “explain” observations, they often pick needlessly complex or trivial hypotheses—precisely the opposite of what scientific discovery and root-cause analysis require.

The Big Idea

Most reasoning work on LLMs obsesses over deduction (step-by-step proofs). But the real world demands induction (generalizing rules) and abduction (inferring the best explanation). The paper introduces INABHYD, a programmable benchmark that builds fictional ontology trees (concepts, properties, subtype links) and hides some axioms. The model sees an incomplete world plus observations, and must propose hypotheses that both explain all observations and do so parsimoniously (Occam’s Razor). The authors score: ...
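The excerpt is truncated before the scoring details. As a toy illustration of the setup it does describe, here is a minimal sketch (hypothetical names; derive and best_hypotheses are mine, and INABHYD's actual generator and metrics are richer): a tiny ontology with one hidden subtype axiom, where a single general hypothesis explains every observation more parsimoniously than a pile of trivial restatements.

```python
# Toy sketch of an INABHYD-style world: axioms are ('isa', a, b) subtype
# links and ('has', a, p) property links; observations must be derivable.
from itertools import chain, combinations

def derive(axioms):
    """Forward-chain 'a isa b, b R d => a R d' to the deductive closure."""
    facts = set(axioms)
    changed = True
    while changed:
        changed = False
        for (k1, a, b) in list(facts):
            for (k2, c, d) in list(facts):
                if k1 == "isa" and c == b and (k2, a, d) not in facts:
                    facts.add((k2, a, d))
                    changed = True
    return facts

visible = {("has", "mammal", "fur"), ("has", "mammal", "warm_blood")}
# Hidden axiom the model should recover: ("isa", "zorple", "mammal")
observations = {("has", "zorple", "fur"), ("has", "zorple", "warm_blood")}
candidates = [
    ("isa", "zorple", "mammal"),      # one general hypothesis
    ("has", "zorple", "fur"),         # trivial restatement of an observation
    ("has", "zorple", "warm_blood"),  # trivial restatement of the other
]

def best_hypotheses(visible, observations, candidates):
    """Among hypothesis sets explaining every observation, prefer the
    smallest (Occam's Razor)."""
    subsets = chain.from_iterable(
        combinations(candidates, r) for r in range(1, len(candidates) + 1))
    explaining = [hs for hs in subsets
                  if observations <= derive(visible | set(hs))]
    return min(explaining, key=len)

print(best_hypotheses(visible, observations, candidates))
# -> (('isa', 'zorple', 'mammal'),)  one general axiom beats two restatements
```

This captures the failure mode the post highlights: a model that outputs the two "has" restatements has technically explained the observations, but missed the single subtype hypothesis that Occam’s Razor demands.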

September 6, 2025 · 4 min · Zelina