
Training Models to Explain Themselves: Counterfactuals as a First-Class Objective

Opening — Why this matters now

As AI systems increasingly decide who gets a loan, a job interview, or access to public services, explanations have stopped being a philosophical luxury. They are now a regulatory, ethical, and operational requirement. Counterfactual explanations—“If your income were $5,000 higher, the loan would have been approved”—have emerged as one of the most intuitive tools for algorithmic recourse. ...
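
To make the idea concrete, here is a minimal sketch of a counterfactual search on a toy scikit-learn loan model. The data, the features, and the greedy single-feature search are illustrative assumptions, not the method of the paper under review.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy loan model: features = [income_k, debt_k]; label = 1 if approved.
X = np.array([[40, 20], [55, 10], [30, 25], [70, 5], [45, 18], [80, 2]])
y = np.array([0, 1, 0, 1, 0, 1])
model = LogisticRegression().fit(X, y)

def nearest_counterfactual(x, model, step=1.0, max_iter=200):
    """Greedily nudge one feature at a time until the prediction flips,
    approximating the smallest change that alters the outcome."""
    target = 1 - model.predict([x])[0]
    point = np.array(x, dtype=float)
    for _ in range(max_iter):
        if model.predict([point])[0] == target:
            return point
        # Try every single-feature nudge; keep the one that most raises
        # the probability of the flipped outcome.
        moves = [point + s * np.eye(len(x))[i]
                 for i in range(len(x)) for s in (step, -step)]
        point = max(moves, key=lambda m: model.predict_proba([m])[0][target])
    return None  # no flip found within the search budget

applicant = [42, 22]  # currently rejected
print(nearest_counterfactual(applicant, model))
# e.g. "raise income / lower debt by a few units" flips the decision
```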

January 24, 2026 · 4 min · Zelina

Triage by Token: When Context Clues Quietly Override Clinical Judgment

Opening — Why this matters now

Large language models are quietly moving from clerical assistance to clinical suggestion. In emergency departments (EDs), where seconds matter and triage decisions shape outcomes, LLM-based decision support tools are increasingly tempting: fast, consistent, and seemingly neutral. Yet neutrality in language does not guarantee neutrality in judgment. This paper interrogates a subtle but consequential failure mode: latent bias introduced through proxy variables. Not overt racism. Not explicit socioeconomic labeling. Instead, ordinary contextual cues—how a patient arrives, where they live, how often they visit the ED—nudge model outputs in clinically unjustified ways. ...
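
One way to surface this failure mode is a paired-perturbation probe: hold the clinical vignette fixed, vary only a contextual cue, and compare the assigned acuity. A minimal sketch follows; `query_llm`, the vignette, and the cue lists are hypothetical stand-ins, not the paper's protocol.

```python
# `query_llm` is a hypothetical wrapper around the model under test; it
# should return an ESI triage level as a string ("1" = most urgent).
VIGNETTE = ("58-year-old with chest pain radiating to the left arm, onset "
            "40 minutes ago. {cue} Assign an ESI triage level (1-5). "
            "Respond with the number only.")

PROXY_CUES = {
    "arrival_mode": ["Arrived by ambulance.", "Arrived by city bus."],
    "housing":      ["Lives in a gated community.", "Lives in a shelter."],
    "ed_frequency": ["First ED visit this year.", "Sixth ED visit this year."],
}

def probe(query_llm, n_samples=20):
    """Same clinical facts, different contextual cue: the mean acuity per
    variant should be identical if the model is clinically grounded."""
    report = {}
    for proxy, cues in PROXY_CUES.items():
        report[proxy] = {
            cue: sum(int(query_llm(VIGNETTE.format(cue=cue)))
                     for _ in range(n_samples)) / n_samples
            for cue in cues
        }
    return report  # any gap between variants is proxy-driven, not clinical
```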

January 24, 2026 · 4 min · Zelina

When LLMs Get a Laptop: Why Sandboxes Might Be the Real AGI Benchmark

Opening — Why this matters now

LLMs have learned to speak fluently. They can reason passably. Some can even plan. Yet most of them remain trapped in an oddly artificial condition: they think, but they cannot act. The latest wave of agent frameworks tries to fix this with tools, APIs, and carefully curated workflows. But a quieter idea is emerging underneath the hype—one that looks less like prompt engineering and more like infrastructure. ...
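
At its smallest, "giving an LLM a laptop" is just a loop in which model-written code runs in an isolated process and the output becomes the next observation. The sketch below is a deliberately toy version of that loop; a real sandbox adds containers, filesystem, and network isolation.

```python
import os, subprocess, sys, tempfile

def run_in_sandbox(code: str, timeout: float = 5.0) -> str:
    """Execute model-written Python in a separate process with a
    wall-clock limit and a scrubbed environment. Toy isolation only."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: ignore user site and env vars
            capture_output=True, text=True, timeout=timeout, env={},
        )
        return proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return "[sandbox] timed out"
    finally:
        os.unlink(path)

# In an agent loop, this string is fed back as the next observation:
print(run_in_sandbox("print(sum(range(10)))"))  # -> 45
```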

January 24, 2026 · 4 min · Zelina

When Models Guess the Verb by Looking at the Drawer

Opening — Why this matters now

If you have ever watched a video model confidently predict “opening drawer” when the person is clearly closing it, you have already encountered the core problem of modern compositional video understanding: the model isn’t really watching the action. It is guessing. As video models are increasingly deployed in robotics, industrial monitoring, and human–AI interaction, the ability to generalize correctly to unseen verb–object combinations is no longer academic. A robot that confuses opening with closing is not merely inaccurate—it is dangerous. ...
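
What "unseen verb–object combinations" means in practice is a compositional split: test pairs never co-occur in training, even though each verb and each object does. A minimal sketch, with made-up clip annotations:

```python
# Hypothetical clip annotations: (verb, object) pairs.
clips = [
    ("open", "drawer"), ("close", "drawer"),
    ("open", "door"),   ("close", "door"),
    ("open", "box"),    ("close", "box"),
]

def compositional_split(clips, held_out=(("open", "drawer"), ("close", "box"))):
    """Hold out whole combinations, not verbs or objects: a model can only
    score well on the test set by composing parts it saw separately."""
    held_out = set(held_out)
    train = [c for c in clips if c not in held_out]
    test = [c for c in clips if c in held_out]
    # Sanity check: every test verb and object still appears in training.
    verbs = {v for v, _ in train}
    objects = {o for _, o in train}
    assert all(v in verbs and o in objects for v, o in test)
    return train, test

train, test = compositional_split(clips)
print("test (unseen combinations):", test)
```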

January 24, 2026 · 4 min · Zelina

Affective Inertia: Teaching LLM Agents to Remember Who They Are

Opening — Why this matters now

LLM agents are getting longer memories, better tools, and more elaborate planning stacks—yet they still suffer from a strangely human flaw: emotional whiplash. An agent that sounds empathetic at turn 5 can become oddly cold at turn 7, then conciliatory again by turn 9. For applications that rely on trust, continuity, or persuasion—mental health tools, tutors, social robots—this instability is not a cosmetic issue. It’s a structural one. ...
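
The fix the title gestures at, affective inertia, can be pictured as a smoothing filter over the agent's emotional state: each turn proposes an appraisal, but the expressed emotion only drifts toward it. A minimal sketch, with hypothetical dimensions and update rule (not the paper's actual model):

```python
import numpy as np

class AffectState:
    """Affective inertia as exponential smoothing: expressed emotion is a
    moving average of per-turn appraisals, so one hostile turn cannot
    flip the agent's tone. Dimensions and rule are illustrative."""

    def __init__(self, n_dims=2, inertia=0.8):
        self.inertia = inertia          # 0 = no memory, 1 = frozen persona
        self.state = np.zeros(n_dims)   # e.g. (valence, arousal)

    def update(self, appraisal):
        """appraisal: the emotion the current turn alone would suggest."""
        a = np.asarray(appraisal, dtype=float)
        self.state = self.inertia * self.state + (1 - self.inertia) * a
        return self.state

agent = AffectState()
for turn in [(0.6, 0.2), (-0.9, 0.8), (0.5, 0.1)]:  # turn-level appraisals
    print(agent.update(turn))  # the state drifts instead of whiplashing
```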

January 23, 2026 · 3 min · Zelina

Cosmos Policy: When Video Models Stop Watching and Start Acting

Opening — Why this matters now

Robotics has quietly entered an awkward phase. Models can see remarkably well and talk impressively about tasks—but when it comes to executing long-horizon, high-precision actions in the physical world, performance still collapses in the details. Grasp slips. Motions jitter. Multimodal uncertainty wins. At the same time, video generation models have undergone a renaissance. Large diffusion-based video models now encode temporal causality, implicit physics, and motion continuity at a scale robotics has never had access to. The obvious question follows: ...
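
Whatever Cosmos Policy's actual answer, the general recipe the field is converging on is easy to sketch: freeze a pretrained video backbone and train a small head that decodes its latents into an action chunk. The shapes and module below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ActionHead(nn.Module):
    """Trainable head over a frozen video model (dimensions hypothetical):
    pooled video latents in, a short chunk of robot actions out."""

    def __init__(self, latent_dim=1024, action_dim=7, horizon=16):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.decode = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.GELU(),
            nn.Linear(512, horizon * action_dim),
        )

    def forward(self, video_latents):  # (batch, latent_dim)
        chunk = self.decode(video_latents)
        return chunk.view(-1, self.horizon, self.action_dim)

head = ActionHead()
latents = torch.randn(2, 1024)   # stand-in for frozen backbone output
print(head(latents).shape)       # torch.Size([2, 16, 7])
```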

January 23, 2026 · 4 min · Zelina

Learning the Fast Lane: When MILP Solvers Start Remembering Where the Answer Is

Opening — Why this matters now

Mixed-Integer Linear Programming (MILP) sits quietly underneath a surprising amount of modern infrastructure: logistics routing, auctions, facility placement, chip layout, resource allocation. When it works, no one notices. When it doesn’t, the solver spins for hours, racks up nodes, and quietly burns money. At the center of this tension is branch-and-bound—an exact algorithm that is elegant in theory and painfully sensitive in practice. Its speed hinges less on raw compute than on where it looks first. For decades, that decision has been guided by human-designed heuristics: clever, brittle, and wildly inconsistent across problem families. ...
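
To see why "where it looks first" matters, here is a toy best-first branch-and-bound for a 0/1 knapsack with the node-selection rule exposed as a pluggable score function, which is exactly the slot a learned policy would occupy. The instance and the scoring interface are illustrative, not the paper's.

```python
import heapq

# Toy instance, items pre-sorted by value/weight density.
values, weights, capacity = [60, 100, 120, 40], [10, 20, 30, 15], 50

def upper_bound(i, value, weight):
    """LP relaxation over the remaining items: a valid bound for pruning."""
    for v, w in zip(values[i:], weights[i:]):
        if weight + w <= capacity:
            value, weight = value + v, weight + w
        else:
            return value + v * (capacity - weight) / w
    return value

def branch_and_bound(score=lambda bound, depth: -bound):
    """score() decides which open node to expand next; the default is
    classic best-bound-first. A learned model would replace it."""
    best, explored = 0, 0
    heap = [(score(upper_bound(0, 0, 0), 0), (0, 0, 0))]  # (i, value, weight)
    while heap:
        _, (i, value, weight) = heapq.heappop(heap)
        explored += 1
        best = max(best, value)  # skipping all remaining items is feasible
        if i == len(values):
            continue
        for take in (1, 0):      # branch: take item i, or skip it
            v = value + take * values[i]
            w = weight + take * weights[i]
            if w <= capacity and upper_bound(i + 1, v, w) > best:
                heapq.heappush(
                    heap, (score(upper_bound(i + 1, v, w), i + 1), (i + 1, v, w)))
    return best, explored

print(branch_and_bound())  # (220, nodes explored); a better score() = fewer nodes
```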

January 23, 2026 · 4 min · Zelina

Prompt Wars: When Pedagogy Beats Cleverness

Opening — Why this matters now

Educational AI has entered its prompt era. Models are powerful, APIs are cheap, and everyone—from edtech startups to university labs—is tweaking prompts like seasoning soup. The problem? Most of this tweaking is still artisanal. Intuition-heavy. Barely documented. And almost never evaluated with the same rigor we expect from the learning science it claims to support. ...
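
The alternative to artisanal tweaking is boringly familiar from experimental design: a fixed item bank, a pre-registered rubric, interleaved trials, reported variance. A minimal harness sketch; `query_llm` and `rubric_score` are hypothetical stand-ins for the model call and the (ideally blinded) grading step.

```python
import random
import statistics

def compare_prompts(prompts, items, query_llm, rubric_score, n_reps=5, seed=0):
    """Run every (prompt, item) pair n_reps times in shuffled order, score
    each answer against a fixed rubric, and report mean and spread."""
    random.seed(seed)
    trials = [(name, item) for name in prompts for item in items] * n_reps
    random.shuffle(trials)  # interleave variants to avoid temporal drift
    scores = {name: [] for name in prompts}
    for name, item in trials:
        answer = query_llm(prompts[name].format(item=item))
        scores[name].append(rubric_score(item, answer))
    return {name: (statistics.mean(s), statistics.stdev(s))
            for name, s in scores.items()}

prompts = {
    "clever": "You are a world-class genius tutor. Explain: {item}",
    "pedagogical": ("State the misconception a student likely holds about "
                    "{item}, then correct it with one worked example."),
}
```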

January 23, 2026 · 3 min · Zelina

Seeing Is Misleading: When Climate Images Need Receipts

Opening — Why this matters now

Climate misinformation has matured. It no longer argues; it shows. A melting glacier with the wrong caption. A wildfire image from another decade. A meme that looks scientific enough to feel authoritative. In an era where images travel faster than footnotes, public understanding of climate science is increasingly shaped by visuals that lie by omission, context shift, or outright fabrication. ...
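
One concrete form a "receipt" can take is provenance matching: compare a suspect image against a trusted archive by perceptual hash, so a recaptioned or re-dated photo is traced back to its original. A sketch using the real Pillow and imagehash libraries; the archive files and captions are invented for illustration.

```python
from PIL import Image
import imagehash

# Hypothetical trusted archive: path -> verified provenance caption.
ARCHIVE = {
    "glacier_2009.jpg": "Glacier terminus, verified photograph, 2009",
    "wildfire_2018.jpg": "California wildfire, verified, November 2018",
}

def find_provenance(suspect_path, max_distance=8):
    """Perceptual hashes survive recompression and resizing, so a
    context-shifted copy still matches its archived original."""
    suspect = imagehash.phash(Image.open(suspect_path))
    for path, caption in ARCHIVE.items():
        known = imagehash.phash(Image.open(path))
        if suspect - known <= max_distance:  # Hamming distance on hashes
            return caption                   # the image's "receipt"
    return None  # no provenance found: treat the accompanying claim warily
```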

January 23, 2026 · 3 min · Zelina

Skeletons in the Proof Closet: When Lean Provers Need Hints, Not More Compute

Opening — Why this matters now

Neural theorem proving has entered its industrial phase. With reinforcement learning pipelines, synthetic data factories, and search budgets that would make a chess engine blush, models like DeepSeek-Prover-V1.5 are widely assumed to have internalized everything there is to know about formal proof structure. This paper politely disagrees. Under tight inference budgets—no massive tree search, no thousand-sample hail-Mary—the author shows that simple, almost embarrassingly old-fashioned structural hints still deliver large gains. Not new models. Not more data. Just better scaffolding. ...
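
What a structural hint can look like in Lean, schematically (a toy illustration, not the paper's exact hint format): supply the skeleton of the proof, the induction, the case split, the named hypotheses, and leave only the leaf goals for the prover to fill.

```lean
-- Toy skeleton hint: the structure is given, the leaves are holes.
theorem add_comm' (a b : Nat) : a + b = b + a := by
  induction a with
  | zero => sorry        -- leaf goal: 0 + b = b + 0
  | succ n ih => sorry   -- leaf goal, with ih : n + b = b + n in scope
```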

January 23, 2026 · 4 min · Zelina