
Long Thoughts, Short Bills: Distilling Mathematical Reasoning at Scale

Opening — Why this matters now: Large language models can solve math problems. The more interesting question in 2025 is whether they can learn how to reason, at scale, across contexts that are long, messy, and computationally expensive. Most math datasets answer the first question. Nemotron-Math answers the second — and does so with a surprisingly pragmatic eye on cost. ...

December 18, 2025 · 4 min · Zelina

Stepwise Think-Critique: Teaching LLMs to Doubt Themselves (Productively)

Opening — Why this matters now: Large Language Models have learned how to think out loud. What they still struggle with is knowing when that thinking is wrong — while it is happening. In high‑stakes domains like mathematics, finance, or policy automation, delayed error detection is not a feature; it is a liability. Most modern reasoning pipelines still follow an awkward split: first generate reasoning, then verify it — often with a separate model. Humans do not work this way. We reason and judge simultaneously. This paper asks a simple but uncomfortable question: what if LLMs were trained to do the same? ...

December 18, 2025 · 4 min · Zelina

Model First, Think Later: Why LLMs Fail Before They Reason

Opening — Why this matters now: As LLM agents graduate from clever chatbots to decision‑making systems, their failures are becoming less amusing and more expensive. We are no longer talking about wrong trivia answers; we are talking about broken schedules, invalid plans, unsafe workflows, and agents confidently violating constraints they were never told—explicitly—not to break. ...

December 17, 2025 · 4 min · Zelina

Ports, But Make Them Agentic: When LLMs Start Running the Yard

Opening — Why this matters now: Ports are supposed to be automated. In practice, many of their most critical decisions still depend on a small priesthood of optimization specialists, tribal operational knowledge, and painfully slow deployment cycles. Vehicle Dispatching Systems (VDSs) — the logic that tells fleets of AGVs where to go and when — are a prime example. They promise up to 30% efficiency gains, yet stubbornly resist scaling from one terminal to another. ...

December 17, 2025 · 4 min · Zelina

Reasoning Loops, Not Bigger Brains

Opening — Why this matters now: For the past two years, AI progress has been narrated as a story of scale: more parameters, more data, more compute. Yet the ARC-AGI leaderboard keeps delivering an inconvenient counterexample. Small, scratch-trained models—no web-scale pretraining, no trillion-token diet—are routinely humiliating far larger systems on abstract reasoning tasks. This paper asks the uncomfortable question: where is the reasoning actually coming from? ...

December 17, 2025 · 3 min · Zelina

When Attention Learns to Breathe: Sparse Transformers for Sustainable Medical AI

Opening — Why this matters now: Healthcare AI has quietly run into a contradiction. We want models that are richer—multi-modal, context-aware, clinically nuanced—yet we increasingly deploy them in environments that are poorer: fewer samples, missing modalities, limited compute, and growing scrutiny over energy use. Transformers, the industry’s favorite hammer, are powerful but notoriously wasteful. In medicine, that waste is no longer academic; it is operational. ...

December 17, 2025 · 4 min · Zelina

NeuralFOMO: When LLMs Care About Being Second

Opening — Why this matters now: LLMs no longer live alone. They rank against each other on leaderboards, bid for tasks inside agent frameworks, negotiate in shared environments, and increasingly compete—sometimes quietly, sometimes explicitly. Once models are placed side-by-side, performance stops being purely absolute. Relative standing suddenly matters. This paper asks an uncomfortable question: do LLMs care about losing—even when losing costs them nothing tangible? ...

December 16, 2025 · 4 min · Zelina

When Medical AI Stops Guessing and Starts Asking

Opening — Why this matters now: Medical AI has become very good at answering questions. Unfortunately, medicine rarely works that way. Pathology, oncology, and clinical decision-making are not single-query problems. They are investigative processes: observe, hypothesize, cross-check, revise, and only then conclude. Yet most medical AI benchmarks still reward models for producing one-shot answers — neat, confident, and often misleading. This mismatch is no longer academic. As multimodal models edge closer to clinical workflows, the cost of shallow reasoning becomes operational, regulatory, and ethical. ...

December 16, 2025 · 4 min · Zelina

When Precedent Gets Nuanced: Why Legal AI Needs Dimensions, Not Just Factors

Opening — Why this matters now: Legal AI has a habit of oversimplifying judgment. In the race to automate legal reasoning, we have learned how to encode rules, then factors, and eventually hierarchies of factors. But something stubborn keeps leaking through the abstractions: strength. Not whether a reason exists — but how strongly it exists. ...

December 16, 2025 · 4 min · Zelina

When Reasoning Needs Receipts: Graphs Over Guesswork in Medical AI

Opening — Why this matters now: Medical AI has a credibility problem. Not because large language models (LLMs) can’t answer medical questions—they increasingly can—but because they often arrive at correct answers for the wrong reasons. In medicine, that distinction is not academic. A shortcut that accidentally lands on the right diagnosis today can quietly institutionalize dangerous habits tomorrow. ...

December 16, 2025 · 3 min · Zelina