Rationales Before Results: Teaching Multimodal LLMs to Actually Reason About Time Series

Opening: Why this matters now

Multimodal LLMs are increasingly being asked to reason about time series: markets, traffic, power grids, pollution. Charts are rendered. Prompts are polished. The answers sound confident. And yet, too often, they’re wrong for the most boring reason imaginable: the model never actually reasons. Instead, it pattern-matches. This paper dissects that failure mode with unusual clarity. The authors argue that the bottleneck is not model scale, data access, or even modality alignment. It’s the absence of explicit reasoning priors that connect observed temporal patterns to downstream outcomes. Without those priors, multimodal LLMs hallucinate explanations after the fact, mistaking surface similarity for causality. ...

January 7, 2026 · 4 min · Zelina

When One Clip Isn’t Enough: Teaching LLMs to Watch Long Videos Like Adults

Opening: Why this matters now

Large language models have learned to see. Unfortunately, they still have the attention span of a distracted intern when the video runs longer than a minute. As multimodal LLMs expand their context windows and promise “end-to-end” video understanding, a hard reality remains: long videos are not just longer inputs; they are fundamentally different reasoning problems. Information is sparse, temporally distant, multimodal, and often only meaningful when grounded precisely in time and space. Compress everything up front, and you lose the evidence. Don’t compress, and you blow the context budget. ...

December 24, 2025 · 4 min · Zelina