
LeanCat-astrophe: Why Category Theory Is Where LLM Provers Go to Struggle

Opening — Why this matters now: Formal theorem proving has entered its confident phase. We now have models that can clear olympiad-style problems, undergraduate algebra, and even parts of the Putnam with respectable success rates. Reinforcement learning, tool feedback, and test-time scaling have done their job. And then LeanCat arrives — and the success rates collapse. ...

January 2, 2026 · 4 min · Zelina

MI-ZO: Teaching Vision-Language Models Where to Look

Opening — Why this matters now: Vision-Language Models (VLMs) are everywhere—judging images, narrating videos, and increasingly acting as reasoning engines layered atop perception. But there is a quiet embarrassment in the room: most state-of-the-art VLMs are trained almost entirely on 2D data, then expected to reason about 3D worlds as if depth, occlusion, and viewpoint were minor details. ...

January 2, 2026 · 4 min · Zelina

Planning Before Picking: When Slate Recommendation Learns to Think

Opening — Why this matters now: Recommendation systems have quietly crossed a threshold. The question is no longer what to recommend, but how many things, in what order, and with what balance. In feeds, short-video apps, and content platforms, users consume slates—lists experienced holistically. Yet most systems still behave as if each item lives alone, blissfully unaware of its neighbors. ...

January 2, 2026 · 3 min · Zelina

Deployed, Retrained, Repeated: When LLMs Learn From Being Used

Opening — Why this matters now: The AI industry likes to pretend that training happens in neat, well-funded labs and deployment is merely the victory lap. Reality, as usual, is less tidy. Large language models are increasingly learning after release—absorbing their own successful outputs through user curation, web sharing, and subsequent fine‑tuning. This paper puts a sharp analytical frame around that uncomfortable truth: deployment itself is becoming a training regime. ...

January 1, 2026 · 4 min · Zelina

Gen Z, But Make It Statistical: Teaching LLMs to Listen to Data

Opening — Why this matters now: Foundation models are fluent. They are not observant. In 2024–2025, enterprises learned the hard way that asking an LLM to explain a dataset is very different from asking it to fit one. Large language models know a lot about the world, but they are notoriously bad at learning dataset‑specific structure—especially when the signal lives in proprietary data, niche markets, or dated user behavior. This gap is where GenZ enters, with none of the hype and most of the discipline. ...

January 1, 2026 · 4 min · Zelina

Learning the Rules by Breaking Them: Exception-Aware Constraint Mining for Care Scheduling

Opening — Why this matters now: Care facilities are drowning in spreadsheets, tacit knowledge, and institutional memory. Shift schedules are still handcrafted—painfully—by managers who know the rules not because they are written down, but because they have been violated before. Automation promises relief, yet adoption remains stubbornly low. The reason is not optimization power. It is translation failure. ...

January 1, 2026 · 4 min · Zelina

Let It Flow: ROME and the Economics of Agentic Craft

Opening — Why this matters now: 2025 quietly settled an uncomfortable truth in AI: agents are not products; they are supply chains. Anyone can demo a tool-using model. Very few can make it survive contact with real environments, long-horizon tasks, and users who refuse to behave like benchmarks. The paper “Let It Flow: Agentic Crafting on Rock and Roll” arrives at exactly this inflection point. Instead of promising yet another agent, it asks a more grown-up question: what kind of ecosystem is required to reliably produce agents at scale? ...

January 1, 2026 · 3 min · Zelina

When Maps Start Thinking: Teaching Agents to Plan in Time and Space

Opening — Why this matters now: AI can already write poetry, debug code, and argue philosophy. Yet ask most large language models to plan a realistic trip—respecting time, geography, traffic, weather, and human constraints—and they quietly fall apart. Real-world planning is messy, asynchronous, and unforgiving. Unlike math problems, you cannot hallucinate a charging station that does not exist. ...

January 1, 2026 · 3 min · Zelina

Browsing Without the Bloat: Teaching Agents to Think Before They Scroll

Opening — Why this matters now: Large Language Models have learned to think. Then we asked them to act. Now we want them to browse — and suddenly everything breaks. Deep research agents are running head‑first into a practical wall: the modern web is not made of tidy pages and polite APIs. It is dynamic, stateful, bloated, and aggressively redundant. Give an agent a real browser and it drowns in tokens. Don’t give it one, and it misses the most valuable information entirely. ...

December 31, 2025 · 4 min · Zelina

The Invariance Trap: Why Matching Distributions Can Break Your Model

Opening — Why this matters now: Distribution shift is no longer a corner case; it is the default condition of deployed AI. Models trained on pristine datasets routinely face degraded sensors, partial observability, noisy pipelines, or institutional drift once they leave the lab. The industry response has been almost reflexive: enforce invariance. Align source and target representations, minimize divergence, and hope the problem disappears. ...

December 31, 2025 · 4 min · Zelina