Gen Z, But Make It Statistical: Teaching LLMs to Listen to Data

Opening — Why this matters now Foundation models are fluent. They are not observant. In 2024–2025, enterprises learned the hard way that asking an LLM to explain a dataset is very different from asking it to fit one. Large language models know a lot about the world, but they are notoriously bad at learning dataset-specific structure—especially when the signal lives in proprietary data, niche markets, or dated user behavior. This gap is where GenZ enters, with none of the hype and most of the discipline. ...

January 1, 2026 · 4 min · Zelina

Label Now, Drive Later: Why Autonomous Driving Needs Fewer Clicks, Not Smarter Annotators

Opening — Why this matters now Autonomous driving research does not stall because of missing models. It stalls because of missing labels. Every promising perception architecture eventually collides with the same bottleneck: the slow, expensive, and error-prone process of annotating multimodal driving data. LiDAR point clouds do not label themselves. Cameras do not politely blur faces for GDPR compliance. And human annotators, despite heroic patience, remain both costly and inconsistent at scale. ...

January 1, 2026 · 4 min · Zelina

Learning the Rules by Breaking Them: Exception-Aware Constraint Mining for Care Scheduling

Opening — Why this matters now Care facilities are drowning in spreadsheets, tacit knowledge, and institutional memory. Shift schedules are still handcrafted—painfully—by managers who know the rules not because they are written down, but because they have been violated before. Automation promises relief, yet adoption remains stubbornly low. The reason is not optimization power. It is translation failure. ...

January 1, 2026 · 4 min · Zelina

Let It Flow: ROME and the Economics of Agentic Craft

Opening — Why this matters now 2025 quietly settled an uncomfortable truth in AI: agents are not products, they are supply chains. Anyone can demo a tool-using model. Very few can make it survive contact with real environments, long-horizon tasks, and users who refuse to behave like benchmarks. The paper “Let It Flow: Agentic Crafting on Rock and Roll” arrives at exactly this inflection point. Instead of promising yet another agent, it asks a more grown-up question: what kind of ecosystem is required to reliably produce agents at scale? ...

January 1, 2026 · 3 min · Zelina

When Maps Start Thinking: Teaching Agents to Plan in Time and Space

Opening — Why this matters now AI can already write poetry, debug code, and argue philosophy. Yet ask most large language models to plan a realistic trip—respecting time, geography, traffic, weather, and human constraints—and they quietly fall apart. Real-world planning is messy, asynchronous, and unforgiving. Unlike math problems, you cannot hallucinate a charging station that does not exist. ...

January 1, 2026 · 3 min · Zelina

When Your House Talks Back: Teaching Buildings to Think About Energy

Opening — Why this matters now Buildings quietly consume around a third of the world’s energy. Most of that consumption is governed not by grand strategy, but by human habit: when people cook, charge vehicles, cool rooms, or forget to turn things off. For decades, Building Energy Management Systems (BEMS) promised optimization. In practice, they delivered dashboards—dense, technical, and mostly ignored. ...

January 1, 2026 · 4 min · Zelina

Browsing Without the Bloat: Teaching Agents to Think Before They Scroll

Opening — Why this matters now Large language models have learned to think. Then we asked them to act. Now we want them to browse—and suddenly everything breaks. Deep research agents are running head-first into a practical wall: the modern web is not made of tidy pages and polite APIs. It is dynamic, stateful, bloated, and aggressively redundant. Give an agent a real browser and it drowns in tokens. Don’t give it one, and it misses the most valuable information entirely. ...

December 31, 2025 · 4 min · Zelina

Many Arms, Fewer Bugs: Why Coding Agents Need to Stop Working Alone

Opening — Why this matters now For all the breathless demos, AI coding agents still collapse embarrassingly often when faced with real software engineering: large repositories, ambiguous issues, long horizons, and no hand-holding. Benchmarks like SWE-bench-Live have made this painfully explicit. Models that look heroic on curated tasks suddenly forget how to navigate a codebase without spiraling into context soup. ...

December 31, 2025 · 4 min · Zelina

RxnBench: Reading Chemistry Like a Human (Turns Out That’s Hard)

Opening — Why this matters now Multimodal large language models (MLLMs) have become impressively fluent readers of the world. They can caption images, parse charts, and answer questions about documents that would once have required a human analyst and a strong coffee. Naturally, chemistry was next. But chemistry does not speak in sentences. It speaks in arrows, wedges, dashed bonds, cryptic tables, and reaction schemes buried three pages away from their explanations. If we want autonomous “AI chemists,” the real test is not trivia or SMILES strings—it is whether models can read actual chemical papers. ...

December 31, 2025 · 4 min · Zelina

The Invariance Trap: Why Matching Distributions Can Break Your Model

Opening — Why this matters now Distribution shift is no longer a corner case; it is the default condition of deployed AI. Models trained on pristine datasets routinely face degraded sensors, partial observability, noisy pipelines, or institutional drift once they leave the lab. The industry response has been almost reflexive: enforce invariance. Align source and target representations, minimize divergence, and hope the problem disappears. ...

December 31, 2025 · 4 min · Zelina