Secrets, Context, and the RAG Illusion

Opening — Why this matters now
Personalized AI assistants are rapidly becoming ambient infrastructure. They draft emails, recall old conversations, summarize private chats, and quietly stitch together our digital lives. The selling point is convenience. The hidden cost is context collapse. The paper behind this article introduces PrivacyBench, a benchmark designed to answer an uncomfortable but overdue question: when AI assistants know everything about us, can they be trusted to know when to stay silent? The short answer is no—not reliably, and not by accident. ...

January 2, 2026 · 4 min · Zelina

Deployed, Retrained, Repeated: When LLMs Learn From Being Used

Opening — Why this matters now
The AI industry likes to pretend that training happens in neat, well-funded labs and deployment is merely the victory lap. Reality, as usual, is less tidy. Large language models are increasingly learning after release—absorbing their own successful outputs through user curation, web sharing, and subsequent fine‑tuning. This paper puts a sharp analytical frame around that uncomfortable truth: deployment itself is becoming a training regime. ...

January 1, 2026 · 4 min · Zelina

Gen Z, But Make It Statistical: Teaching LLMs to Listen to Data

Opening — Why this matters now
Foundation models are fluent. They are not observant. In 2024–2025, enterprises learned the hard way that asking an LLM to explain a dataset is very different from asking it to fit one. Large language models know a lot about the world, but they are notoriously bad at learning dataset‑specific structure—especially when the signal lives in proprietary data, niche markets, or dated user behavior. This gap is where GenZ enters, with none of the hype and most of the discipline. ...

January 1, 2026 · 4 min · Zelina

Label Now, Drive Later: Why Autonomous Driving Needs Fewer Clicks, Not Smarter Annotators

Opening — Why this matters now
Autonomous driving research does not stall because of missing models. It stalls because of missing labels. Every promising perception architecture eventually collides with the same bottleneck: the slow, expensive, and error-prone process of annotating multimodal driving data. LiDAR point clouds do not label themselves. Cameras do not politely blur faces for GDPR compliance. And human annotators, despite heroic patience, remain both costly and inconsistent at scale. ...

January 1, 2026 · 4 min · Zelina

Learning the Rules by Breaking Them: Exception-Aware Constraint Mining for Care Scheduling

Opening — Why this matters now
Care facilities are drowning in spreadsheets, tacit knowledge, and institutional memory. Shift schedules are still handcrafted—painfully—by managers who know the rules not because they are written down, but because they have been violated before. Automation promises relief, yet adoption remains stubbornly low. The reason is not optimization power. It is translation failure. ...

January 1, 2026 · 4 min · Zelina

Let It Flow: ROME and the Economics of Agentic Craft

Opening — Why this matters now
2025 quietly settled an uncomfortable truth in AI: agents are not products, they are supply chains. Anyone can demo a tool-using model. Very few can make it survive contact with real environments, long-horizon tasks, and users who refuse to behave like benchmarks. The paper “Let It Flow: Agentic Crafting on Rock and Roll” arrives at exactly this inflection point. Instead of promising yet another agent, it asks a more grown-up question: what kind of ecosystem is required to reliably produce agents at scale? ...

January 1, 2026 · 3 min · Zelina

When Maps Start Thinking: Teaching Agents to Plan in Time and Space

Opening — Why this matters now
AI can already write poetry, debug code, and argue philosophy. Yet ask most large language models to plan a realistic trip—respecting time, geography, traffic, weather, and human constraints—and they quietly fall apart. Real-world planning is messy, asynchronous, and unforgiving. Unlike math problems, you cannot hallucinate a charging station that does not exist. ...

January 1, 2026 · 3 min · Zelina

When Your House Talks Back: Teaching Buildings to Think About Energy

Opening — Why this matters now
Buildings quietly consume around a third of the world’s energy. Most of that consumption is governed not by grand strategy, but by human habit: when people cook, charge vehicles, cool rooms, or forget to turn things off. For decades, Building Energy Management Systems (BEMS) promised optimization. In practice, they delivered dashboards—dense, technical, and mostly ignored. ...

January 1, 2026 · 4 min · Zelina

Browsing Without the Bloat: Teaching Agents to Think Before They Scroll

Opening — Why this matters now
Large Language Models have learned to think. Then we asked them to act. Now we want them to browse — and suddenly everything breaks. Deep research agents are running head‑first into a practical wall: the modern web is not made of tidy pages and polite APIs. It is dynamic, stateful, bloated, and aggressively redundant. Give an agent a real browser and it drowns in tokens. Don’t give it one, and it misses the most valuable information entirely. ...

December 31, 2025 · 4 min · Zelina

Many Arms, Fewer Bugs: Why Coding Agents Need to Stop Working Alone

Opening — Why this matters now
For all the breathless demos, AI coding agents still collapse embarrassingly often when faced with real software engineering: large repositories, ambiguous issues, long horizons, and no hand-holding. Benchmarks like SWE-bench-Live have made this painfully explicit. Models that look heroic on curated tasks suddenly forget how to navigate a codebase without spiraling into context soup. ...

December 31, 2025 · 4 min · Zelina