
Metric Time Without the Clock: Making ASP Scale Again

Opening — Why this matters now. Temporal reasoning has always been the Achilles’ heel of symbolic AI. The moment time becomes quantitative—minutes, deadlines, durations—logic programs tend to balloon, grounders panic, and scalability quietly exits the room. This paper lands squarely in that discomfort zone and does something refreshingly unglamorous: it makes time boring again. And boring, in this case, is good for business. ...

January 31, 2026 · 3 min · Zelina

When LLMs Invent Languages: Efficiency, Secrecy, and the Limits of Natural Speech

Opening — Why this matters now. Large language models are supposed to speak our language. Yet as they become more capable, something uncomfortable emerges: when pushed to cooperate efficiently, models often abandon natural language altogether. This paper shows that modern vision–language models (VLMs) can spontaneously invent task-specific communication protocols—compressed, opaque, and sometimes deliberately unreadable to outsiders—without any fine-tuning. Just prompts. ...

January 31, 2026 · 3 min · Zelina

CAR-bench: When Agents Don’t Know What They Don’t Know

Opening — Why this matters now. LLM agents are no longer toys. They book flights, write emails, control vehicles, and increasingly operate in environments where getting it mostly right is not good enough. In real-world deployments, the failure mode that matters most is not ignorance—it is false confidence. Agents act when they should hesitate, fabricate when they should refuse, and choose when they should ask. ...

January 30, 2026 · 4 min · Zelina

Safety by Design, Rewritten: When Data Defines the Boundary

Opening — Why this matters now. Safety-critical AI has a credibility problem. Not because it fails spectacularly—though that happens—but because we often cannot say where it is allowed to succeed. Regulators demand clear operational boundaries. Engineers deliver increasingly capable models. Somewhere in between, the Operational Design Domain (ODD) is supposed to translate reality into something certifiable. ...

January 30, 2026 · 5 min · Zelina

The Patient Is Not a Moving Document: Why Clinical AI Needs World Models

Opening — Why this matters now. Clinical AI has quietly hit a ceiling. Over the past five years, large language models trained on electronic health records (EHRs) have delivered impressive gains: better coding, stronger risk prediction, and even near‑physician exam performance. But beneath those wins lies an uncomfortable truth. Most clinical foundation models still treat patients as documents—static records to be summarized—rather than systems evolving over time. ...

January 30, 2026 · 4 min · Zelina

When Rewards Learn to Think: Teaching Agents How They’re Wrong

Opening — Why this matters now. Agentic AI has a credibility problem. Not because agents can’t browse, code, or call tools—but because we still train them like they’re taking a final exam with no partial credit. Most agentic reinforcement learning (RL) systems reward outcomes, not process. Either the agent finishes the task correctly, or it doesn’t. For short problems, that’s tolerable. For long-horizon, tool-heavy reasoning tasks, it’s catastrophic. A single late-stage mistake erases an otherwise competent trajectory. ...

January 30, 2026 · 4 min · Zelina

When Models Listen but Stop Thinking: Teaching Audio Models to Reason Like They Read

Opening — Why this matters now. Audio-first interfaces are everywhere. Voice assistants, call-center bots, in-car copilots, and accessibility tools all rely on large audio-language models (LALMs) that promise to hear and think at the same time. Yet in practice, something awkward happens: the same model that reasons fluently when reading text suddenly becomes hesitant, shallow, or just wrong when listening to speech. ...

January 26, 2026 · 4 min · Zelina

When SGD Remembers: The Hidden Memory Inside Training Dynamics

Opening — Why this matters now. Modern deep learning quietly assumes a comforting fiction: that training is memoryless. Given the current parameters (and maybe the optimizer buffers), tomorrow’s update shouldn’t care about yesterday’s data order, augmentation choice, or micro-step path. This assumption underwrites theory, stabilizes intuition, and keeps whiteboards clean. Reality, however, has been less cooperative. Practitioners know that order matters, momentum carries ghosts of past gradients, and small curriculum tweaks can echo far longer than expected. Yet until now, there has been no clean, operational way to measure whether training truly forgets—or merely pretends to. ...

January 26, 2026 · 4 min · Zelina

When Trains Meet Snowstorms: Turning Weather Chaos into Predictable Rail Operations

Opening — Why this matters now. Railway delays are one of those problems everyone experiences and almost no one truly understands. Passengers blame weather. Operators blame operations. Data scientists blame missing variables. Everyone is partially correct. What has quietly shifted in recent years is not the weather itself, but our ability to observe it alongside operations—continuously, spatially, and at scale. As rail systems push toward AI‑assisted scheduling, predictive maintenance, and real‑time disruption management, delay prediction without weather is no longer just incomplete—it is structurally misleading. ...

January 26, 2026 · 4 min · Zelina

Training Models to Explain Themselves: Counterfactuals as a First-Class Objective

Opening — Why this matters now. As AI systems increasingly decide who gets a loan, a job interview, or access to public services, explanations have stopped being a philosophical luxury. They are now a regulatory, ethical, and operational requirement. Counterfactual explanations—“If your income were $5,000 higher, the loan would have been approved”—have emerged as one of the most intuitive tools for algorithmic recourse. ...

January 24, 2026 · 4 min · Zelina