
PRISM and the Art of Not Losing Meaning

Opening — Why this matters now Generative Sequential Recommendation (GSR) is having its moment. By reframing recommendation as an autoregressive generation problem over Semantic IDs (SIDs), the field promises something long overdue: a unified retrieval-and-ranking pipeline that actually understands what items mean, not just where they sit in an embedding table. But beneath the hype sits an uncomfortable truth. Most lightweight GSR systems are quietly sabotaging themselves. They collapse their own codebooks, blur semantic boundaries, and then wonder why performance tanks—especially on sparse, long‑tail data. PRISM arrives as a sober correction to that pattern. ...

January 26, 2026 · 4 min · Zelina

When Alignment Is Not Enough: Reading Between the Lines of Modern LLM Safety

Opening — Why this matters now In the past two years, alignment has quietly shifted from an academic concern to a commercial liability. The paper under review (arXiv:2601.16589) sits squarely in this transition period: post-RLHF optimism, pre-regulatory realism. It asks a deceptively simple question—do current alignment techniques actually constrain model behavior in the ways we think they do?—and then proceeds to make that question uncomfortable. ...

January 26, 2026 · 3 min · Zelina

When Models Listen but Stop Thinking: Teaching Audio Models to Reason Like They Read

Opening — Why this matters now Audio-first interfaces are everywhere. Voice assistants, call-center bots, in-car copilots, and accessibility tools all rely on large audio-language models (LALMs) that promise to hear and think at the same time. Yet in practice, something awkward happens: the same model that reasons fluently when reading text suddenly becomes hesitant, shallow, or just wrong when listening to speech. ...

January 26, 2026 · 4 min · Zelina

When SGD Remembers: The Hidden Memory Inside Training Dynamics

Opening — Why this matters now Modern deep learning quietly assumes a comforting fiction: that training is memoryless. Given the current parameters (and maybe the optimizer buffers), tomorrow’s update shouldn’t care about yesterday’s data order, augmentation choice, or micro-step path. This assumption underwrites theory, stabilizes intuition, and keeps whiteboards clean. Reality, however, has been less cooperative. Practitioners know that order matters, momentum carries ghosts of past gradients, and small curriculum tweaks can echo far longer than expected. Yet until now, there has been no clean, operational way to measure whether training truly forgets—or merely pretends to. ...

January 26, 2026 · 4 min · Zelina

When Trains Meet Snowstorms: Turning Weather Chaos into Predictable Rail Operations

Opening — Why this matters now Railway delays are one of those problems everyone experiences and almost no one truly understands. Passengers blame weather. Operators blame operations. Data scientists blame missing variables. Everyone is partially correct. What has quietly shifted in recent years is not the weather itself, but our ability to observe it alongside operations—continuously, spatially, and at scale. As rail systems push toward AI‑assisted scheduling, predictive maintenance, and real‑time disruption management, delay prediction without weather is no longer just incomplete—it is structurally misleading. ...

January 26, 2026 · 4 min · Zelina

Gated Sparse Attention: Speed Without the Sink

Opening — Why this matters now Long-context language models have crossed an uncomfortable threshold. Context windows now stretch to 128K tokens and beyond, yet the core attention mechanism still scales quadratically. The result is a growing mismatch between what models can theoretically ingest and what is economically and operationally feasible. At the same time, training instability — loss spikes, attention sinks, brittle gradients — continues to haunt large-scale runs. ...

January 24, 2026 · 4 min · Zelina

Learning to Discover at Test Time: When Search Learns Back

Opening — Why this matters now For years, scaling AI meant one thing: train bigger models, then freeze them. At inference time, we search harder, sample wider, and hope brute force compensates for epistemic limits. This paper challenges that orthodoxy. It argues—quietly but decisively—that search alone is no longer enough. If discovery problems are truly out-of-distribution, then the model must be allowed to learn at test time. ...

January 24, 2026 · 3 min · Zelina

PyraTok: When Video Tokens Finally Learn to Speak Human

Opening — Why this matters now Text-to-video models are scaling at an alarming pace. Resolution is no longer the bottleneck—semantic fidelity is. As generators push into 4K and even 8K regimes, a quieter but more consequential problem emerges underneath: the tokenizer. If visual tokens do not align with language, no amount of diffusion steps will save downstream reasoning, control, or zero-shot transfer. ...

January 24, 2026 · 3 min · Zelina

Training Models to Explain Themselves: Counterfactuals as a First-Class Objective

Opening — Why this matters now As AI systems increasingly decide who gets a loan, a job interview, or access to public services, explanations have stopped being a philosophical luxury. They are now a regulatory, ethical, and operational requirement. Counterfactual explanations—“If your income were $5,000 higher, the loan would have been approved”—have emerged as one of the most intuitive tools for algorithmic recourse. ...

January 24, 2026 · 4 min · Zelina

Triage by Token: When Context Clues Quietly Override Clinical Judgment

Opening — Why this matters now Large language models are quietly moving from clerical assistance to clinical suggestion. In emergency departments (EDs), where seconds matter and triage decisions shape outcomes, LLM-based decision support tools are increasingly tempting: fast, consistent, and seemingly neutral. Yet neutrality in language does not guarantee neutrality in judgment. This paper interrogates a subtle but consequential failure mode: latent bias introduced through proxy variables. Not overt racism. Not explicit socioeconomic labeling. Instead, ordinary contextual cues—how a patient arrives, where they live, how often they visit the ED—nudge model outputs in clinically unjustified ways. ...

January 24, 2026 · 4 min · Zelina