
Recommendations With Receipts: When LLMs Have to Prove They Behaved

Opening — Why this matters now
LLMs are increasingly trusted to recommend what we watch, buy, or read. But trust breaks down the moment a regulator, auditor, or policy team asks a simple question: prove that this recommendation followed the rules. Most LLM-driven recommenders cannot answer that question. They can explain themselves fluently, but explanation is not enforcement. In regulated or policy-heavy environments—media platforms, marketplaces, cultural quotas, fairness mandates—that gap is no longer tolerable. ...

January 17, 2026 · 4 min · Zelina

Survival by Swiss Cheese: Why AI Doom Is a Layered Failure, Not a Single Bet

Opening — Why this matters now
Ever since ChatGPT escaped the lab and wandered into daily life, arguments about AI existential risk have followed a predictable script. One side says doom is imminent. The other says it’s speculative hand-wringing. Both sides talk past each other. The paper behind this article does something refreshingly different. Instead of obsessing over how AI might kill us, it asks a sharper question: how exactly do we expect to survive? Not rhetorically — structurally. ...

January 17, 2026 · 5 min · Zelina

When Memory Stops Guessing: Stitching Intent Back into Agent Memory

Opening — Why this matters now
Everyone is chasing longer context windows. Million-token prompts. Endless chat logs. The assumption is simple: if the model can see everything, it will remember correctly. This paper shows why that assumption fails. In long-horizon, goal-driven interactions, errors rarely come from missing information. They come from retrieving the wrong information—facts that are semantically similar but contextually incompatible. Bigger windows amplify the problem. Noise scales faster than relevance. ...

January 17, 2026 · 3 min · Zelina

Bubble Trouble: Why Top‑K Retrieval Keeps Letting LLMs Down

Opening — Why this matters now
Enterprise teams didn’t adopt RAG to win leaderboard benchmarks. They adopted it to answer boring, expensive questions buried inside spreadsheets, PDFs, and contracts—accurately, repeatably, and with citations they can defend. That’s where things quietly break. Top‑K retrieval looks competent in demos, then collapses in production. The model sees plenty of text, yet still misses conditional clauses, material constraints, or secondary scope definitions. The failure mode isn’t hallucination in the usual sense. It’s something more procedural: the right information exists, but it never makes it into the context window in the first place. ...

January 16, 2026 · 4 min · Zelina

Drawing with Ghost Hands: When GenAI Helps Architects — and When It Quietly Undermines Them

Opening — Why this matters now
Architectural studios are quietly changing. Not with robotic arms or parametric scripts, but with prompts. Text-to-image models now sit beside sketchbooks, offering instant massing ideas, stylistic variations, and visual shortcuts that once took hours. The promise is obvious: faster ideation, lower friction, fewer blank pages. The risk is less visible. When creativity is partially outsourced, what happens to confidence, authorship, and cognitive effort? ...

January 16, 2026 · 4 min · Zelina

One Agent Is a Bottleneck: When Genomics QA Finally Went Multi-Agent

Opening — Why this matters now
Genomics QA is no longer a toy problem for language models. It sits at the uncomfortable intersection of messy biological databases, evolving schemas, and questions that cannot be answered from static training data. GeneGPT proved that LLMs could survive here—barely. This paper shows why surviving is not the same as scaling. ...

January 16, 2026 · 3 min · Zelina

Reasoning or Guessing? When Recursive Models Hit the Wrong Fixed Point

Opening — Why this matters now
Reasoning models are having a moment. Latent-space architectures promise to outgrow chain-of-thought without leaking tokens or ballooning costs. Benchmarks seem to agree. Some of these systems crack puzzles that leave large language models flat at zero. And yet, something feels off. This paper dissects a flagship example—the Hierarchical Reasoning Model (HRM)—and finds that its strongest results rest on a fragile foundation. The model often succeeds not by steadily reasoning, but by stumbling into the right answer and staying there. When it stumbles into the wrong one, it can stay there too. ...

January 16, 2026 · 4 min · Zelina

When Agents Talk Back: Why AI Collectives Need a Social Theory

Opening — Why this matters now
Multi-agent AI is no longer a lab curiosity. Tool-using LLM agents already negotiate, cooperate, persuade, and sometimes sabotage—often without humans in the loop. What looks like “emergent intelligence” at first glance is, more precisely, a set of interaction effects layered on top of massive pre-trained priors. And that distinction matters. Traditional multi-agent reinforcement learning (MARL) gives us a language for agents that learn from scratch. LLM-based agents do not. They arrive already socialized. ...

January 16, 2026 · 3 min · Zelina

When Goals Collide: Synthesizing the Best Possible Outcome

Opening — Why this matters now
Most AI control systems are still designed around a brittle assumption: either the agent satisfies everything, or the problem is declared unsolvable. That logic collapses quickly in the real world. Robots run out of battery. Services compete for shared resources. Environments act adversarially, not politely. In practice, goals collide. ...

January 16, 2026 · 4 min · Zelina

When Models Know They’re Wrong: Catching Jailbreaks Mid-Sentence

Opening — Why this matters now
Most LLM safety failures don’t look dramatic. They look fluent. A model doesn’t suddenly turn malicious. It drifts there — token by token — guided by coherence, momentum, and the quiet incentive to finish the sentence it already started. Jailbreak attacks exploit this inertia. They don’t delete safety alignment; they outrun it. ...

January 16, 2026 · 4 min · Zelina