AGI by Committee: Why the First General Intelligence Won’t Arrive Alone

Opening — Why this matters now For years, AGI safety discussions have revolved around a single, looming figure: the model. One system. One alignment problem. One decisive moment. That mental model is tidy — and increasingly wrong. The paper “Distributional AGI Safety” argues that AGI is far more likely to emerge not as a monolith, but as a collective outcome: a dense web of specialized, sub‑AGI agents coordinating, trading capabilities, and assembling intelligence the way markets assemble value. AGI, in this framing, is not a product launch. It is a phase transition. ...

December 19, 2025 · 4 min · Zelina

CitySeeker: Lost in Translation, Found in the City

Opening — Why this matters now Urban navigation looks deceptively solved. We have GPS, street-view imagery, and multimodal models that can describe a scene better than most humans. And yet, when vision-language models (VLMs) are asked to actually navigate a city — not just caption it — performance collapses in subtle, embarrassing ways. The gap is no longer about perception quality. It is about cognition: remembering where you have been, knowing when you are wrong, and understanding implicit human intent. This is the exact gap CitySeeker is designed to expose. ...

December 19, 2025 · 3 min · Zelina

Painkillers with Foresight: Teaching Machines to Anticipate Cancer Pain

Opening — Why this matters now Cancer pain is rarely a surprise to clinicians. Yet it still manages to arrive uninvited, often at night, often under-treated, and almost always after the window for calm, preventive adjustment has closed. In lung cancer wards, up to 90% of patients experience moderate to severe pain episodes — and most of these episodes are predictable in hindsight. ...

December 19, 2025 · 4 min · Zelina

TOGGLE or Die Trying: Giving LLM Compression a Spine

Opening — Why this matters now LLM compression is having an identity crisis. On one side, we have brute-force pragmatists: quantize harder, prune deeper, pray nothing important breaks. On the other, we have theoreticians insisting that something essential is lost — coherence, memory, truthfulness — but offering little beyond hand-waving and validation benchmarks. As LLMs creep toward edge deployment — embedded systems, on-device assistants, energy‑capped inference — this tension becomes existential. You can’t just say “it seems fine.” You need guarantees. Or at least something better than vibes. ...

December 19, 2025 · 4 min · Zelina

When Black Boxes Grow Teeth: Mapping What AI Can *Actually* Do

Opening — Why this matters now We are deploying black-box AI systems faster than we are understanding them. Large language models, vision–language agents, and robotic controllers are increasingly asked to do things, not just answer questions. And yet, when these systems fail, the failure is rarely spectacular—it is subtle, conditional, probabilistic, and deeply context-dependent. ...

December 19, 2025 · 3 min · Zelina

Delegating to the Almost-Aligned: When Misaligned AI Is Still the Rational Choice

Opening — Why this matters now The AI alignment debate has a familiar rhythm: align the values first, deploy later. Sensible, reassuring—and increasingly detached from reality. In practice, we are already delegating consequential decisions to systems we do not fully understand, let alone perfectly align. Trading algorithms rebalance portfolios, recommendation engines steer attention, and autonomous agents negotiate, schedule, and filter on our behalf. The real question is no longer “Is the AI aligned?” but “Is it aligned enough to justify delegation, given what it can do better than us?” ...

December 18, 2025 · 4 min · Zelina

From Benchmarks to Beakers: Stress‑Testing LLMs as Scientific Co‑Scientists

Opening — Why this matters now Large Language Models have already aced exams, written code, and argued philosophy with unsettling confidence. The obvious next step was inevitable: can they do science? Not assist, not summarize—but reason, explore, and discover. The paper behind this article asks that question without romance. It evaluates LLMs not as chatbots, but as proto‑scientists, and then measures how far the illusion actually holds. ...

December 18, 2025 · 3 min · Zelina

Long Thoughts, Short Bills: Distilling Mathematical Reasoning at Scale

Opening — Why this matters now Large language models can solve math problems. The more interesting question in 2025 is whether they can learn how to reason, at scale, across contexts that are long, messy, and computationally expensive. Most math datasets answer the first question. Nemotron-Math answers the second — and does so with a surprisingly pragmatic eye on cost. ...

December 18, 2025 · 4 min · Zelina

Mind-Reading Without Telepathy: Predictive Concept Decoders

Opening — Why this matters now For years, AI interpretability has promised transparency while quietly delivering annotations, probes, and post-hoc stories that feel explanatory but often fail the only test that matters: can they predict what the model will actually do next? As large language models become agents—capable of long-horizon planning, policy evasion, and strategic compliance—interpretability that merely describes activations after the fact is no longer enough. What we need instead is interpretability that anticipates behavior. That is the ambition behind Predictive Concept Decoders (PCDs). ...

December 18, 2025 · 5 min · Zelina

When Tokens Remember: Graphing the Ghosts in LLM Reasoning

Opening — Why this matters now Large language models don’t think—but they do accumulate influence. And that accumulation is exactly where most explainability methods quietly give up. As LLMs move from single-shot text generators into multi-step reasoners, agents, and decision-making systems, we increasingly care why an answer emerged—not just what token attended to what prompt word. Yet most attribution tools still behave as if each generation step lives in isolation. That assumption is no longer just naïve; it is actively misleading. ...

December 18, 2025 · 4 min · Zelina