
Bubble Trouble: Why Top‑K Retrieval Keeps Letting LLMs Down

Opening — Why this matters now Enterprise teams didn’t adopt RAG to win leaderboard benchmarks. They adopted it to answer boring, expensive questions buried inside spreadsheets, PDFs, and contracts—accurately, repeatably, and with citations they can defend. That’s where things quietly break. Top‑K retrieval looks competent in demos, then collapses in production. The model sees plenty of text, yet still misses conditional clauses, material constraints, or secondary scope definitions. The failure mode isn’t hallucination in the usual sense. It’s something more procedural: the right information exists, but it never makes it into the context window in the first place. ...

January 16, 2026 · 4 min · Zelina

Drawing with Ghost Hands: When GenAI Helps Architects — and When It Quietly Undermines Them

Opening — Why this matters now Architectural studios are quietly changing. Not with robotic arms or parametric scripts, but with prompts. Text-to-image models now sit beside sketchbooks, offering instant massing ideas, stylistic variations, and visual shortcuts that once took hours. The promise is obvious: faster ideation, lower friction, fewer blank pages. The risk is less visible. When creativity is partially outsourced, what happens to confidence, authorship, and cognitive effort? ...

January 16, 2026 · 4 min · Zelina

One Agent Is a Bottleneck: When Genomics QA Finally Went Multi-Agent

Opening — Why this matters now Genomics QA is no longer a toy problem for language models. It sits at the uncomfortable intersection of messy biological databases, evolving schemas, and questions that cannot be answered from static training data. GeneGPT proved that LLMs could survive here—barely. This paper shows why surviving is not the same as scaling. ...

January 16, 2026 · 3 min · Zelina

Reasoning or Guessing? When Recursive Models Hit the Wrong Fixed Point

Opening — Why this matters now Reasoning models are having a moment. Latent-space architectures promise to outgrow chain-of-thought without leaking tokens or ballooning costs. Benchmarks seem to agree. Some of these systems crack puzzles that leave large language models flat at zero. And yet, something feels off. This paper dissects a flagship example—the Hierarchical Reasoning Model (HRM)—and finds that its strongest results rest on a fragile foundation. The model often succeeds not by steadily reasoning, but by stumbling into the right answer and staying there. When it stumbles into the wrong one, it can stay there too. ...

January 16, 2026 · 4 min · Zelina

When Agents Talk Back: Why AI Collectives Need a Social Theory

Opening — Why this matters now Multi-agent AI is no longer a lab curiosity. Tool-using LLM agents already negotiate, cooperate, persuade, and sometimes sabotage—often without humans in the loop. What looks like “emergent intelligence” at first glance is, more precisely, a set of interaction effects layered on top of massive pre-trained priors. And that distinction matters. Traditional multi-agent reinforcement learning (MARL) gives us a language for agents that learn from scratch. LLM-based agents do not. They arrive already socialized. ...

January 16, 2026 · 3 min · Zelina

When Goals Collide: Synthesizing the Best Possible Outcome

Opening — Why this matters now Most AI control systems are still designed around a brittle assumption: either the agent satisfies everything, or the problem is declared unsolvable. That logic collapses quickly in the real world. Robots run out of battery. Services compete for shared resources. Environments act adversarially, not politely. In practice, goals collide. ...

January 16, 2026 · 4 min · Zelina

When Models Know They’re Wrong: Catching Jailbreaks Mid-Sentence

Opening — Why this matters now Most LLM safety failures don’t look dramatic. They look fluent. A model doesn’t suddenly turn malicious. It drifts there — token by token — guided by coherence, momentum, and the quiet incentive to finish the sentence it already started. Jailbreak attacks exploit this inertia. They don’t delete safety alignment; they outrun it. ...

January 16, 2026 · 4 min · Zelina

EvoFSM: Teaching AI Agents to Evolve Without Losing Their Minds

Opening — Why this matters now Agentic AI has entered its teenage years: curious, capable, and dangerously overconfident. As LLM-based agents move from toy demos into deep research—multi-hop reasoning, evidence aggregation, long-horizon decision-making—the industry has discovered an uncomfortable truth. Fixed workflows are too rigid, but letting agents rewrite themselves freely is how you get hallucinations with a superiority complex. ...

January 15, 2026 · 3 min · Zelina

Knowing Is Not Doing: When LLM Agents Pass the Task but Fail the World

Opening — Why this matters now LLM agents are getting disturbingly good at finishing tasks. They click the right buttons, traverse web pages, solve text-based games, and close tickets. Benchmarks applaud. Dashboards glow green. Yet something feels off. Change the environment slightly, rotate the layout, tweak the constraints — and suddenly the same agent behaves like it woke up in a stranger’s apartment. The problem isn’t execution. It’s comprehension. ...

January 15, 2026 · 4 min · Zelina

Lean LLMs, Heavy Lifting: When Workflows Beat Bigger Models

Opening — Why this matters now Everyone wants LLMs to think harder. Enterprises, however, mostly need them to think correctly — especially when optimization models decide real money, real capacity, and real risk. As organizations scale, optimization problems grow beyond toy examples. Data spills into separate tables, constraints multiply, and naïve prompt‑to‑solver pipelines quietly collapse. ...

January 15, 2026 · 3 min · Zelina