
When LLMs Meet Time: Why Time-Series Reasoning Is Still Hard

Opening — Why this matters now
Large Language Models are increasingly marketed as general problem solvers. They summarize earnings calls, reason about code, and explain economic trends with alarming confidence. But when confronted with time—real, numeric, structured temporal data—that confidence starts to wobble. The TSAQA benchmark arrives at exactly the right moment, not to celebrate LLM progress, but to measure how far they still have to go. ...

February 3, 2026 · 3 min · Zelina

When One Patch Rules Them All: Teaching MLLMs to See What Isn’t There

Opening — Why this matters now
Multimodal large language models (MLLMs) are no longer research curiosities. They caption images, reason over diagrams, guide robots, and increasingly sit inside commercial products that users implicitly trust. That trust rests on a fragile assumption: that these models see the world in a reasonably stable way. The paper behind this article quietly dismantles that assumption. It shows that a single, reusable visual perturbation—not tailored to any specific image—can reliably coerce closed-source systems like GPT‑4o or Gemini‑2.0 into producing attacker‑chosen outputs. Not once. Not occasionally. But consistently, across arbitrary, previously unseen images. ...

February 3, 2026 · 5 min · Zelina

Agentic Systems Need Architecture, Not Vibes

Opening — Why this matters now
Agentic AI has officially entered its awkward adolescence. It can plan, call tools, collaborate, and occasionally impress investors—but it also hallucinates, forgets, loops endlessly, and collapses under modest real‑world complexity. The problem is no longer model capability. It’s architecture. Today’s agent systems are mostly stitched together through intuition, blog wisdom, and prompt folklore. Powerful, yes—but brittle. What’s missing is not another clever prompt trick, but an engineering discipline. ...

February 2, 2026 · 3 min · Zelina

Algorithmic Context Is the New Heuristic

Opening — Why this matters now
For decades, heuristic design has been a quiet tax on optimization. Every serious deployment of A* or tree search comes with a familiar cost: domain experts handcraft rules, tune parameters, and babysit edge cases. The process is expensive, slow, and brittle. Large Language Models promised automation—but until recently they mostly delivered clever greedy tricks for toy problems. ...

February 2, 2026 · 3 min · Zelina

Ask Once, Query Right: Why Enterprise AI Still Gets Databases Wrong

Opening — Why this matters now
Enterprises love to say they are “data‑driven.” In practice, they are database‑fragmented. A single natural‑language question — How many customers in California? — may be answerable by five internal databases, all structurally different, semantically overlapping, and owned by different teams. Routing that question to the right database is no longer a UX problem. It is an architectural one. ...

February 2, 2026 · 4 min · Zelina

GAVEL: When AI Safety Grows a Rulebook

Opening — Why this matters now
AI safety is drifting toward an uncomfortable paradox. The more capable large language models become, the less transparent their internal decision-making appears — and the more brittle our existing safeguards feel. Text-based moderation catches what models say, not what they are doing. Activation-based safety promised to fix this, but in practice it has inherited many of the same flaws: coarse labels, opaque triggers, and painful retraining cycles. ...

February 2, 2026 · 4 min · Zelina

Glue, Not Chains: Teaching AI to Degrade Amyloid-β the Hard Way

Opening — Why this matters now
For more than two decades, Alzheimer’s drug discovery has been trapped in a loop: identify amyloid, try to block it, fail clinically, repeat with better marketing. What has quietly changed is not our understanding of amyloid-β itself, but our tooling. Intracellular amyloid-β42 (Aβ42) is now widely seen as an early, toxic driver of disease—yet it remains structurally awkward, aggregation-prone, and resistant to classical inhibition strategies. ...

February 2, 2026 · 4 min · Zelina

Grading the Doctor: How Health-SCORE Scales Judgment in Medical AI

Opening — Why this matters now
Healthcare LLMs have a credibility problem. Not because they cannot answer medical questions—many now ace exam-style benchmarks—but because real medicine is not a multiple-choice test. It is open-ended, contextual, uncertain, and unforgiving. In that setting, how a model reasons, hedges, and escalates matters as much as what it says. ...

February 2, 2026 · 4 min · Zelina

Routing the Brain: Why Smarter LLM Orchestration Beats Bigger Models

Opening — Why this matters now
As large language models quietly slide from novelty to infrastructure, a less glamorous question has become existential: who pays the inference bill? Agentic systems amplify the problem. A single task is no longer a prompt—it is a chain of reasoning steps, retries, tool calls, and evaluations. Multiply that by production scale, and cost becomes the bottleneck long before intelligence does. ...

February 2, 2026 · 3 min · Zelina

Seeing Is Thinking: When Images Do the Reasoning

Opening — Why this matters now
Large language models have learned to talk their way through reasoning. But the real world does not speak in tokens. It moves, collides, folds, and occludes. As multimodal models mature, a quiet question has become unavoidable: is language really the best internal medium for thinking about physical reality? ...

February 2, 2026 · 3 min · Zelina