
Pulling the Thread: Why LLM Reasoning Often Unravels

Opening — Why this matters now: Large Language Model (LLM) agents have crossed an uncomfortable threshold. They are no longer just autocomplete engines or polite chat companions; they are being entrusted with financial decisions, scientific hypothesis generation, and multi-step autonomous actions. With that elevation comes a familiar demand: explain yourself. Chain-of-Thought (CoT) reasoning was supposed to be the answer. Let the model “think out loud,” and transparency follows—or so the story goes. The paper behind Project Ariadne argues, with unsettling rigor, that this story is largely fiction. Much of what we see as reasoning is closer to stagecraft: convincing, articulate, and causally irrelevant. ...

January 6, 2026 · 4 min · Zelina

Many Minds, One Decision: Why Agentic AI Needs a Brain, Not Just Nerves

Opening — Why this matters now: Agentic AI has officially crossed the line from clever demo to operational liability. We are no longer talking about chatbots that occasionally hallucinate trivia. We are deploying autonomous systems that decide, act, and trigger downstream consequences—often across tools, APIs, and real-world processes. In that setting, the old comfort blanket of “the model said so” is no longer defensible. ...

December 29, 2025 · 3 min · Zelina

When More Explanation Hurts: The Early‑Stopping Paradox of Agentic XAI

Opening — Why this matters now: We keep telling ourselves a comforting story: if an AI explanation isn’t good enough, just refine it. Add another round. Add another chart. Add another paragraph. Surely clarity is a monotonic function of effort. This paper politely demolishes that belief. As agentic AI systems—LLMs that reason, generate code, analyze results, and then revise themselves—move from demos into decision‑support tools, explanation quality becomes a first‑order risk. Not model accuracy. Not latency. Explanation quality. Especially when the audience is human, busy, and allergic to verbose nonsense. ...

December 25, 2025 · 4 min · Zelina

XAI, But Make It Scalable: Why Experts Should Stop Writing Rules

Opening — Why this matters now: Explainable AI has reached an awkward phase of maturity. Everyone agrees that black boxes are unacceptable in high‑stakes settings—credit, churn, compliance, healthcare—but the tools designed to open those boxes often collapse under their own weight. Post‑hoc explainers scale beautifully and then promptly contradict themselves. Intrinsic approaches behave consistently, right up until you ask who is going to annotate explanations for millions of samples. ...

December 23, 2025 · 4 min · Zelina

Doctor GPT, But Make It Explainable

Opening — Why this matters now: Healthcare systems globally suffer from a familiar triad: diagnostic bottlenecks, rising costs, and a shortage of specialists. What makes this crisis especially stubborn is not just capacity—but interaction. Diagnosis is fundamentally conversational, iterative, and uncertain. Yet most AI diagnostic tools still behave like silent oracles: accurate perhaps, but opaque, rigid, and poorly aligned with how humans actually describe illness. ...

December 22, 2025 · 4 min · Zelina

When Tokens Remember: Graphing the Ghosts in LLM Reasoning

Opening — Why this matters now: Large language models don’t think—but they do accumulate influence. And that accumulation is exactly where most explainability methods quietly give up. As LLMs move from single-shot text generators into multi-step reasoners, agents, and decision-making systems, we increasingly care why an answer emerged—not just what token attended to what prompt word. Yet most attribution tools still behave as if each generation step lives in isolation. That assumption is no longer just naïve; it is actively misleading. ...
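
To make the accumulation point concrete, here is a minimal sketch (my own construction on synthetic attribution vectors, not the paper's method): each generated token's credit over the prompt is its direct attribution plus whatever it inherits through earlier generated tokens, instead of a per-step score taken in isolation.

```python
# Sketch: accumulate a prompt token's influence across generation steps
# instead of scoring each step in isolation. The per-step attribution
# vectors are synthetic (Dirichlet draws); the propagation rule is an
# illustration of the accumulation idea, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(1)
n_prompt, n_steps = 4, 3  # 4 prompt tokens, 3 generated tokens

accumulated = []  # accumulated[t]: generated token t's total credit over the prompt
for t in range(n_steps):
    # Direct attribution of generated token t over the prompt tokens and
    # the t previously generated tokens (entries sum to 1 by construction).
    direct = rng.dirichlet(np.ones(n_prompt + t))
    credit = direct[:n_prompt].copy()      # credit assigned directly to the prompt
    for j in range(t):                     # plus credit routed through earlier generations
        credit += direct[n_prompt + j] * accumulated[j]
    accumulated.append(credit)

# A step-isolated view would report only the final token's direct slice;
# the accumulated view also counts influence carried by intermediate tokens.
print(np.round(accumulated[-1], 3))
```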

December 18, 2025 · 4 min · Zelina

Cities That Think: Reasoning AI for the Urban Century

Opening — Why this matters now: By 2050, nearly seven out of ten people will live in cities. Yet most urban planning tools today still operate as statistical mirrors—learning from yesterday’s data to predict tomorrow’s congestion. Predictive models can forecast traffic or emissions, but they don’t reason about why or whether those outcomes should occur. The next leap, as argued by Sijie Yang and colleagues in Reasoning Is All You Need for Urban Planning AI, is not more prediction—but more thinking. ...

November 10, 2025 · 4 min · Zelina

Graphing the Invisible: How Community Detection Makes AI Explanations Human-Scale

Opening — Why this matters now: Explainable AI (XAI) is growing up. After years of producing colorful heatmaps and confusing bar charts, the field is finally realizing that knowing which features matter isn’t the same as knowing how they work together. The recent paper Community Detection on Model Explanation Graphs for Explainable AI argues that the next frontier of interpretability lies not in ranking variables but in mapping their alliances. Because when models misbehave, the problem isn’t a single feature — it’s a clique. ...
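
A minimal sketch of the "alliances" idea, using synthetic interaction scores rather than the paper's pipeline: treat features as nodes, pairwise interaction strength as edge weights, and let a standard community-detection routine recover the groups that act together.

```python
# Sketch: find feature "alliances" by running community detection on an
# explanation graph. The interaction matrix below is synthetic; in practice
# it might come from pairwise interaction attributions (an assumption, not
# the paper's exact construction).
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(0)
features = [f"f{i}" for i in range(8)]

# Symmetric matrix of pairwise interaction strengths (placeholder data).
raw = np.abs(rng.normal(size=(8, 8)))
interactions = (raw + raw.T) / 2

# Explanation graph: nodes are features, edges weighted by interaction
# strength, with weak edges pruned so the structure stays readable.
G = nx.Graph()
for i, fi in enumerate(features):
    for j in range(i + 1, len(features)):
        if interactions[i, j] > 0.8:  # arbitrary pruning threshold
            G.add_edge(fi, features[j], weight=float(interactions[i, j]))

# Each community is a group of features that tend to act together.
for k, community in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"community {k}: {sorted(community)}")
```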

November 5, 2025 · 4 min · Zelina

Titles, Not Tokens: Making Job Matching Explainable with STR + KGs

The big idea: Job titles are messy: “Managing Director” and “CEO” share zero tokens yet often mean the same thing, while “Director of Sales” and “VP Marketing” are different but related. Traditional semantic similarity (STS) rewards look‑alikes; real hiring needs relatedness (STR)—associations that capture hierarchy, function, and context. A recent study proposes a hybrid pipeline that pairs fine‑tuned Sentence‑BERT embeddings with a skill‑level Knowledge Graph (KG), then evaluates models by region of relatedness (low/medium/high) instead of only global averages. The punchline: this KG‑augmented approach is both more accurate where it matters (high‑STR) and explainable—it can show which skills link two titles. ...
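
A rough sketch of how such a hybrid score can be wired together; the model name, blending weight, and region thresholds below are placeholders rather than the paper's fine-tuned setup.

```python
# Sketch: blend Sentence-BERT similarity with skill overlap from a tiny,
# hand-written knowledge graph. Model name, alpha, and the low/medium/high
# thresholds are illustrative, not the paper's configuration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy skill-level KG: title -> linked skills.
SKILLS = {
    "Managing Director": {"strategy", "p&l ownership", "org leadership"},
    "CEO": {"strategy", "p&l ownership", "org leadership", "fundraising"},
    "Director of Sales": {"pipeline management", "negotiation", "org leadership"},
}

def skill_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the skills the KG links to each title."""
    sa, sb = SKILLS.get(a, set()), SKILLS.get(b, set())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def relatedness(a: str, b: str, alpha: float = 0.6) -> float:
    """Hybrid STR score: alpha * embedding similarity + (1 - alpha) * KG overlap."""
    emb = model.encode([a, b], convert_to_tensor=True)
    return alpha * float(util.cos_sim(emb[0], emb[1])) + (1 - alpha) * skill_overlap(a, b)

def region(score: float) -> str:
    """Bucket a score into the low/medium/high evaluation regions."""
    return "high" if score >= 0.7 else "medium" if score >= 0.4 else "low"

score = relatedness("Managing Director", "CEO")
print(region(score), round(score, 3))
# The explanation is the shared skills themselves:
print("linked by:", SKILLS["Managing Director"] & SKILLS["CEO"])
```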

September 17, 2025 · 4 min · Zelina

Speaking Fed with Confidence: How LLMs Decode Monetary Policy Without Guesswork

The Market-Moving Puzzle of Fedspeak: When the U.S. Federal Reserve speaks, markets move. But the Fed’s public language—often called Fedspeak—is deliberately nuanced, shaping expectations without making explicit commitments. Misinterpreting it can cost billions, whether in trading desks’ misaligned bets or policymakers’ mistimed responses. Even top-performing LLMs like GPT-4 can classify central bank stances (hawkish, dovish, neutral), but they typically do so without explaining their reasoning or flagging when they might be wrong. In high-stakes finance, that’s a liability. ...
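
One lightweight way to add the missing confidence signal is a self-consistency vote: classify the same statement several times and abstain when the votes disagree. The sketch below is illustrative only; classify_once is a hypothetical stand-in for a real model call, and the thresholds are not from the paper.

```python
# Sketch: wrap stance classification in a self-consistency vote so every
# answer carries an explicit confidence and can abstain. `classify_once` is
# a hypothetical stand-in for an LLM call; k and min_agreement are
# illustrative choices, not the paper's calibration.
import random
from collections import Counter

LABELS = ("hawkish", "dovish", "neutral")

def classify_once(statement: str) -> str:
    """Placeholder for a single model classification of a Fed statement."""
    return random.choice(LABELS)  # replace with a real model call

def classify_with_confidence(statement: str, k: int = 9, min_agreement: float = 0.6):
    """Majority label plus an agreement score; abstain when agreement is low."""
    votes = Counter(classify_once(statement) for _ in range(k))
    label, count = votes.most_common(1)[0]
    agreement = count / k
    if agreement < min_agreement:
        return "abstain", agreement  # flag the cases the model might get wrong
    return label, agreement

print(classify_with_confidence("The Committee judges that risks to the outlook are balanced."))
```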

August 12, 2025 · 3 min · Zelina