
CompactRAG: When Multi-Hop Reasoning Stops Burning Tokens

Opening — Why this matters now Multi-hop reasoning has quietly become one of the most expensive habits in modern AI systems. Every additional hop—every “and then what?”—typically triggers another retrieval, another prompt expansion, another LLM call. Accuracy improves, yes, but so does the bill. CompactRAG enters this conversation with a refreshingly unfashionable claim: most of this cost is structural, not inevitable. If you stop forcing LLMs to repeatedly reread the same knowledge, multi-hop reasoning does not have to scale linearly in tokens—or in money. ...
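The scaling claim can be illustrated with back-of-the-envelope token arithmetic. All numbers below are invented for illustration, not taken from the paper:

```python
# Back-of-envelope comparison (all numbers are invented for illustration).
# A naive multi-hop RAG pipeline resends the full retrieved context on
# every hop; a compact variant pays for the shared knowledge once and
# sends only per-hop deltas afterward.
CONTEXT_TOKENS = 2000   # shared retrieved knowledge
HOP_TOKENS = 200        # per-hop question/answer overhead
HOPS = 4

naive = HOPS * (CONTEXT_TOKENS + HOP_TOKENS)   # rereads context on each hop
compact = CONTEXT_TOKENS + HOPS * HOP_TOKENS   # context is paid for once

print(naive, compact)  # the naive bill grows with context * hops; the compact one doesn't
```

Under these toy numbers the naive pipeline spends 8,800 tokens against 2,800 for the compact one, and the gap widens with every additional hop.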

February 8, 2026 · 3 min · Zelina

SD‑RAG: Don’t Trust the Model, Trust the Pipeline

Opening — Why this matters now RAG was supposed to make LLMs safer. Instead, it quietly became a liability. As enterprises rushed to bolt retrieval layers onto large language models, they unintentionally created a new attack surface: sensitive internal data flowing straight into a model that cannot reliably distinguish instructions from content. Prompt injection is not a corner case anymore—it is the default threat model. And telling the model to “behave” has proven to be more of a suggestion than a guarantee. ...

January 20, 2026 · 4 min · Zelina

Bubble Trouble: Why Top‑K Retrieval Keeps Letting LLMs Down

Opening — Why this matters now Enterprise teams didn’t adopt RAG to win leaderboard benchmarks. They adopted it to answer boring, expensive questions buried inside spreadsheets, PDFs, and contracts—accurately, repeatably, and with citations they can defend. That’s where things quietly break. Top‑K retrieval looks competent in demos, then collapses in production. The model sees plenty of text, yet still misses conditional clauses, material constraints, or secondary scope definitions. The failure mode isn’t hallucination in the usual sense. It’s something more procedural: the right information exists, but it never makes it into the context window in the first place. ...

January 16, 2026 · 4 min · Zelina

Replace, Don’t Expand: When RAG Learns to Throw Things Away

Opening — Why this matters now RAG systems are having an identity crisis. On paper, retrieval-augmented generation is supposed to ground large language models in facts. In practice, when queries require multi-hop reasoning, most systems panic and start hoarding context like it’s a survival skill. Add more passages. Expand the window. Hope the model figures it out. ...

December 12, 2025 · 4 min · Zelina

Privacy by Proximity: How Nearest Neighbors Made In-Context Learning Differentially Private

Opening — Why this matters now As large language models (LLMs) weave themselves into every enterprise workflow, a quieter issue looms: the privacy of the data used to prompt them. In‑context learning (ICL) — the art of teaching a model through examples in its prompt — is fast, flexible, and dangerously leaky. Each query could expose confidential examples from private datasets. Enter differential privacy (DP), the mathematical armor for sensitive data — except until now, DP methods for ICL have been clumsy and utility‑poor. ...

November 8, 2025 · 4 min · Zelina

Agents with Interest: How Fintech Taught RAG to Read the Fine Print

Opening — Why this matters now The fintech industry is an alphabet soup of acronyms and compliance clauses. For a large language model (LLM), it’s a minefield of misunderstood abbreviations, half-specified processes, and siloed documentation that lives in SharePoint purgatory. Yet financial institutions are under pressure to make sense of their internal knowledge—securely, locally, and accurately. Retrieval-Augmented Generation (RAG), the method of grounding LLM outputs in retrieved context, has emerged as the go-to approach. But as Mastercard’s recent research shows, standard RAG pipelines choke on the reality of enterprise fintech: fragmented data, undefined acronyms, and role-based access control. The paper Retrieval-Augmented Generation for Fintech: Agentic Design and Evaluation proposes a modular, multi-agent redesign that turns RAG from a passive retriever into an active, reasoning system. ...

November 4, 2025 · 4 min · Zelina

Confounder Hunters: How LLM Agents are Rewriting the Rules of Causal Inference

When Hidden Variables Become Hidden Costs In causal inference, confounders are the uninvited guests at your data party — variables that influence both treatment and outcome, quietly skewing results. In healthcare, failing to adjust for them can turn life-saving insights into misleading noise. Traditionally, finding these culprits has been the realm of domain experts, a slow and costly process that doesn’t scale well. The paper from National Sun Yat-Sen University proposes a radical alternative: put Large Language Model (LLM)-based agents into the causal inference loop. These agents don’t just crunch numbers — they reason, retrieve domain knowledge, and iteratively refine estimates, effectively acting as tireless, always-available junior experts. ...

August 12, 2025 · 3 min · Zelina

GraphRAG Without the Drag: Scaling Knowledge-Augmented LLMs to Web-Scale

When it comes to retrieval-augmented generation (RAG), size matters—but not in the way you might think. Most high-performing GraphRAG systems extract structured triples (subject, predicate, object) from texts using large language models (LLMs), then link them to form reasoning chains. But this method doesn’t scale: if your corpus contains millions of documents, pre-processing every one with an LLM becomes prohibitively expensive. That’s the bottleneck the authors of “Millions of GeAR-s” set out to solve. And their solution is elegant: skip the LLM-heavy preprocessing entirely, and use existing knowledge graphs (like Wikidata) as a reasoning scaffold. ...
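The core idea—walking an existing knowledge graph instead of extracting triples with an LLM—can be sketched in a few lines. The triples and entity names below are toy stand-ins for a real KG like Wikidata, not examples from the paper:

```python
# Toy sketch: multi-hop reasoning over a pre-existing knowledge graph.
# The triples are illustrative stand-ins for Wikidata-style facts; no LLM
# is needed to build the scaffold, because the triples already exist.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Poland", "member_of", "EU"),
]

def hop_chain(start, max_hops=3):
    """Follow outgoing edges from `start`, collecting a reasoning chain."""
    chain, node = [], start
    for _ in range(max_hops):
        # Pick the first outgoing edge from the current node, if any.
        step = next(((p, o) for s, p, o in TRIPLES if s == node), None)
        if step is None:
            break
        chain.append((node, *step))
        node = step[1]  # hop to the object of the triple
    return chain

print(hop_chain("Marie Curie"))
```

Here the three-hop chain from “Marie Curie” to “EU” falls out of pure graph traversal—the expensive LLM call is deferred to the final answer-generation step rather than spent on preprocessing every document.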

July 24, 2025 · 3 min · Zelina