
CompactRAG: When Multi-Hop Reasoning Stops Burning Tokens

Opening — Why this matters now
Multi-hop reasoning has quietly become one of the most expensive habits in modern AI systems. Every additional hop—every “and then what?”—typically triggers another retrieval, another prompt expansion, another LLM call. Accuracy improves, yes, but so does the bill. CompactRAG enters this conversation with a refreshingly unfashionable claim: most of this cost is structural, not inevitable. If you stop forcing LLMs to repeatedly reread the same knowledge, multi-hop reasoning does not have to scale linearly in tokens—or in money. ...
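To picture why the cost is structural rather than inevitable, here is a minimal sketch (not CompactRAG's actual mechanism, which the full post covers) comparing a hop loop that re-sends its whole context on every call with one that keeps a compact store and sends each passage only once. The `retrieve` and `tokens` helpers are hypothetical stand-ins.

```python
# Illustration only: token accounting for a multi-hop loop.
# retrieve() and tokens() are toy stand-ins, not CompactRAG's API.

def retrieve(query: str) -> list[str]:
    # Hypothetical retriever; a real system would hit a vector index.
    corpus = {
        "hop-1": ["passage A", "passage B"],
        "hop-2": ["passage B", "passage C"],  # overlaps with hop 1
        "hop-3": ["passage A", "passage C"],  # fully overlaps with earlier hops
    }
    return corpus.get(query, [])

def tokens(passage: str) -> int:
    return len(passage.split())  # crude stand-in for a tokenizer

def naive_cost(hops: list[str]) -> int:
    # Re-send the entire accumulated context on every hop: cost grows with each call.
    context: list[str] = []
    cost = 0
    for hop in hops:
        context.extend(retrieve(hop))
        cost += sum(tokens(p) for p in context)
    return cost

def compact_cost(hops: list[str]) -> int:
    # Keep a store of already-seen passages; each one is sent to the model only once.
    seen: set[str] = set()
    cost = 0
    for hop in hops:
        new = [p for p in retrieve(hop) if p not in seen]
        seen.update(new)
        cost += sum(tokens(p) for p in new)
    return cost

hops = ["hop-1", "hop-2", "hop-3"]
print("naive:", naive_cost(hops), "tokens vs. compact:", compact_cost(hops), "tokens")
```

Even in this toy setup, the naive loop rereads overlapping passages on every hop, while the compact variant's token count stops growing once the knowledge has been seen.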

February 8, 2026 · 3 min · Zelina

Breaking the Question Apart: How Compositional Retrieval Reshapes RAG Performance

In the world of Retrieval-Augmented Generation (RAG), most systems still treat document retrieval like a popularity contest — fetch the most relevant-looking text and hope the generator can stitch the answer together. But as any manager who has tried to merge three half-baked reports knows, relevance without completeness is a recipe for failure. A new framework, Compositional Answer Retrieval (CAR), aims to fix that. Instead of asking a retrieval model to find a single “best” set of documents, CAR teaches it to think like a strategist: break the question into its components, retrieve for each, and then assemble the pieces into a coherent whole. ...
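The decompose, retrieve-per-component, assemble pattern the teaser describes can be sketched in a few lines. This is an illustration of the general idea, not CAR's implementation; `decompose`, `retrieve`, and `generate` are hypothetical callables supplied by the caller.

```python
# Illustration only: compositional retrieval as decompose -> retrieve per component -> assemble.
from typing import Callable

def compositional_answer(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
) -> str:
    # 1. Break the question into its components (sub-questions or aspects).
    components = decompose(question)
    # 2. Retrieve evidence for each component separately, not for the question as a whole.
    evidence = {c: retrieve(c) for c in components}
    # 3. Assemble one context that covers every component, then generate the answer.
    context = [doc for docs in evidence.values() for doc in docs]
    return generate(question, context)

# Toy stand-ins so the sketch runs end to end.
if __name__ == "__main__":
    answer = compositional_answer(
        "Which company founded by Alice acquired Bob's startup?",
        decompose=lambda q: ["company founded by Alice", "acquirer of Bob's startup"],
        retrieve=lambda c: [f"[doc about: {c}]"],
        generate=lambda q, ctx: f"Answer to {q!r} drawn from {len(ctx)} passages",
    )
    print(answer)
```

The point of the structure is completeness: each component gets its own retrieval budget, so no part of the question has to win a single relevance contest to be represented in the context.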

August 11, 2025 · 3 min · Zelina