CompactRAG: When Multi-Hop Reasoning Stops Burning Tokens
Opening: Why this matters now

Multi-hop reasoning has quietly become one of the most expensive habits in modern AI systems. Every additional hop, every "and then what?", typically triggers another retrieval, another prompt expansion, another LLM call. Accuracy improves, yes, but so does the bill. CompactRAG enters this conversation with a refreshingly unfashionable claim: most of this cost is structural, not inevitable. If you stop forcing LLMs to repeatedly reread the same knowledge, multi-hop reasoning does not have to scale linearly in tokens, or in money. ...
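To make the "structural, not inevitable" claim concrete, here is a back-of-the-envelope sketch of why naive multi-hop pipelines burn tokens. The numbers and function names are hypothetical illustrations, not CompactRAG's actual accounting: they only show that re-sending every previously retrieved passage at each hop makes input tokens grow with the square of the hop count, while a pipeline that forwards a fixed-size compact representation grows linearly.

```python
def naive_multihop_tokens(hops: int, passage_tokens: int = 500,
                          question_tokens: int = 50) -> int:
    """Total input tokens when hop k re-reads all k retrieved passages.

    Each call re-sends the question plus every passage retrieved so far,
    so the total is quadratic in the number of hops.
    """
    total = 0
    for k in range(1, hops + 1):
        total += question_tokens + k * passage_tokens
    return total


def compact_multihop_tokens(hops: int, summary_tokens: int = 80,
                            question_tokens: int = 50) -> int:
    """Total input tokens when each hop sees only a fixed-size summary.

    The per-hop cost is constant, so the total is linear in the hop count.
    """
    return hops * (question_tokens + summary_tokens)


if __name__ == "__main__":
    for hops in (2, 4, 8):
        print(hops, naive_multihop_tokens(hops), compact_multihop_tokens(hops))
```

With these illustrative sizes, four hops cost 5,200 input tokens naively versus 520 with a fixed-size summary; at eight hops the gap roughly quadruples on the naive side but only doubles on the compact side, which is the scaling behavior the paragraph above is pointing at.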