CompactRAG: When Multi-Hop Reasoning Stops Burning Tokens
Ask a normal enterprise RAG system a simple factual question, and it behaves politely enough. Retrieve a few passages. Hand them to the model. Generate an answer. Fine. Ask it a question that requires two or three steps, and the machine starts developing expensive habits. It retrieves, reasons, retrieves again, expands the prompt, reasons again, rewrites a query, retrieves more evidence, and then asks the LLM to stitch the mess together. The architecture looks intellectually serious. The invoice looks even more serious. ...