When it comes to retrieval-augmented generation (RAG), size matters—but not in the way you might think.
Most high-performing GraphRAG systems extract structured triples (subject, predicate, object) from texts using large language models (LLMs), then link them to form reasoning chains. But this method doesn’t scale: if your corpus contains millions of documents, pre-processing every one with an LLM becomes prohibitively expensive.
That’s the bottleneck the authors of “Millions of GeAR-s” set out to solve. And their solution is elegant: skip the LLM-heavy preprocessing entirely, and use existing knowledge graphs (like Wikidata) as a reasoning scaffold.
The Core Idea: Proxy Graph Reasoning
Instead of extracting triples from every document, their system (a modified version of GeAR) performs the following steps:
- Initial Retrieval: Combine BM25 and dense retrieval via Reciprocal Rank Fusion (RRF) to get the top passages relevant to the query (a sketch of the first four steps follows this list).
- Triple Extraction On-the-fly: Use an LLM (e.g., Falcon3B-Instruct) to extract triples from only these top passages.
- Wikidata Alignment: Match each extracted triple to a similar triple in Wikidata using sparse vector search. These matched triples become the graph backbone.
- Graph Expansion: Perform beam search over the aligned Wikidata triples to build multi-hop reasoning chains.
- Re-Retrieve: Use these reasoning chains to retrieve more distant but relevant passages, merging them back into the document pool.
- Iterate: If the query still can’t be answered, rewrite it with the LLM and repeat.
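To make the flow concrete, here is a minimal Python sketch of the first four steps. It is illustrative, not the paper’s implementation: `llm` is any text-in/text-out callable, while `sparse_index.search`, `neighbors`, and `score` are hypothetical stand-ins for a sparse triple index, a Wikidata adjacency lookup, and a chain scorer. The constant k=60 is the conventional RRF default, not a value reported in the paper.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60, top_n=10):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:                       # one ranked list per retriever
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

EXTRACTION_PROMPT = (
    "Extract (subject, predicate, object) triples from the passage below.\n"
    "Return one triple per line, tab-separated.\n\nPassage:\n{passage}"
)

def extract_triples(llm, passages):
    """On-the-fly extraction: the LLM only ever sees the retrieved passages."""
    triples = []
    for passage in passages:
        for line in llm(EXTRACTION_PROMPT.format(passage=passage)).splitlines():
            parts = [p.strip() for p in line.split("\t")]
            if len(parts) == 3:
                triples.append(tuple(parts))
    return triples

def align_to_wikidata(sparse_index, triples, top_k=1):
    """Swap each extracted triple for its nearest Wikidata triple under a
    sparse (lexical) search; the matches become the graph backbone."""
    aligned = []
    for triple in triples:
        aligned.extend(sparse_index.search(" ".join(triple), top_k=top_k))
    return aligned

def beam_search_chains(seeds, neighbors, score, beam_width=4, hops=2):
    """Grow multi-hop chains over Wikidata, keeping only the top
    `beam_width` partial chains at each hop."""
    beams = [[t] for t in seeds]
    for _ in range(hops):
        candidates = [
            chain + [nxt]
            for chain in beams
            for nxt in neighbors(chain[-1][2])     # fan out from the tail entity
        ]
        if not candidates:
            break
        beams = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beams
```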
All without running an LLM across the entire corpus. That’s the magic.
Why This Matters: RAG at Web Scale
Typical GraphRAG systems top out at a few hundred thousand passages. Here, the authors demonstrate scaling to millions of documents using a clever online alignment mechanism.
This isn’t just an engineering hack. It represents a shift in how we think about knowledge grounding. By using Wikidata as a semantic anchor, the system avoids expensive operations and taps into existing structured knowledge.
A nice touch is the agentic loop in GeAR: at each turn, the system determines whether it has enough evidence to answer the question, and if not, it decomposes and reformulates the query. It’s not just retrieval—it’s a controlled reasoning process.
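A minimal sketch of what that loop could look like, assuming three hypothetical LLM-backed helpers (`has_enough_evidence`, `answer`, `rewrite_query`) and a `retrieve` function wrapping the pipeline above; the paper’s actual control flow may differ:

```python
def agentic_answer(query, retrieve, llm, max_turns=3):
    """Retrieve, check evidence sufficiency, reformulate, repeat."""
    evidence = []
    for _ in range(max_turns):
        evidence.extend(retrieve(query))
        if llm.has_enough_evidence(query, evidence):
            return llm.answer(query, evidence)
        # Not enough evidence yet: decompose / reformulate and try again.
        query = llm.rewrite_query(query, evidence)
    return llm.answer(query, evidence)  # best effort once the turn budget ends
```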
The Cracks in the Armor
The tradeoff, of course, is alignment quality.
As Table 2 in the paper shows, semantic drift happens. For example:
- The system tries to answer a question about geoduck reproduction using Wikidata entries about oysters.
- A hot tub question gets linked to obscure heat-related journal papers.
This misalignment stems from the looseness of the sparse retrieval-based triple matching. There’s no guarantee that the linked triple shares the exact subject context. That’s a hard problem, and it limits the faithfulness of the final answer.
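A toy example shows why. Both “Wikidata” triples below are invented for illustration, and raw token overlap is a crude stand-in for sparse retrieval, but the failure mode is the same one the paper reports:

```python
# Toy illustration (not the paper's code) of how lexical triple matching
# drifts: the triple about the right topic but the wrong subject wins.

def sparse_overlap(query, triple):
    """Score a verbalized triple by raw token overlap with the query."""
    return len(set(query.lower().split()) & set(" ".join(triple).lower().split()))

query = "geoduck reproduction cycle"
candidates = [
    ("oyster", "reproduction", "life cycle stage"),   # hypothetical entries
    ("geoduck", "instance of", "species of clam"),
]
print(sorted(candidates, key=lambda t: sparse_overlap(query, t), reverse=True)[0])
# -> ('oyster', 'reproduction', 'life cycle stage'): predicate tokens outvote
#    the subject, so the wrong organism anchors the reasoning chain.
```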
The authors point to the fix future work should pursue: asymmetric semantic models that can represent both text passages and graph triples in a shared reasoning space. Until then, they accept some inaccuracy as the price of scalability.
Business Relevance
For any organization considering RAG at scale—say, enterprise search, legal document Q&A, or customer support across massive archives—this paper offers a critical insight:
Graph-enhanced reasoning doesn’t require graph-extracted corpora.
By piggybacking on existing graphs and using smart online alignment, you can get most of the reasoning benefits without paying all of the LLM costs.
That’s a serious value proposition.
Cognaptus: Automate the Present, Incubate the Future.