Opening — Why this matters now

RAG is supposed to make large language models safer. Ground the model in documents, add citations, and hallucinations politely leave the room—or so the story goes. In practice, especially in expert domains, RAG often fails in a quieter, more dangerous way: it retrieves something relevant, but not the right kind of evidence.

This problem becomes acute when knowledge is split across heterogeneous sources with unequal authority. In Chinese Tibetan Medicine, for example, short encyclopedic summaries, classical canons, and modern clinical papers all coexist. Treat them as one flat corpus, and the system reliably gravitates toward the densest, easiest-to-match text—even when that text is epistemically weak.

The paper behind this article tackles that failure mode head-on, not by scaling models or adding more data, but by redesigning how RAG systems choose, combine, and justify their evidence.

Background — Context and prior art

Most RAG pipelines implicitly assume that retrieval score correlates with evidentiary quality. This assumption already strains in general domains; in specialized medicine, it collapses entirely. Dense summaries dominate vector similarity. Long-context models mishandle passage order (“lost in the middle”). Naive concatenation turns cross-source verification into a prompt-engineering gamble.

Prior work has addressed pieces of the puzzle: citation-aware generation, RAG evaluation frameworks, and knowledge-graph–augmented retrieval. What has been missing is a systematic treatment of partitioned knowledge bases, where provenance, authority, and cross-library validation are first-class requirements rather than afterthoughts.

Analysis — What the paper actually does

The authors frame Tibetan-medicine QA as a traceable cross-source RAG problem with three explicitly separated knowledge bases:

  • E: Encyclopedia entries (dense, accessible, weak authority)
  • T: Classical texts (conceptual authority)
  • P: Clinical papers (empirical authority)

Two mechanisms anchor the solution.

1. DAKS: routing before retrieval

Instead of retrieving blindly from all sources, the system first performs probe retrieval within each knowledge base. From the score distributions, it computes KB-level features—peak relevance, score concentration, margins, and coverage—then allocates a budgeted retrieval quota per KB.

This matters because it breaks a common failure mode: encyclopedias winning simply because they are short and information-dense. DAKS treats retrieval as a resource-allocation problem, not a popularity contest.

The result is controlled diversity: every KB gets a minimum budget, but authoritative sources receive more attention when the query demands it.
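
To make the mechanism concrete, here is a minimal sketch of what budgeted routing of this kind could look like. The feature names mirror the paper's description (peak relevance, concentration, margin, coverage), but the weighting, the budget figures, and the example scores are illustrative assumptions, not the authors' exact formulation.

```python
from dataclasses import dataclass
import math

@dataclass
class KBFeatures:
    peak: float           # highest probe-retrieval score in this KB
    concentration: float  # mean of the top-k probe scores (score mass near the top)
    margin: float         # gap between the best and second-best scores
    coverage: float       # fraction of query entities matched by the top hits

def kb_features(scores, matched_fraction, k=5):
    """Summarize one KB's probe-retrieval score distribution."""
    top = sorted(scores, reverse=True)[:k]
    margin = top[0] - top[1] if len(top) > 1 else top[0]
    return KBFeatures(top[0], sum(top) / len(top), margin, matched_fraction)

def allocate_budget(features_by_kb, total_k=12, min_per_kb=2):
    """Every KB gets a floor; the rest of the budget is split in proportion to a
    simple utility score (largest-remainder rounding keeps the total exact)."""
    def utility(f):
        # Illustrative weighting that deliberately rewards margin and coverage
        # over raw density; the paper's actual scoring may differ.
        return 0.3 * f.peak + 0.2 * f.concentration + 0.3 * f.margin + 0.2 * f.coverage

    utils = {kb: utility(f) for kb, f in features_by_kb.items()}
    total_u = sum(utils.values()) or 1.0
    remaining = total_k - min_per_kb * len(features_by_kb)
    shares = {kb: remaining * u / total_u for kb, u in utils.items()}
    quotas = {kb: min_per_kb + math.floor(s) for kb, s in shares.items()}
    leftover = remaining - sum(math.floor(s) for s in shares.values())
    for kb in sorted(shares, key=lambda name: shares[name] % 1.0, reverse=True)[:leftover]:
        quotas[kb] += 1
    return quotas

# Hypothetical query: E scores are dense but flat, T barely matches, P has a sharp, well-covered peak.
probe = {
    "E": kb_features([0.71, 0.70, 0.69, 0.69, 0.68], matched_fraction=0.5),
    "T": kb_features([0.35, 0.30, 0.28, 0.25, 0.22], matched_fraction=0.2),
    "P": kb_features([0.86, 0.62, 0.48, 0.41, 0.38], matched_fraction=0.8),
}
print(allocate_budget(probe))  # {'E': 4, 'T': 3, 'P': 5}: every KB keeps its floor, P earns the most
```

The property worth preserving is the floor-plus-proportional split: no knowledge base is starved, but the marginal budget follows the shape of the score distribution rather than raw similarity alone.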

2. Alignment graph–guided fusion

Retrieval alone does not guarantee coherent evidence. Once candidates are collected, the paper introduces a chunk–entity alignment graph linking text chunks to typed medical entities (diseases, drugs, formulas, symptoms).

This graph serves three roles:

  • Identifying cross-KB “bridges” (e.g., a classical concept linked to modern clinical evidence)
  • Scoring candidates by graph proximity and entity overlap
  • Enforcing coverage-aware evidence packing under a strict token budget

Instead of dumping passages into the prompt, evidence is packed to ensure required sources are represented—explicitly supporting cross-source verification.
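
As a rough illustration, coverage-aware packing can be sketched as a greedy selection under a token budget. The candidate fields (KB label, linked entities, token count) follow the paper's setup, but the scoring function below is a simple stand-in for the alignment-graph proximity measure, not a reimplementation of it.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    kb: str               # source library: "E", "T", or "P"
    text: str
    entities: set         # typed medical entities linked to this chunk
    tokens: int
    retrieval_score: float

def pack_evidence(candidates, query_entities, required_kbs, token_budget=1500):
    """Greedy coverage-aware packing: prefer chunks that add new query entities,
    unrepresented required KBs, or cross-KB bridges, and stop at the token budget."""
    packed, used_tokens = [], 0
    covered_entities, covered_kbs = set(), set()

    def gain(c):
        entity_gain = len((c.entities & query_entities) - covered_entities)
        kb_gain = 2.0 if c.kb in required_kbs and c.kb not in covered_kbs else 0.0
        # Bridge bonus: the chunk shares an already-covered entity but comes from
        # a new KB, i.e. it connects evidence across libraries.
        bridge = 1.0 if (c.entities & covered_entities) and c.kb not in covered_kbs else 0.0
        return c.retrieval_score + entity_gain + kb_gain + bridge

    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=gain)
        remaining.remove(best)
        if used_tokens + best.tokens > token_budget:
            continue  # skip chunks that no longer fit; smaller ones may still qualify
        packed.append(best)
        used_tokens += best.tokens
        covered_entities |= best.entities & query_entities
        covered_kbs.add(best.kb)
    return packed, covered_kbs
```

The design point carried over from the paper is that the packer optimizes coverage, not just score: a high-scoring but redundant encyclopedia chunk loses to a lower-scoring chunk that brings a missing knowledge base or a new bridge entity.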

Findings — What actually improves (with numbers)

The paper evaluates on a purpose-built 500-question benchmark, including single-source and cross-source queries. The key metric is not fluency, but CrossEv@5: whether the top-5 cited evidence covers all required knowledge bases.
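
The metric itself is simple to state in code, assuming each cited evidence item carries a KB label (the field names here are illustrative, not the paper's implementation):

```python
def cross_ev_at_k(cited_evidence, required_kbs, k=5):
    """1.0 if the top-k cited evidence spans every required KB, else 0.0."""
    cited_kbs = {item["kb"] for item in cited_evidence[:k]}
    return 1.0 if required_kbs <= cited_kbs else 0.0

# A cross-source question that needs both classical (T) and clinical (P) support:
cited = [{"kb": "E"}, {"kb": "T"}, {"kb": "E"}, {"kb": "P"}, {"kb": "E"}]
print(cross_ev_at_k(cited, required_kbs={"T", "P"}))  # 1.0: both required KBs appear in the top 5
```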

| Method | CrossEv@5 | Faithfulness | Citation Correctness |
|---|---|---|---|
| Naive multi-KB concat | 0.62 | 0.75 | 0.70 |
| DAKS only | 0.72 | 0.82 | 0.75 |
| Graph fusion only | 0.65 | 0.63 | 0.65 |
| DAKS + Graph Fusion | 0.78 | 0.81 | 0.76 |

Two observations stand out:

  1. Routing without fusion helps, but only partially: DAKS alone lifts CrossEv@5 from 0.62 to 0.72, still short of the combined system's 0.78.
  2. Graph fusion without routing can actually hurt: faithfulness falls to 0.63, below even the naive baseline.

The lesson is architectural, not incremental: evidence organization depends on upstream source selection.

Implications — Why this extends beyond Tibetan medicine

Although the domain is narrow, the implications are broad.

  • Enterprise RAG systems often blend policies, internal docs, manuals, and external regulations. Density bias is endemic.
  • AI governance and compliance depend on traceable, auditable answers—not just correct ones.
  • Agentic systems that reason across tools and databases need explicit source-awareness, or they will optimize for convenience over correctness.

Perhaps most strikingly, the system uses a relatively lightweight 7B generator. The gains come from retrieval economics and evidence geometry, not brute-force modeling.

Conclusion — RAG as an epistemic system

This work reframes RAG from a retrieval-plus-generation trick into an epistemic system: one that must reason about where knowledge comes from, how it should be weighted, and how claims are justified.

As organizations push RAG into high-stakes domains, the question will not be “did the model answer?” but “why this evidence, and not another?” This paper offers one of the clearest blueprints yet.

Cognaptus: Automate the Present, Incubate the Future.