Opening — Why this matters now
Most enterprise RAG systems are quietly overconfident.
They retrieve what looks relevant, stack it into a context window, and let the model produce an answer with unnerving certainty. The problem isn’t the model. It’s the question we’re asking the system to optimize: relevance.
In messy, real-world environments—legal disputes, financial analysis, conflicting reports—relevance is not the bottleneck. Uncertainty is.
The paper Entropic Claim Resolution proposes a simple but unsettling shift: stop asking “what is most similar to the query?” and start asking “what reduces uncertainty the most?”
That distinction sounds academic. It isn’t.
Background — The limits of relevance-driven RAG
Classic RAG follows a predictable loop:
- Embed query
- Retrieve top-k similar chunks
- Generate answer
This works well—until it doesn’t.
Where it breaks
| Scenario | What relevance retrieval does | What actually happens |
|---|---|---|
| Ambiguous query | Retrieves similar but redundant evidence | Model averages ambiguity into false certainty |
| Conflicting sources | Retrieves both sides without structure | Model blends contradictions into hallucinations |
| Multi-hop reasoning | Retrieves local matches | Misses discriminative linking evidence |
The paper names this failure mode epistemic collapse: the system keeps retrieving more of the same, instead of what resolves the uncertainty.
Agentic approaches (ReAct, Tree-of-Thoughts) try to fix this with iteration—but they lack a mathematical objective that says which evidence to retrieve next, or when to stop.
So they wander. Sometimes intelligently. Often expensively.
Analysis — What the paper actually does
The core idea is deceptively clean:
Treat answering a question as reducing uncertainty over possible answers.
Step 1: Replace “answers” with hypotheses
Instead of generating one answer, the system maintains a set of competing hypotheses:
| Hypothesis | Description |
|---|---|
| A₁ | Explanation 1 |
| A₂ | Explanation 2 |
| A₃ | Explanation 3 |
Initially, all are equally likely.
Step 2: Measure uncertainty explicitly
Uncertainty is quantified using entropy:
$$ H(A) = -\sum_{a} P(a) \log_2 P(a) $$
Higher entropy → more uncertainty.
Lower entropy → clearer answer.
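The entropy over a hypothesis set is a one-liner. A minimal sketch in Python (the probability values are illustrative; the base-2 log matches the paper's figures, where three uniform hypotheses give ≈1.585 bits):

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, over a hypothesis distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Three equally likely hypotheses: maximum uncertainty.
print(entropy([1/3, 1/3, 1/3]))   # ≈ 1.585 bits
# One dominant hypothesis: near-certainty.
print(entropy([0.9, 0.05, 0.05])) # ≈ 0.569 bits
```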
Step 3: Select evidence by information gain, not similarity
Instead of asking:
“Which document is most relevant?”
ECR asks:
“Which claim will most reduce uncertainty between these hypotheses?”
This is formalized as Expected Entropy Reduction (EER).
| Claim | Effect on hypotheses | Value |
|---|---|---|
| c₁ | Supports all hypotheses equally | Low (useless) |
| c₂ | Strongly supports A₁ over others | High (discriminative) |
| c₃ | Confirms contradiction between A₁ and A₂ | Very high |
The system prefers discriminative evidence, not redundant evidence.
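The selection criterion can be sketched with a Bayes update per claim. This is a deliberately simplified stand-in for the paper's EER score: it assumes each claim is accepted and measures the realized entropy drop, where the full formulation would average over claim outcomes. The claim-likelihood vectors (how strongly each claim supports each hypothesis) are assumptions here; in practice they would come from some scoring model:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def posterior(prior, likelihood):
    """Bayes update: P(a | c) ∝ P(c | a) · P(a)."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def entropy_reduction(prior, likelihood):
    return entropy(prior) - entropy(posterior(prior, likelihood))

prior = [1/3, 1/3, 1/3]
claims = {
    "c1 (supports all equally)": [0.5, 0.5, 0.5],  # non-discriminative
    "c2 (strongly favors A1)":   [0.9, 0.1, 0.1],  # discriminative
}
best = max(claims, key=lambda c: entropy_reduction(prior, claims[c]))
print(best)  # the discriminative claim wins
```

Note that c1 leaves the posterior uniform, so its entropy reduction is exactly zero: redundant evidence scores as worthless, no matter how "relevant" it looks.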
Step 4: Introduce a principled stopping rule
Most RAG systems stop because of:
- token limits
- iteration limits
- vague “confidence” thresholds
ECR stops when:
$$ H(A) \leq \epsilon $$
In plain English:
Stop when uncertainty is low enough to justify an answer.
That’s a rare thing in LLM pipelines: a clear definition of “enough evidence.”
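The whole loop, stopping rule included, fits in a few lines. This is a sketch under stated assumptions, not the paper's implementation: the greedy claim selection, the likelihood vectors, and the ε value are all illustrative:

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def bayes(prior, likelihood):
    u = [p * l for p, l in zip(prior, likelihood)]
    z = sum(u)
    return [x / z for x in u]

def resolve(prior, claims, epsilon=0.3):
    """Greedily apply the most entropy-reducing claim until H(A) <= epsilon."""
    beliefs, remaining = list(prior), list(claims)
    while entropy(beliefs) > epsilon and remaining:
        best = max(remaining,
                   key=lambda c: entropy(beliefs) - entropy(bayes(beliefs, c)))
        remaining.remove(best)
        beliefs = bayes(beliefs, best)
    status = "answer" if entropy(beliefs) <= epsilon else "expose_ambiguity"
    return beliefs, status

beliefs, status = resolve(
    prior=[1/3, 1/3, 1/3],
    claims=[[0.9, 0.1, 0.1],   # discriminative
            [0.5, 0.5, 0.5],   # redundant: never chosen
            [0.8, 0.1, 0.1]],  # discriminative
)
print(status, [round(b, 2) for b in beliefs])
```

Two discriminative claims suffice to cross the threshold; the redundant claim is never consumed. That is the "1/3 the evidence" behavior in miniature.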
Step 5: Handle contradictions explicitly
Here’s where it gets interesting.
If the system detects irreconcilable contradictions, it refuses to collapse uncertainty.
Instead of forcing an answer, it outputs:
- competing hypotheses
- their probabilities
- explicit conflicts
This is not a bug. It’s the point.
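What such an output might look like as a data structure (the field names and values are hypothetical, not the paper's schema):

```python
# Hypothetical output when contradictions block entropy collapse:
# instead of one answer, the system returns the residual distribution.
result = {
    "status": "ambiguous",
    "hypotheses": [
        {"id": "A1", "claim": "Explanation 1", "probability": 0.48},
        {"id": "A2", "claim": "Explanation 2", "probability": 0.46},
        {"id": "A3", "claim": "Explanation 3", "probability": 0.06},
    ],
    "conflicts": [
        {"between": ["A1", "A2"], "evidence": ["c2", "c3"]},
    ],
}
print(result["status"])
```

A downstream consumer (or auditor) can then decide what to do with a 48/46 split, rather than receiving a confidently worded paragraph that silently averaged it away.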
Findings — What actually improves
1. Faster uncertainty reduction
| Method | Claims Used | Final Entropy | ΔEntropy per Claim |
|---|---|---|---|
| Retrieval-only | 15 | 1.585 | 0.000 |
| Random | 5 | 1.24 | 0.068 |
| ECR | 5 | 0.21 | 0.274 |
ECR reaches near-certainty with one-third of the evidence the retrieval-only baseline consumes.
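The ΔEntropy-per-claim column follows directly from the table, assuming every method starts from the uniform three-hypothesis entropy of log2(3) ≈ 1.585 bits; recomputing it reproduces the reported values to within rounding:

```python
# Recompute ΔEntropy per claim from the table's own numbers.
START = 1.585  # ≈ log2(3): uniform entropy over three hypotheses
rows = [("Retrieval-only", 15, 1.585),
        ("Random",          5, 1.24),
        ("ECR",             5, 0.21)]
for name, claims, final in rows:
    print(f"{name}: {(START - final) / claims:.3f}")
```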
2. Comparable accuracy, better discipline
On multi-hop QA benchmarks:
| Method | EM | F1 | Faithfulness |
|---|---|---|---|
| Baseline RAG | 0.313 | 0.459 | 0.639 |
| Random | 0.207 | 0.307 | 0.427 |
| ECR | 0.297 | 0.450 | 0.626 |
Slightly lower than baseline—but without overconfidence.
That trade-off is intentional.
3. Behavior under contradiction (the real test)
| System | Overconfident Errors | Ambiguity Handling |
|---|---|---|
| Baseline RAG | ~99% | None |
| ECR | ~0% | Explicitly exposed |
This is the quiet breakthrough.
Most systems hide uncertainty. ECR surfaces it.
Implications — What this means for real systems
1. Retrieval becomes a decision problem
RAG is no longer:
“Find similar text.”
It becomes:
“Select the next best experiment to reduce uncertainty.”
That’s Bayesian thinking—applied at inference time.
2. Smaller models can outperform larger ones (sometimes)
The paper hints at something uncomfortable:
Better selection logic can beat more context or parameters.
In enterprise settings, that translates to:
- lower token cost
- faster latency
- more predictable behavior
3. Agentic AI gets a missing backbone
Most “agents” today are:
- prompt-driven
- heuristic-based
- loosely controlled
ECR offers something they lack:
- a utility function (entropy reduction)
- a policy (maximize EER)
- a termination rule (H ≤ ε)
In other words: structure.
4. Compliance and risk management become tractable
In high-stakes domains (finance, legal, medical), the worst failure mode is not wrong answers.
It’s confidently wrong answers.
ECR changes the system behavior from:
“Always answer”
to:
“Answer only when justified—or expose ambiguity.”
That is much easier to audit.
Conclusion — The quiet shift from scale to control
For the past two years, the industry has chased scale:
- larger models
- longer context
- more data
This paper suggests a different path.
Not bigger models.
Better questions.
More precisely:
Better decision rules about what to look at next.
Relevance was always a convenient proxy. It was never the objective.
ECR replaces that proxy with something more honest: uncertainty.
And once you frame RAG that way, a lot of current systems start to look… slightly naive.
Cognaptus: Automate the Present, Incubate the Future.