In the age of natural language interfaces to databases (NLIDBs), asking the right question has never been easier, or more perilous. While systems like ChatGPT or SQL-PaLM can convert everyday English into valid SQL, they often do so without interrogating the quality of the question itself. And as Peter Drucker warned, “The most dangerous thing is asking the wrong question.”

Enter VeriMinder, a system built not to improve SQL syntax or execution accuracy but to diagnose and refine the analytical intent behind the user’s query. It tackles a deceptively simple yet far-reaching problem: a well-formed SQL query that answers a poorly formed question can yield confident but misleading insights. This is particularly problematic in enterprise settings, where non-technical users rely on LLM-based BI assistants.

When NL2SQL Works Technically but Fails Analytically

Imagine a financial analyst querying for “clients with the largest loans” to assess default risk. An NL2SQL system will cheerfully produce the SQL, but VeriMinder spots three biases:

  • Similarity bias — assuming loan size correlates with default risk
  • Framing bias — framing the issue around loan size rather than default likelihood
  • Selection bias — excluding small loans, which may actually default more often

The SQL is valid. The logic is flawed. VeriMinder’s innovation is that it intervenes before the SQL is even executed.
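The loan example can be made concrete with a small sketch. The table name, column names, the 100,000 size cutoff, and every data value below are invented for illustration and are not taken from VeriMinder; the point is only that the literal query runs without error while saying nothing about default risk.

```python
import sqlite3

# Hypothetical toy data: (client_id, loan amount, defaulted flag).
# All values are invented for illustration only.
ROWS = [
    (1, 500_000, 0),
    (2, 450_000, 0),
    (3, 400_000, 1),
    (4, 20_000, 1),
    (5, 15_000, 1),
    (6, 10_000, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (client_id INTEGER, amount REAL, defaulted INTEGER)"
)
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", ROWS)

# Literal translation of "clients with the largest loans".
# Valid SQL, but it never looks at the defaulted column.
largest = conn.execute(
    "SELECT client_id, amount FROM loans ORDER BY amount DESC LIMIT 3"
).fetchall()

# A refined question: how does the default rate vary with loan size?
# The 100000 cutoff is an arbitrary assumption for this sketch.
by_size = conn.execute(
    """
    SELECT CASE WHEN amount >= 100000 THEN 'large' ELSE 'small' END AS bucket,
           COUNT(*) AS n_loans,
           AVG(defaulted) AS default_rate
    FROM loans
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()

print(largest)
print(by_size)
```

In this invented data the three largest loans belong to clients 1, 2, and 3, yet the small-loan bucket defaults at a rate of 2/3 against 1/3 for the large-loan bucket. The first query hides that pattern entirely, which is the selection bias described above.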

Three Weapons Against Flawed Thinking

VeriMinder integrates three core mechanisms to isolate and fix these vulnerabilities:

  1. Contextual Bias Detection — A taxonomy of 53 cognitive biases and data schema checks (e.g., temporal, categorical misalignments) maps NL questions to likely analytical errors.
  2. Hard-to-Vary Analysis — Inspired by David Deutsch’s epistemology, it prioritizes questions that yield robust, falsifiable explanations—ones that can’t be arbitrarily tweaked without losing validity.
  3. LLM-Powered Self-Refinement — Through a three-step pipeline (multi-template generation, critic-based scoring, and self-reflection), it iteratively proposes better queries for the same analytical intent.

This last piece is critical. Where traditional NL2SQL tools offer a 1:1 conversion from question to SQL, VeriMinder offers a 1-to-n-to-1 path: you pose a question, it explores several analytical variants, critiques them, and synthesizes a superior one.
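The 1-to-n-to-1 shape can be sketched in a few lines. This is a minimal illustration, not VeriMinder's implementation: the templates, the `toy_critic` keyword heuristic, and the `refine` function are all hypothetical stand-ins for steps that the real system delegates to an LLM, and the final step here simply picks the top-scoring variant rather than synthesizing a new one through self-reflection.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    question: str
    score: float = 0.0


# Hypothetical templates; a real system would prompt an LLM instead.
TEMPLATES = [
    "{q}",
    "{q}, compared against the full population",
    "{q}, broken down by default rate across the full population",
]


def generate(question: str) -> List[Candidate]:
    """Step 1 (1-to-n): expand one question into several variants."""
    return [Candidate(t.format(q=question)) for t in TEMPLATES]


def toy_critic(question: str) -> float:
    """Stand-in critic: one point per cue word found in the question.

    This keyword count is an assumption made for the sketch; it is not
    how an LLM critic would score analytical quality.
    """
    cues = ("population", "default rate")
    return float(sum(cue in question for cue in cues))


def refine(question: str, critic: Callable[[str], float] = toy_critic) -> Candidate:
    """Run generate -> score -> select on a single question."""
    candidates = generate(question)
    for cand in candidates:
        cand.score = critic(cand.question)  # Step 2: critic-based scoring
    # Step 3 (n-to-1): keep the best variant. Real self-reflection would
    # merge the critiques into a new question instead of just selecting.
    return max(candidates, key=lambda cand: cand.score)


best = refine("clients with the largest loans")
print(best.question, best.score)
```

Passing the critic in as a parameter keeps the loop independent of how scores are produced, so the keyword heuristic could be swapped for a model call without touching `refine`.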

A New Metric for Analytical Quality: The HV Score

VeriMinder introduces a principled objective: the Hard-to-Vary (HV) Score. It’s defined as:

HV(S) = I(T; S) / DL(S)

Where:

  • I(T; S) is mutual information between decision target T and selected variables S
  • DL(S) is the description length of S

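In essence, good analytical questions maximize insight per unit of complexity: the numerator rewards variable sets that are informative about the decision target, while the denominator penalizes sets that take longer to describe. The ratio echoes the Information Bottleneck principle, but through the lens of falsifiable decision support.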
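A toy calculation shows how the ratio behaves. The sketch below rests on assumptions the text does not specify: variables are discrete, I(T; S) is estimated with a plain empirical (plug-in) estimate in bits, and DL(S) is approximated crudely by the number of selected variables. The function names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X; Y) in bits for paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def hv_score(target: Sequence[Hashable], selected: Sequence[tuple]) -> float:
    """HV(S) = I(T; S) / DL(S).

    Assumption for this sketch: DL(S) is the number of variables in S,
    that is, the length of each row tuple.
    """
    description_length = len(selected[0])
    return mutual_information(selected, target) / description_length


# Invented data: T is a binary decision target.
T = [0, 0, 1, 1]
# S1: a single variable that fully determines T.
S1 = [("a",), ("a",), ("b",), ("b",)]
# S2: the same variable plus a second one that adds no information about T.
S2 = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]

hv_simple = hv_score(T, S1)  # 1 bit / 1 variable  = 1.0
hv_padded = hv_score(T, S2)  # 1 bit / 2 variables = 0.5
print(hv_simple, hv_padded)
```

Both variable sets carry the same one bit of information about the target, but the padded set pays for an extra variable, so its score halves. That is the "insight per unit of complexity" trade-off in miniature.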

While direct HV optimization is computationally hard, VeriMinder uses LLM-based heuristics that mimic this principle: critics estimate informativeness, prompt structure enforces compactness, and the feedback loop approximates a complexity-constrained search.

Real-World Impact: Better Questions, Better Decisions

In human evaluations with 63 users, 82.5% reported improved analytical quality using VeriMinder. Against four baseline methods, including question perturbation and critic-agent feedback, VeriMinder consistently ranked #1 in accuracy, concreteness, and comprehensiveness—with gains up to 86.9% in comprehensiveness over plain NL2SQL.

A side-by-side comparison interface further empowers users to contrast original and refined questions and results, making the refinement process both educational and confidence-boosting.

Implications: BI Assistants Need a Logic Layer

Most LLM-integrated BI tools assume the user knows what they want. VeriMinder flips that assumption. It recognizes that in exploratory analytics, users may not even know what they don’t know—and offers guardrails accordingly.

This positions VeriMinder not just as a tool, but as a design philosophy: business intelligence agents should not only translate, but challenge.

In future iterations, the authors plan to extend VeriMinder’s framework to Python/pandas code generation, integrate bias-aware confidence scores, and embed conformal prediction thresholds to reject misleading analyses.

From Interface to Intelligence

VeriMinder doesn’t replace LLM-based NL2SQL systems—it makes them smarter, safer, and more epistemically sound. In a world flooded with dashboards and “insight-as-a-service,” the real differentiator won’t be the speed of getting answers, but the quality of the questions we dare to ask.

