Opening — Why this matters now
As businesses increasingly rely on natural language to query complex datasets — “Show me the average Q3 sales in Europe” — ambiguity has become both a practical headache and a philosophical blind spot. The instinct has been to “fix” vague queries, forcing AI systems to extract a single, supposedly correct intent. But new research from CWI and the University of Amsterdam suggests we’ve been asking the wrong question all along. Ambiguity isn’t the enemy — it’s part of how humans think and collaborate.
In their paper *Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis*, Daniel Gomm, Cornelius Wolff, and Madelon Hulsebos argue that ambiguity, when properly understood, is not a failure of user precision but a form of cooperative communication. The real task for AI systems is not to eliminate uncertainty, but to manage it.
Background — Context and prior art
Most natural language interfaces to data — from text-to-SQL tools to dashboard assistants — operate under an implicit assumption: that every user question has a single, well-defined answer. This assumption works fine in narrow, schema-bound settings (like querying one known database), but breaks down in open-domain contexts, where users may not even know what data exists.
Traditionally, research has approached ambiguity as an error condition — something to detect, classify, and resolve into a single, latent intent. Benchmarks like Spider and BIRD reward exact execution accuracy, not interpretive nuance. The result is a distorted ecosystem of evaluations: models are trained to guess rather than collaborate.
Analysis — The cooperative query framework
The authors propose a cooperative query framework grounded in Grice’s cooperative principle from linguistics — the idea that conversation participants provide just enough information, trusting the listener to infer the rest. They introduce three classes of queries:
| Query Type | Description | Example |
|---|---|---|
| Unambiguous | Fully specified; no interpretation needed. | “What was the mean temperature in June–August in Copenhagen from 2000–2025?” |
| Cooperative | Partially specified but resolvable via reasonable inference. | “What is the average summer temperature in Copenhagen?” |
| Uncooperative | Underspecified and irresolvable; lacks enough grounding. | “What is the average temperature?” |
This framing redefines “good” queries not by their rigidity but by their resolvability through shared context. Ambiguity can thus be intentional — a user’s way of delegating analytical choices (e.g., correlation method, timeframe) to the system. In other words: underspecification is a feature, not a bug.
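To make the taxonomy concrete, here is a minimal Python sketch of how a system might represent the three query classes and record the analytical choices it fills in for a cooperative query. The `Grounding` enum, the `ResolvedQuery` structure, and the string-matching resolver are illustrative assumptions for this post, not code from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum

class Grounding(Enum):
    """The three query classes proposed in the paper."""
    UNAMBIGUOUS = "unambiguous"      # fully specified; no interpretation needed
    COOPERATIVE = "cooperative"      # resolvable via reasonable inference
    UNCOOPERATIVE = "uncooperative"  # lacks enough grounding to resolve

@dataclass
class ResolvedQuery:
    """A query paired with the assumptions the system filled in."""
    text: str
    grounding: Grounding
    assumptions: dict = field(default_factory=dict)

def resolve(query: str) -> ResolvedQuery:
    """Toy resolver: fills in defaults for a cooperative query and
    records them so they can be shown back to the user."""
    if "summer temperature" in query and "Copenhagen" in query:
        # Cooperative: "summer" and the timeframe are inferred, not stated.
        return ResolvedQuery(
            text=query,
            grounding=Grounding.COOPERATIVE,
            assumptions={
                "months": "June-August",
                "years": "2000-2025",
                "aggregation": "mean",
            },
        )
    if "average temperature" in query:
        # Uncooperative: no location, timeframe, or dataset to anchor on.
        return ResolvedQuery(query, Grounding.UNCOOPERATIVE)
    return ResolvedQuery(query, Grounding.UNAMBIGUOUS)

result = resolve("What is the average summer temperature in Copenhagen?")
print(result.grounding.value, result.assumptions)
# cooperative {'months': 'June-August', 'years': '2000-2025', 'aggregation': 'mean'}
```

The point of recording `assumptions` explicitly is that they can later be surfaced to the user, which is the transparency behavior discussed in the implications below.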
Findings — Ambiguity and bias in benchmarks
Applying this framework to 15 popular datasets — from FeTaQA and HiTab to DA-Code and KramaBench — the authors found two key distortions:
- Data-privileged queries are rampant. Many benchmark questions rely on schema-specific hints (like column names or IDs), which no real-world user would know. These shortcuts make AI models seem smarter than they are.
- Unambiguous queries are rare. Only a small share of dataset queries are fully resolvable. Most are ambiguous in either procedural (what operation to perform) or data (what subset to use) dimensions.
The table below captures this imbalance:
| Dataset | Share of Data-Privileged Queries | Share of Unambiguous Queries |
|---|---|---|
| DA-Eval | High | Low |
| DA-Code | High | Low |
| Spider | Moderate | Medium |
| FeTaQA | Moderate | Low |
| HiTab | Low | Medium |
In effect, our evaluation ecosystem rewards the illusion of clarity while punishing realistic human vagueness.
Implications — Designing for cooperation, not certainty
If ambiguity is inevitable — and often meaningful — then system design and evaluation must evolve:
- Evaluate by purpose, not purity. Use unambiguous queries to test execution accuracy, cooperative queries to test interpretive reasoning, and uncooperative ones to test robustness.
- Annotate ambiguity. Datasets should label queries by their grounding level (explicit, inferred, or unresolvable), allowing stratified performance metrics (see the sketch after this list).
- Support iterative dialogue. When queries are too vague, systems should engage users — not fail silently or hallucinate confidence.
- Design for transparency. Systems should disclose their inferred choices (“Assumed mean temperature over 2000–2025”) to let users correct them.
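To illustrate the "evaluate by purpose" and "annotate ambiguity" recommendations, the sketch below computes accuracy separately for each grounding level and scores uncooperative queries on whether the system asked for clarification instead of guessing. The field names and the scoring rule are assumptions made for this example, not the paper's evaluation protocol.

```python
from collections import defaultdict

# Hypothetical benchmark items: each query is labeled with its grounding
# level and how the system's response was judged.
results = [
    {"grounding": "unambiguous",   "correct": True},
    {"grounding": "cooperative",   "correct": True},
    {"grounding": "cooperative",   "correct": False},
    {"grounding": "uncooperative", "asked_clarification": True},
    {"grounding": "uncooperative", "asked_clarification": False},
]

def stratified_scores(items):
    """Report accuracy per grounding level instead of one pooled number.
    Uncooperative queries are scored on whether the system asked for
    clarification rather than guessing an answer."""
    buckets = defaultdict(list)
    for item in items:
        score = item.get("correct", item.get("asked_clarification", False))
        buckets[item["grounding"]].append(score)
    return {level: sum(hits) / len(hits) for level, hits in buckets.items()}

print(stratified_scores(results))
# {'unambiguous': 1.0, 'cooperative': 0.5, 'uncooperative': 0.5}
```

Stratifying this way keeps a model from padding its headline score with schema-leaking or trivially grounded queries while failing the interpretive cases that matter in practice.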
This cooperative framing marks a shift from parsing language to participating in reasoning. It also realigns expectations: AI systems are not mind readers but partners in analytical exploration.
Conclusion — The elegance of shared uncertainty
The idea that machines must always “know” what we mean is a hangover from deterministic computing. In analytical reasoning — as in conversation — meaning is often negotiated, not retrieved. Recognizing that cooperation, not perfection, defines useful AI interaction reframes the future of natural language analytics.
Ambiguity, in this light, becomes a design material. It’s not about making machines clairvoyant, but about teaching them to ask, “What did you mean by that?” — and to do so intelligently.
Cognaptus: Automate the Present, Incubate the Future.