Opening — Why this matters now

The rise of AI in civic life has been faster than most democracies can legislate. Governments and NGOs are experimenting with large language models (LLMs) to summarize public opinions, generate consensus statements, and even draft expert questions in citizen assemblies. The promise? Efficiency and inclusiveness. The risk? Representation by proxy—where the algorithm decides whose questions matter.

The new paper Question the Questions: Auditing Representation in Online Deliberative Processes (De et al., 2025) offers a rigorous framework for examining that risk. It turns the abstract ideals of fairness and inclusivity into something measurable, using the mathematics of justified representation (JR) from social choice theory. In doing so, it shows how to audit whether AI-generated “summary questions” in online deliberations truly reflect the people’s diverse concerns—or just the most statistically coherent subset.

Background — The problem of digital deliberation

In traditional deliberative democracy, citizens discuss issues in small groups and submit questions for experts. But in online or large-scale assemblies, hundreds of participants may submit hundreds of questions, and time allows only a few to be answered. The bottleneck: selecting a small slate of representative questions.

Moderators often choose manually, introducing human bias. LLMs promise a fix—by clustering similar questions or synthesizing new ones. Yet their summaries can flatten nuance. The key question becomes: are the final questions representative of all participants, or only the algorithm’s sense of linguistic similarity?

Analysis — Auditing representation with JR

The authors propose an auditing framework based on a quantitative variant of justified representation (JR). In plain English: if *k* questions are selected from *m* proposals, then any sufficiently large group of participants (roughly *n/k*, where *n* is the total number of participants) with shared preferences should see at least one of their ideas reflected in the final slate.
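
In symbols, under one utility-based reading of that condition (the paper's exact quantitative variant may differ): a slate $W$ of size $k$ admits a JR violation if there exist a group of participants $S$ and an unchosen question $q$ such that

$$
|S| \ge \frac{n}{k}
\quad\text{and}\quad
u_i(q) > \max_{w \in W} u_i(w) \;\; \text{for every } i \in S,
$$

where $u_i(\cdot)$ is participant $i$'s utility for a question. This is the reading used in the illustrative sketches below.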

They formalize this using utility functions derived from LLM embeddings. Each participant’s question is embedded into a high-dimensional vector space; the cosine similarity between this and other questions indicates “utility.” The algorithm then checks whether any large enough group of participants would all prefer some unchosen question over those that were selected. If such a group exists, representation has failed.
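
A minimal sketch of that utility construction, assuming each participant is identified with the single question they submitted and that raw cosine similarity between embedding vectors is used directly as utility (a simplification; the embedding model and any calibration in the actual pipeline are not specified here):

```python
import numpy as np

def utility_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between question embeddings.

    embeddings: (n, d) array, one row per submitted question.
    Entry [i, j] is read as participant i's utility for question j,
    assuming participant i is identified with question i.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Toy 3-dimensional "embeddings" for four submitted questions.
emb = np.array([
    [0.9, 0.1, 0.0],   # Q0
    [0.8, 0.2, 0.1],   # Q1 (phrased much like Q0)
    [0.0, 1.0, 0.0],   # Q2
    [0.1, 0.0, 1.0],   # Q3
])
U = utility_matrix(emb)
print(np.round(U, 2))
```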

The framework includes two key algorithms:

| Algorithm | Complexity | Purpose |
| --- | --- | --- |
| Naïve JR audit | O(mn²) | Exhaustive verification of justified representation |
| Single-pass JR audit | O(mn log n) | Efficient verification using sorted utility thresholds |

These algorithms allow moderators or platforms to audit the representativeness of question slates—whether chosen by humans, optimization, or LLMs—without human interpretation bias.
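
The naïve audit lends itself to a direct implementation. The sketch below checks, for every unchosen question, whether a group of at least *n/k* participants would all strictly prefer it to everything on the slate; the threshold and tie-handling are assumptions, not the paper's exact test:

```python
import numpy as np

def jr_violation_exists(U: np.ndarray, selected: list[int], k: int) -> bool:
    """Flag a JR violation under the utility-based reading sketched above.

    U[i, j] = participant i's utility for question j.
    selected = indices of the k questions on the slate.
    Returns True if some unchosen question is strictly preferred to every
    selected question by at least n/k participants.
    """
    n, m = U.shape
    threshold = n / k
    # Each participant's best utility among the selected questions.
    best_selected = U[:, selected].max(axis=1)
    for q in range(m):
        if q in selected:
            continue
        # Size of the group that would rather have question q than anything chosen.
        group_size = int(np.sum(U[:, q] > best_selected))
        if group_size >= threshold:
            return True
    return False

# With the toy utility matrix U from the previous sketch and a slate of Q0 and Q2:
# jr_violation_exists(U, selected=[0, 2], k=2) -> True (participants 1 and 3 both prefer Q1)
```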

Findings — When AI summarizes democracy

The researchers tested their framework on real data from Stanford’s Deliberative Democracy Lab, covering twelve sessions across three global events, including America in One Room (U.S. political reform) and Meta’s Community Forum on AI chatbots and agents. They compared:

  1. Human-moderated slates – actual questions asked to experts.
  2. Extractive algorithmic slates – optimal subsets chosen via integer programming to minimize JR violation (a brute-force sketch of this idea follows the list).
  3. Abstractive LLM slates – newly written questions synthesized by GPT‑4‑class models.
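
A brute-force stand-in for the extractive step in item 2: the paper uses integer programming, but on small instances an exhaustive search over slates shows what that optimization targets. The JR score here (largest dissatisfied group, normalized by *n/k*) is an illustrative proxy rather than the paper's exact objective:

```python
from itertools import combinations
import numpy as np

def jr_value(U: np.ndarray, selected: tuple[int, ...], k: int) -> float:
    """Score a slate by its worst uncovered group, normalized by n/k.

    Values of 1 or more indicate a JR violation under the reading used
    earlier; lower is better. Illustrative proxy, not the paper's metric.
    """
    n, m = U.shape
    best_selected = U[:, list(selected)].max(axis=1)
    worst_group = 0
    for q in range(m):
        if q in selected:
            continue
        worst_group = max(worst_group, int(np.sum(U[:, q] > best_selected)))
    return worst_group / (n / k)

def best_extractive_slate(U: np.ndarray, k: int) -> tuple[int, ...]:
    """Pick the size-k slate with the lowest JR score by exhaustive search.

    Stand-in for the integer program; only feasible for small m.
    """
    m = U.shape[1]
    return min(combinations(range(m), k), key=lambda s: jr_value(U, s, k))

# With the toy utility matrix U from earlier:
# best_extractive_slate(U, k=2)
```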

Results:

  • Algorithmic methods—both extractive and LLM‑generated—produced slates more representative than human moderators in most cases.
  • Abstractive LLM slates matched or exceeded extractive ones in smaller groups, suggesting LLMs can synthesize shared concerns effectively.
  • However, performance varied: in larger or more polarized groups, human‑curated or optimized extractive methods still provided stronger guarantees.

| Method | Average JR Value (↓ better) | Consistency |
| --- | --- | --- |
| Human moderators | 0.84–1.68 | Variable |
| Integer‑program extractive | 0.26–0.52 | Stable |
| LLM‑best abstractive | 0.28–0.56 | Coherent but uneven |
| Random baseline | ≈1.2 | Poor |

(Derived from Table 1, De et al. 2025)

The takeaway: AI improves representativeness, but only when it is audited. Without such checks, even elegant language models risk amplifying subtle majority viewpoints or linguistic biases.

Implications — Algorithmic democracy and accountability

The framework marks a quiet turning point: deliberative democracy now has an algorithmic audit trail. By embedding this method directly into a live online deliberation platform (used in 50+ countries), organizers can quantify whether all participants are “heard” by the final set of questions.

For policymakers and civic tech builders, this signals a new governance paradigm: LLMs can help scale participation, but only when coupled with auditable fairness metrics. Future directions might extend from JR to broader proportionality metrics like BJR or EJR, or even integrate participant feedback loops to refine AI’s sense of utility.

Conclusion — Questioning the questions

AI doesn’t just answer questions anymore—it decides which ones get asked. By making representational fairness measurable, Question the Questions gives democracy a way to interrogate its own algorithms. The challenge for practitioners is not just to automate deliberation, but to ensure that automation remains accountable.

Cognaptus: Automate the Present, Incubate the Future.