Prompt Robustness

TL;DR for operators: LLM survey panels are cheap, fast, and extremely willing to give you numbers. That is exactly why they are dangerous. A recent paper by Jens Rupprecht, Georg Ahnert, and Markus Strohmaier stress-tests nine instruction-tuned LLMs on World Values Survey-style questions and finds that small prompt changes can materially alter synthetic survey responses [1]. The study runs 167,400 simulated interviews across 62 normative survey questions, with 25 repeated runs per combination of model, question, and condition, and a battery of perturbations covering answer-order reversal, refusal-option removal, odd/even scale changes, priming text, typos, synonyms, paraphrases, and a combined paraphrase-plus-reversal condition. ...

Prompt and Circumstance: Why One Accuracy Number Is Not a Reliability Audit

Echo Chamber in a Prompt: How Survey Bias Creeps into LLMs