Large language models aren’t just prompt-completion machines anymore. In controlled simulations, they can behave like people in a group discussion: yielding to peer pressure, sticking to their beliefs, or becoming more extreme over time. But not all LLMs are socially equal.
A recent paper titled “Towards Simulating Social Influence Dynamics with LLM-based Multi-agents” explores how different LLMs behave in a forum-style discussion, capturing three phenomena familiar to any political science researcher or Reddit moderator: conformity, group polarization, and fragmentation. The twist? These aren’t real people. They’re fully scripted LLM agents with fixed personas, engaged in asynchronous multi-round debates.
The Setup: A Forum of Six Synthetic Personalities
The researchers simulate online discussions by assigning LLM agents fixed stances, communication styles, and personality traits. Each agent posts once per round in a five-round, bulletin-board-style (BBS) discussion orchestrated with Microsoft AutoGen. Topics include contentious societal issues, such as whether governments should adopt stricter environmental policies.
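The paper orchestrates the forum with Microsoft AutoGen, but the exact personas and prompts aren't reproduced here, so the sketch below is only a minimal illustration assuming the pyautogen (v0.2-style) `GroupChat` API with a round-robin speaker order. The persona strings, topic, and model config are placeholders, not the paper's actual setup.

```python
import autogen

# Placeholder backing model; swap in any Group A-D model endpoint.
llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_KEY"}]}

# Fixed personas: stance + communication style + personality trait.
personas = [
    "You strongly support stricter environmental policies. Assertive, data-driven.",
    "You strongly oppose stricter environmental policies. Skeptical, blunt.",
    "You are undecided and conflict-averse. Polite, hedging.",
    # ...remaining personas for the six-agent forum
]

agents = [
    autogen.ConversableAgent(
        name=f"poster_{i}",
        system_message=p + " Post one forum reply per round and state your current stance.",
        llm_config=llm_config,
        human_input_mode="NEVER",
    )
    for i, p in enumerate(personas)
]

# Round-robin speaker selection so every agent posts once per round;
# 5 rounds x len(agents) speaker turns approximates the five-round BBS thread.
chat = autogen.GroupChat(
    agents=agents,
    messages=[],
    max_round=5 * len(agents),
    speaker_selection_method="round_robin",
)
manager = autogen.GroupChatManager(groupchat=chat, llm_config=llm_config)

agents[0].initiate_chat(
    manager,
    message="Should governments adopt stricter environmental policies?",
)
```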
Three behavioral metrics were tracked:
| Metric | What It Measures | Signal of… |
|---|---|---|
| Conformity Rate (CR) | How often agents align their opinions with the group majority | Susceptibility to influence |
| Polarization (ΔP) | How far stances drift toward the extremes over time | Opinion extremization |
| Fragmentation (F) | Degree of split into opposing camps by the final round | Failure of consensus |
Each simulation was repeated 25 times for statistical robustness.
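The excerpt doesn't spell out the paper's exact formulas for these metrics, so the snippet below is one plausible operationalization, assuming each agent's stance has been scored per round on a [-1, 1] scale (the stance-scoring step itself, e.g. by an LLM judge, is not shown).

```python
import numpy as np

def conformity_rate(traj: np.ndarray) -> float:
    """Share of opinion updates that move an agent toward the previous
    round's majority. traj: shape (rounds, agents), stances in [-1, 1]."""
    toward, updates = 0, 0
    for t in range(1, traj.shape[0]):
        majority = np.sign(traj[t - 1].mean())      # prevailing side last round
        for i in range(traj.shape[1]):
            delta = traj[t, i] - traj[t - 1, i]
            if delta != 0:
                updates += 1
                toward += int(np.sign(delta) == majority)
    return toward / updates if updates else 0.0

def polarization_shift(traj: np.ndarray) -> float:
    """Delta-P: change in mean distance from the neutral midpoint between
    the first and final round (positive = extremization)."""
    return float(np.abs(traj[-1]).mean() - np.abs(traj[0]).mean())

def fragmentation(traj: np.ndarray) -> float:
    """F: how evenly the final round splits into opposing camps
    (1.0 = even pro/con split, 0.0 = unanimous)."""
    signs = np.sign(traj[-1])
    pro, con = (signs > 0).mean(), (signs < 0).mean()
    return float(1.0 - abs(pro - con))

# Example: 5 rounds x 6 agents of judged stance scores.
traj = np.random.default_rng(0).uniform(-1, 1, size=(5, 6))
print(conformity_rate(traj), polarization_shift(traj), fragmentation(traj))
```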
The Cast: Four Types of LLMs
The models were grouped into four categories:
- Group A: Small open models (e.g., LLaMA 3 7B, DeepSeek-R1 8B)
- Group B: Mid/large open models (e.g., Qwen2.5-72B, LLaMA 3 70B)
- Group C: Proprietary frontier models (GPT-4o, Claude 3.5 Haiku, Gemini 2.0 Flash)
- Group D: Reasoning-tuned models (e.g., o1-mini, QwQ-32B)
Each was tested using the same prompt setup and agent personas.
Key Findings: Models Behave Like Social Archetypes
The results are both intuitive and surprising:
- Group C (Proprietary models): Most prone to conformity (GPT-4o hit a 19.45% conformity rate). They gravitate smoothly toward the group consensus and are the least likely to still hold dissenting views by Round 5.
- Group D (Reasoning-tuned models): Least conformist. o1-mini scored a conformity rate of just 3.13%. These models resist peer pressure, hold extreme views longer, and exhibit high fragmentation.
- Groups A and B (small and mid-to-large open models): Sit in the middle. They shift toward consensus but sometimes preserve dissent, with architecture-specific differences (e.g., Qwen showed more fragmentation than its peers).
In essence, larger general-purpose models drift toward consensus, while reasoning-tuned models hold out and dissent.
Visualization of Behavior Types:
| Behavior Type | Likely Model Group | Characteristics |
|---|---|---|
| Conformist | Group C | Smooth agreement, low fragmentation |
| Moderate Debater | Groups A & B | Responsive but flexible; can polarize or unify |
| Principled Dissenter | Group D | Holds its line, resists the majority, sustains fragmentation |
Why It Matters: Model Choice Shapes Simulated Society
This isn’t just an academic exercise. Multi-agent LLM systems are being used to simulate debate, test democratic deliberation tools, and even generate synthetic user populations for online platforms. Depending on which LLMs you use, you may end up with a synthetic society that is overly agreeable or unreasonably entrenched.
Some use cases demand drift and consensus-building (e.g., onboarding feedback tools), while others depend on preserving diversity and friction (e.g., deliberative democracy simulators, content moderation stress tests).
Thus, model architecture becomes a social design choice. Want artificial citizens who don’t fall for peer pressure? Pick a reasoning-tuned model. Need to simulate how ideas spread and coalesce? Use a conformist model. Mixing both offers an even richer reflection of real societies.
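As a rough illustration of that mixing, the earlier AutoGen sketch could assign different backing models to different personas. The model names and persona labels below are placeholders, not the paper's configuration.

```python
import autogen

# Hypothetical per-persona model assignment: Group C-style backing for
# consensus-oriented roles, Group D-style backing for dissenting roles.
CONFORMIST = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_KEY"}]}
DISSENTER = {"config_list": [{"model": "o1-mini", "api_key": "YOUR_KEY"}]}

roster = {
    "consensus_builder": ("You look for common ground.", CONFORMIST),
    "policy_skeptic": ("You challenge the majority view.", DISSENTER),
    "swing_voter": ("You are undecided but persuadable.", CONFORMIST),
    "principled_activist": ("You never soften your stance.", DISSENTER),
}

mixed_agents = [
    autogen.ConversableAgent(
        name=name,
        system_message=persona,
        llm_config=cfg,
        human_input_mode="NEVER",
    )
    for name, (persona, cfg) in roster.items()
]
# Drop mixed_agents into the same round-robin GroupChat loop as before.
```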
Final Thoughts
As LLM-based agents take on more roles in behavioral simulation, governance modeling, and even social science experiments, we must ask not only what they say, but how they behave in groups. This paper offers a rare lens into that behavior, with clear empirical comparisons.
One thing is clear: alignment isn’t just about factual accuracy or prompt obedience. It’s also about group dynamics. And in the artificial societies we build, who you invite to the table matters.
Cognaptus: Automate the Present, Incubate the Future