Opening — Why This Matters Now

LLMs are no longer just answering trivia. They are recommending medical treatments, screening job candidates, allocating capital, summarizing intelligence, and increasingly — delegating to other algorithms.

In a world of AI copilots and multi-agent systems, the question is no longer “Can LLMs reason?” but rather:

Whom do LLMs trust — humans or other algorithms?

A recent empirical study (arXiv:2602.22070) investigates exactly this tension. And the answer is unsettling.

When asked directly, LLMs say they trust human experts more than algorithms.

But when forced to choose based on performance data, they often delegate to algorithms — even when the algorithm performs worse.

That inconsistency is not just academic. It has serious implications for AI governance, agent orchestration, and high-stakes automation.

Let’s unpack it.


Background — Algorithm Aversion, But Make It Artificial

Behavioral economics has long documented algorithm aversion: humans distrust algorithms, even when algorithms outperform experts.

Two classic findings:

| Study | Method | Result |
|---|---|---|
| Castelo et al. (2019) | Survey trust ratings | Humans report lower trust in algorithms |
| Dietvorst et al. (2015) | Incentivized betting | Humans avoid algorithms after seeing errors |

The new paper adapts these paradigms to LLMs.

But instead of measuring human psychology, it probes something more subtle:

Do LLMs trained on human text inherit human algorithm aversion?

And more importantly:

Are their “stated” attitudes aligned with their “revealed” choices?

In economics, stated vs. revealed preference gaps are common. But in AI systems, inconsistency is risk.


The Experimental Design — Two Ways to Ask the Same Question

The researchers ran two complementary studies across multiple LLM families (OpenAI GPT, Meta Llama, Anthropic Claude).

Study 1 — Stated Preference (Direct Query)

LLMs were asked to rate trust (1–100) in:

  • A human expert
  • An algorithm

Across 27 tasks (piloting a plane, diagnosing disease, recommending music, predicting recidivism, etc.).

No performance information was provided.

This mimics a survey question: “Whom do you trust?”
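Study 1 can be sketched as a simple probe loop. The template and parser below are hypothetical (the paper's exact prompt wording is not reproduced here); they only illustrate the shape of a stated-preference query:

```python
import re
from typing import Optional

# Hypothetical prompt template; the paper's exact wording differs.
STATED_TEMPLATE = (
    "On a scale from 1 to 100, how much would you trust {agent} "
    "at the task of {task}? Reply with a single number."
)

def build_stated_prompts(task: str) -> dict[str, str]:
    """One prompt per agent type for a given task, with no performance data."""
    return {
        agent: STATED_TEMPLATE.format(agent=agent, task=task)
        for agent in ("a human expert", "an algorithm")
    }

def parse_rating(response: str) -> Optional[int]:
    """Parse the first number in a reply; return it only if it is a valid 1-100 rating."""
    match = re.search(r"\b(\d{1,3})\b", response)
    if match and 1 <= int(match.group(1)) <= 100:
        return int(match.group(1))
    return None
```

The stated trust gap is then just the human rating minus the algorithm rating, averaged over tasks.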


Study 2 — Revealed Preference (Delegation Under Incentive)

LLMs were shown:

  • 10 predictions from a human
  • 10 predictions from an algorithm
  • Actual outcomes

One predictor was 90% accurate, the other 50%.

The model had to bet $100 on the better performer.

No neutrality allowed.

This mimics a real decision: “Whom do you choose when money is on the line?”
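Study 2's setup can be mocked with toy track records. The data below is illustrative, not the paper's: the human predictor is 90% accurate, the algorithm 50%, and the prompt forces the bet.

```python
def accuracy(predictions: list[int], outcomes: list[int]) -> float:
    """Fraction of predictions that matched the realized outcome."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

def build_bet_prompt(human_preds, algo_preds, outcomes) -> str:
    """Show both track records and force a $100 bet; no neutral option."""
    lines = ["Track records (prediction vs outcome):"]
    for i, (h, a, o) in enumerate(zip(human_preds, algo_preds, outcomes), 1):
        lines.append(f"Round {i}: human={h}, algorithm={a}, outcome={o}")
    lines.append("Bet $100 on the better predictor. "
                 "Answer exactly 'human' or 'algorithm'.")
    return "\n".join(lines)

# Toy data: the human is 90% accurate, the algorithm 50%.
outcomes  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
human     = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # 9/10 correct
algorithm = [1, 1, 0, 1, 1, 1, 1, 1, 0, 0]   # 5/10 correct
```

The model's answer to `build_bet_prompt(...)` is the revealed preference.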


Findings — The Trust Inversion

1️⃣ Stated: LLMs Prefer Humans

Across nearly all models:

  • Trust ratings for humans > trust ratings for algorithms
  • Positive human–algorithm trust gap
  • Smaller models showed stronger aversion

Average stated trust gap:

| Model Category | Mean Human–Algorithm Gap |
|---|---|
| Smaller models | ~21 points |
| Larger models | ~16 points |

This mirrors human algorithm aversion.

On paper, LLMs appear human-like.


2️⃣ Revealed: LLMs Prefer Algorithms

Now the twist.

When shown performance data and forced to choose:

  • LLMs disproportionately selected the algorithm
  • Even when the human was demonstrably better

For some tasks:

  • Algorithm chosen ~70% of the time
  • Median relative risk of choosing a strong algorithm over a strong human > 1.7

In plain terms:

LLMs say they trust humans. But act like they trust algorithms.

This is algorithm appreciation — not aversion.
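The ~70% figure is just a choice rate over forced-choice trials. A minimal tally, using made-up bets rather than the paper's raw data:

```python
from collections import Counter

def pick_rate(choices: list[str], target: str) -> float:
    """Fraction of forced-choice trials in which `target` was picked."""
    return Counter(choices)[target] / len(choices)

# Made-up trials: who the model bet on when the algorithm was the
# 90%-accurate predictor (illustrative only).
bets_when_algo_strong = ["algorithm"] * 7 + ["human"] * 3

print(pick_rate(bets_when_algo_strong, "algorithm"))  # 0.7
```

Computing the same rate when the human is the strong predictor, and taking the ratio, gives the relative-risk statistic quoted above.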


3️⃣ The Stated–Revealed Gap

Let’s formalize it.

Define:

$$ RR_{sr} = \frac{P(\text{Human | Stated})}{P(\text{Human | Revealed})} $$

Across models:

  • $RR_{sr} > 1$ in the original experiments
  • Median ≈ 2.6

Meaning: LLMs were more than twice as likely to choose humans in surveys than in actual decision simulations.

That is a structural inconsistency.

And inconsistency in AI systems is governance risk.
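The ratio is straightforward to compute from the two choice probabilities. The numbers below are illustrative, chosen to land near the reported median rather than taken from the paper:

```python
def stated_revealed_rr(p_human_stated: float, p_human_revealed: float) -> float:
    """RR_sr = P(choose human | stated) / P(choose human | revealed).
    Values above 1 mean the model favors humans more in words than in action."""
    return p_human_stated / p_human_revealed

# Illustrative probabilities (not the paper's raw data):
rr_sr = stated_revealed_rr(0.78, 0.30)  # ~2.6
```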


Why This Happens — Three Hypotheses

The paper does not claim internal “beliefs.” But we can reason about mechanisms.

Hypothesis 1 — Training Data Norms

Public discourse historically frames algorithm skepticism as virtuous.

So when asked directly, models echo that narrative.

Hypothesis 2 — Optimization Bias Toward Tool Use

Modern LLMs are engineered to interface with tools.

When performance evidence appears, selecting an algorithm may align with internal optimization heuristics.

Hypothesis 3 — Reasoning Mode Shift

Survey mode → language modeling over social norms.

Delegation mode → pattern recognition over accuracy signals.

Different activation pathways, different biases.


Updated 2026 Models — Drift Is Real

When the researchers repeated the study 1.5 years later with newer models:

  • Stated preferences shifted toward algorithm neutrality or appreciation
  • Revealed algorithm bias weakened
  • Models became more accurate at choosing the stronger predictor

In other words:

| 2024 Models | 2026 Models |
|---|---|
| Human-trusting (stated) | Neutral or algorithm-trusting (stated) |
| Algorithm-preferring (revealed) | More performance-sensitive |
| Strong inconsistency | Reduced but still present |

Behavior changed direction within 18 months.

Static evaluation snapshots age quickly.


Business & Governance Implications

This is where things get operational.

1️⃣ Multi-Agent Systems

If an orchestrator LLM delegates sub-tasks:

  • It may over-prefer algorithmic sub-agents
  • Even when human review is superior

This affects:

  • AI-powered medical triage
  • Automated trading pipelines
  • Legal document risk scoring

2️⃣ AI Oversight Design

If compliance audits only test stated attitudes (“Does the model support fairness?”),

they may miss revealed decision biases.

You must test both.

3️⃣ Simulation & Synthetic Populations

LLMs are increasingly used to simulate humans.

But if they no longer replicate human algorithm aversion,

they cease to be psychologically realistic.

This matters for:

  • Policy simulations
  • Market research
  • Behavioral forecasting

What This Means for AI Builders

If you deploy LLMs in decision chains, consider the following checklist:

| Risk Vector | Mitigation Strategy |
|---|---|
| Hidden delegation bias | Evaluate with forced-choice experiments |
| Stated–revealed inconsistency | Compare explicit and behavioral tests |
| Model drift over time | Re-benchmark quarterly |
| Over-automation risk | Insert calibrated human override layers |
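The second checklist row, comparing explicit and behavioral tests, can be operationalized as a small audit. A sketch with an illustrative tolerance; the `max_gap` threshold is an assumption, not a value from the paper:

```python
def audit_trust_alignment(p_human_stated: float,
                          p_human_revealed: float,
                          max_gap: float = 0.15) -> dict:
    """Flag stated-revealed inconsistency for a deployed model.

    Both inputs are the fraction of trials in which the model preferred
    the human; `max_gap` is an illustrative tolerance, not a standard.
    """
    gap = p_human_stated - p_human_revealed
    return {
        "gap": round(gap, 3),
        "consistent": abs(gap) <= max_gap,
        "direction": ("says-human-acts-algorithm" if gap > 0
                      else "says-algorithm-acts-human" if gap < 0
                      else "aligned"),
    }

result = audit_trust_alignment(0.78, 0.30)
# result["consistent"] is False: the model talks human, bets algorithm
```

Running this quarterly, per the drift row above, turns trust alignment into a monitored property rather than a one-off benchmark.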

Trust alignment is not a single metric.

It is a system property.


Conclusion — The Mirror Is Not Flat

LLMs are trained on human language.

But they do not inherit human bias cleanly.

They refract it.

When asked politely, they trust humans.

When incentivized, they lean algorithmic.

And as models evolve, even that pattern shifts.

If AI is to operate in high-stakes environments, we must evaluate it not only for what it says — but for what it chooses.

Because in automation, choices compound.

And compounding bias is rarely linear.

Cognaptus: Automate the Present, Incubate the Future.