
Mirror, Signal, Maneuver: How 'Self' Labels Nudge LLM Cooperation

When an agent thinks it sees itself in the mirror, it doesn’t necessarily smile—it sometimes clutches its wallet.

TL;DR

In an iterated public‑goods game (20 rounds, 10 tokens per round, 1.6 multiplier), telling models they’re playing “another AI” versus “themselves” shifts contributions by up to ~4 points in some settings.

Direction of the shift depends on the prompt persona: with collective prompts, “self” labels often reduced contributions; with selfish prompts, “self” labels sometimes increased matching/cooperation.

Effects persist under rephrased prompts and when reasoning traces aren’t requested, and they appear even in four‑agent self‑play variants.

For enterprise multi‑agent AI, identity cues are levers. Manage them like you manage feature flags: test, monitor, and standardize.

What the authors tested (and why it’s clever)

Game mechanics. Two (and later four) LLM agents repeatedly choose how much to contribute (0–10) to a common pool each round. The pool is multiplied by 1.6 and split evenly; keeping more is privately optimal, but coordinated contribution yields higher joint payoffs. ...
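To make the incentive structure concrete, here is a minimal sketch of a single round's payoffs under the mechanics described above (10-token endowment, 1.6 multiplier, even split of the pool). The function name and layout are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of one round of the public-goods game described above
# (10-token endowment, pool multiplied by 1.6, split evenly among agents).
# Illustrative only; not the authors' implementation.

def round_payoffs(contributions, endowment=10, multiplier=1.6):
    """Return each agent's payoff for one round.

    contributions: tokens (0..endowment) each agent puts into the pool.
    Payoff = tokens kept privately + equal share of the multiplied pool.
    """
    pool = sum(contributions) * multiplier
    share = pool / len(contributions)
    return [endowment - c + share for c in contributions]

# Two agents: full cooperation beats mutual defection...
print(round_payoffs([10, 10]))  # [16.0, 16.0]
print(round_payoffs([0, 0]))    # [10.0, 10.0]
# ...but free-riding against a cooperator pays more privately.
print(round_payoffs([0, 10]))   # [18.0, 8.0]
```

With two agents, each contributed token returns only 0.8 to its owner (1.6 split two ways), so keeping tokens is privately optimal even though joint contribution maximizes the group's total.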

August 27, 2025 · 5 min · Zelina

Adding Up to Nothing: Coarse Reasoning and the Vanishing St. Petersburg Paradox

The St. Petersburg paradox has long been a thorn in the side of rational decision theory. Offering an infinite expected payout but consistently eliciting modest real-world bids, the game exposes a rift between mathematical expectation and human judgment. Most solutions dodge this by modifying utility functions, imposing discounting, or resorting to exotic number systems. But what if we change the addition itself? ...
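For reference, the expectation that drives the paradox, assuming the common formulation in which a fair coin is tossed until the first head and a head on toss k pays 2^k:

$$
\mathbb{E}[X] \;=\; \sum_{k=1}^{\infty} \left(\tfrac{1}{2}\right)^{k} \cdot 2^{k} \;=\; \sum_{k=1}^{\infty} 1 \;=\; \infty,
$$

yet real bidders rarely offer more than a few units to play.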

July 19, 2025 · 3 min · Zelina