From Ballots to Budgets: Can LLMs Be Trusted as Social Planners?

TL;DR for operators: This paper asks a deceptively operational question: can an LLM act as a social planner when it must allocate a fixed budget across competing public projects? Not in the inspirational LinkedIn sense. In the literal sense: choose project IDs, stay within budget, maximise community utility, and return a valid allocation. ...

August 11, 2025 · 20 min · Zelina

Love in the Time of Context: Why LLMs Still Don't Get You

TL;DR for operators: Personalization does not fail because the model forgot your birthday. That would be almost charming. It fails because the system remembers too much in the wrong shape. The Cupid benchmark tests whether LLMs can infer a user’s context-dependent preference from prior multi-turn interactions and apply it to a new request.¹ The setup is deliberately business-relevant: users do not announce a clean preference profile; they reveal expectations through feedback, correction, and mild conversational friction. Very realistic. Nobody fills out a YAML file called my_deeply_contextual_preferences.yml, at least not outside certain Slack channels. ...

August 5, 2025 · 16 min · Zelina