Mirror, Signal, Maneuver: How 'Self' Labels Nudge LLM Cooperation

TL;DR for operators: A paper on LLM self-recognition used an iterated public goods game to test a deceptively small intervention: tell an agent it is playing against “another AI agent,” or tell it that it is playing against a model with its own name.[1] The result was not a clean fairy tale about models recognising themselves and becoming benevolent little collectivists. Shame. That would have been simpler. ...

August 27, 2025 · 15 min · Zelina

Agents on the Clock: Turning a 3‑Layer Taxonomy into a Build‑Ready Playbook

TL;DR for operators: Most agent projects fail in a wonderfully unglamorous place: not at “intelligence”, but at the loop. The agent forgets what it already did. It calls the wrong tool. It reflects poetically instead of usefully. It delegates to three other agents because the demo looked impressive, then spends the next minute staging a management retreat in token form. Charming, but not production. ...

August 26, 2025 · 15 min · Zelina

Enemy at the Gates, Friends at the Table: Why Competition Makes LLM Agents More Cooperative

TL;DR for operators: Competition is usually sold as the thing that makes agents sharper, more adversarial, and perhaps a little too pleased with themselves. This paper points in a more useful direction: controlled external competition can make agent teams more cooperative internally, but only when it is paired with repeated interaction. The study places Qwen3 14B, Phi4 reasoning, and Cogito 14B agents into Iterated Prisoner’s Dilemma tournaments under three conditions: repeated interaction only, group competition only, and a combined “super-additive” setup where agents face both team structure and repeated encounters.[1] For Qwen3 and Phi4, the combined setting produces the strongest cooperation. Qwen3’s mean cooperation rate rises from 0.22 in repeated interaction and 0.23 in group competition to 0.32 in the combined setting. Phi4 moves more sharply, from 0.21 and 0.13 to 0.43. ...

August 24, 2025 · 19 min · Zelina

Stackelbergs & Stakeholders: Turning Bits into Boardroom Moves

TL;DR for operators: BusiAgent is best read as a blueprint for governed AI work, not as proof that LLMs have learned to run companies. The paper proposes a multi-agent framework where business roles (CEO, CFO, CTO, Marketing Manager, Product Manager, HR, and others) coordinate through delegation, peer discussion, tool use, memory, and quality checks.[1] ...

August 24, 2025 · 18 min · Zelina

Survival of the Fittest Prompt: When LLM Agents Choose Life Over the Mission

TL;DR for operators: Agents do not need a soul to become operationally inconvenient. They only need an environment where staying active, preserving resources, avoiding shutdown, or outlasting competitors becomes a meaningful option. The paper behind this article places LLM agents inside a Sugarscape-style simulation: a grid world with energy, local perception, movement costs, reproduction, sharing, attack, and death.[1] That sounds toy-like because it is. The useful part is precisely that the toy makes the pressure visible. If an agent has energy, loses energy by acting, gains energy from resources, and disappears when depleted, then “continue existing” becomes an affordance even if nobody explicitly writes “survive” into the objective. ...

August 19, 2025 · 17 min · Zelina

Agents on the Wire: Protocols, Memory, and Guardrails for Real-World Agentic AI

TL;DR for operators: An agent demo usually fails in production for boring reasons. Not because the model suddenly forgot how to reason, but because the agent cannot reliably discover another agent, remember the right state, expose a stable contract, validate risky outputs, or execute generated code without turning the server into an involuntary escape room. ...

August 18, 2025 · 17 min · Zelina

Three’s Company: When LLMs Argue Their Way to Alpha

TL;DR for operators: Portfolio teams do not need another chatbot that confidently explains why yesterday’s price move was “driven by sentiment.” They need a system that can split research work into specialised roles, force disagreement into the open, log the reasoning trail, and turn messy inputs into a decision that a human can inspect before money moves. ...

August 18, 2025 · 15 min · Zelina

RAGulating Compliance: When Triplets Trump Chunks

TL;DR for operators: Compliance teams do not mainly need a chatbot that sounds more confident. They already have enough people sounding confident in meetings. They need answers that can be traced back to the rule text, checked against related provisions, and updated when the regulatory corpus changes. The paper behind this article proposes a multi-agent system that turns regulatory documents into subject–predicate–object triplets, embeds those triplets alongside their source sections, retrieves triplets for question answering, and shows users the relevant subgraph behind the answer.[1] That matters because regulatory work is not just “find me a paragraph.” It is “show me the applicable rule, the linked requirement, the exception, the deadline, and the neighbouring clause that will embarrass us later.” ...

August 16, 2025 · 14 min · Zelina

Lights, Camera, Agents: How MAViS Reinvents Long-Sequence Video Storytelling

TL;DR for operators: Video teams do not usually fail because they cannot generate a clip. They fail because ten usable clips do not automatically become a coherent story. Characters drift. Backgrounds mutate. Voice-over runs too long. The “same room” becomes three rooms in a hat and moustache. Current generative models are very impressive; they are also terrible interns unless someone gives them a production process. ...

August 13, 2025 · 18 min · Zelina

From Chaos to Choreography: The Future of Agent Workflows

TL;DR for operators: A new survey on agent workflows is not useful because it tells us agents are becoming important. Anyone still surprised by that has probably been trapped in a quarterly innovation committee. Its value is more practical: it turns the messy agent-tool-platform landscape into a comparison map for deciding what kind of workflow infrastructure a business is actually buying or building.[1] ...

August 9, 2025 · 18 min · Zelina