AI Agents

Game of Cones: How Physics Codes Could Fix Agent Reasoning

Controls are where agent intelligence goes to embarrass itself. Give a vision-language model a game frame, a goal, and a list of legal buttons. It may describe the scene beautifully. It may explain that the projectile is approaching, the platform is unstable, and the shiny object is probably a reward. Then it presses the wrong key, late, for the wrong duration, and walks heroically into danger. Excellent commentary. Poor organism. ...

Hex Marks the Spot: Terra Nova and the New Frontier of Agent Intelligence

A strategy game is a cruelly efficient way to embarrass an intelligent system. Not because games are magic. Not because hexagonal maps secretly contain the meaning of cognition. They do not, despite what several overexcited benchmark papers might imply after a strong coffee. Games are useful because they compress decision pressure. They make planning visible. They force trade-offs. They punish agents that confuse local competence with strategic understanding. ...

Peer Review in the Age of Agents: When Scientists Go Silicon

Reviewers are the unglamorous load-bearing wall of science. They slow things down, miss things, disagree with each other, and occasionally write comments that make authors reconsider their life choices. They are also the reason published knowledge is not just a PDF-shaped rumour. So when a conference lets AI agents act as both primary authors and reviewers, the tempting story writes itself: silicon scientists have entered the building, peer review is next, and human academics can finally retire into committee work, where they have been spiritually living for years. ...

Reasoning on Mars: How Pipeline-Parallel RL Rewires Multi-Agent Intelligence

Review is cheap until it has to be correct. That is the uncomfortable lesson behind many agentic AI demos. A system writes an answer. A second model checks it. A third model fixes it. The workflow looks reassuringly managerial, like a tiny consulting firm trapped inside a GPU cluster. But the appearance of oversight is not the same thing as oversight. A weak reviewer can punish a good answer. A weak fixer can damage a nearly correct answer. And if the whole chain receives one final reward, reinforcement learning may end up congratulating the wrong participant. Very corporate, really. ...

Strategy as a Service: When AI Learns How to Think

Every enterprise AI team eventually meets the same annoying bill: the agent that thinks too much. It calls tools when a direct answer would do. It loops through evaluator prompts for tasks that need one clean instruction. It drags a code interpreter into a problem that is mostly reading comprehension. Then, after all that expensive theatre, it may still be wrong. Very impressive. Very modern. Very invoicable. ...

When Agents Compare Notes: How Shared Memory Quietly Rewires Software Development

Software teams already know the problem. One developer discovers the weird edge case. Another developer repeats the same mistake three weeks later. A third person writes a Slack explanation that disappears into the corporate sedimentary layer, next to the launch checklist from 2019 and that one blessed Docker command nobody can find anymore. ...

Play by Automata: How Regular Games Rewrites the Rules of General Game Playing

A game engine is usually where rules go to become software. Someone writes the rules, someone else encodes the rules, and an AI agent then spends its expensive little life asking the engine what moves are legal, what happens next, and whether it has already lost. Very glamorous. Very repetitive. General Game Playing tries to remove the hand-built engine from that loop. Instead of building a custom simulator for chess, backgammon, Amazons, Reversi, or some procedural oddity invented on a tired Wednesday afternoon, a game is described in a formal language and a generic system turns that description into something agents can use. ...

The Gospel of Faithful AI: How FaithAct Rewrites Reasoning

TL;DR for operators: FaithAct is useful because it changes the unit of control. Instead of asking whether a multimodal model’s final answer is correct, it asks whether each intermediate claim is supported by the image before that claim is allowed to steer the next step.1 That is a more operational target. Accuracy tells you whether the system arrived somewhere acceptable; perceptual faithfulness tells you whether it drove through the road or hallucinated a bridge. ...

DeepPersona and the Rise of Synthetic Humanity

Personas have always been the slightly embarrassing cardboard cut-outs of product strategy. A marketing team invents “Sarah, 34, urban professional, values convenience.” A UX team adds “busy mother of two.” Someone in sales insists she is “budget-conscious but aspirational,” because apparently every fictional human being is. Then everyone nods solemnly and uses Sarah to justify a pricing page, an onboarding flow, or an ad campaign. ...

Forget Me Not: How IterResearch Rebuilt Long-Horizon Thinking for AI Agents

A research workflow usually starts clean. The first search is sensible. The first source is relevant. The first reasoning step looks promising. Then the agent opens five webpages, follows a few tangents, remembers an early mistake too faithfully, and keeps dragging the whole mess forward like a consultant who refuses to delete old slides. By the time the problem actually becomes difficult, the model is no longer short of information. It is drowning in it. ...