When Videos Grow Hands: How PhysWorld Teaches Robots to Stop Hallucinating Physics

Robots are not impressed by nice videos. A generated clip can show a hand placing a book into a shelf, pouring tomatoes from a pan, or sweeping scraps into a dustpan. It can look coherent enough to fool a casual viewer and perhaps even a product demo audience, which is not exactly the highest bar in technology. But a robot does not execute “looks coherent.” It executes poses, contacts, forces, trajectories, collisions, and failures. ...

November 16, 2025 · 16 min · Zelina

Graph Minds, Game Moves: How Multi‑Agent Learning Is Quietly Redrawing AI Strategy

A traffic light is not just a traffic light once the other lights start learning. That is the uncomfortable starting point for strategic AI systems. A single model can optimise a route, price, recommendation, allocation, or control policy. But the moment other decision-makers are learning at the same time, the environment stops behaving like scenery. It becomes a cast. Each actor updates, reacts, misreads, cooperates, defects, imitates, or quietly ruins the assumptions in your simulator. Very rude, but entirely realistic. ...

November 14, 2025 · 16 min · Zelina

Play by Automata: How Regular Games Rewrites the Rules of General Game Playing

A game engine is usually where rules go to become software. Someone writes the rules, someone else encodes the rules, and an AI agent then spends its expensive little life asking the engine what moves are legal, what happens next, and whether it has already lost. Very glamorous. Very repetitive. General Game Playing tries to remove the hand-built engine from that loop. Instead of building a custom simulator for chess, backgammon, Amazons, Reversi, or some procedural oddity invented on a tired Wednesday afternoon, a game is described in a formal language and a generic system turns that description into something agents can use. ...

November 14, 2025 · 15 min · Zelina

Don’t Self-Sabotage Me Now: Rational Policy Gradients for Sane Multi-Agent Learning

Kitchen work is not hard because chopping onions is metaphysically difficult. It is hard because two people must agree, implicitly and quickly, who gets the onion, who holds the plate, who waits by the pot, and who moves out of the corridor before everyone performs a small culinary traffic accident. That is why Overcooked remains such a useful multi-agent benchmark. It turns coordination into something visible. Agents do not merely need to “perform a task”; they need to infer what another agent is about to do and avoid becoming a sentient obstacle. ...

November 13, 2025 · 14 min · Zelina

Proof, Policy, and Probability: How DeepProofLog Rewrites the Rules of Reasoning

Proofs are supposed to be the respectable part of AI: tidy, inspectable, and resistant to the usual neural-network fog machine. Then reality turns up, as it so often does, carrying a bill. In neurosymbolic AI, the bill is search. A system may know the rules. It may even combine them with neural perception. But if answering a query requires enumerating a vast space of possible proofs, the promise of “interpretable reasoning” quickly becomes a very elegant way to run out of time. ...

November 12, 2025 · 18 min · Zelina

Forget Me Not: How IterResearch Rebuilt Long-Horizon Thinking for AI Agents

A research workflow usually starts clean. The first search is sensible. The first source is relevant. The first reasoning step looks promising. Then the agent opens five webpages, follows a few tangents, remembers an early mistake too faithfully, and keeps dragging the whole mess forward like a consultant who refuses to delete old slides. By the time the problem actually becomes difficult, the model is no longer short of information. It is drowning in it. ...

November 11, 2025 · 17 min · Zelina

When Agents Think in Waves: Diffusion Models for Ad Hoc Teamwork

A warehouse robot does not fail only when it drops the box. Sometimes it fails earlier, in the quieter moment when another robot takes an unexpected route and the first robot keeps behaving as though the original choreography still exists. Nobody crashes. Nothing explodes. The system merely becomes stupid in a very expensive way. ...

November 11, 2025 · 18 min · Zelina

Agents on the Clock: How TPS-Bench Exposes the Time Management Problem in AI

A competent assistant can make a list. A useful assistant knows what must happen first. That distinction sounds small until an AI agent is asked to do something ordinary and annoyingly realistic: check a calendar, search the web, compare options, use a map, assemble a recommendation, and perhaps create a document at the end. None of those steps is exotic. The difficulty is that some of them can run in parallel, some must wait for earlier results, and some become nonsense if executed too early. This is less “genius at work” than “junior operations manager with access to too many browser tabs.” Naturally, it is where things get interesting. ...

November 6, 2025 · 13 min · Zelina

When the Sandbox Thinks Back: Training AI Agents in Simulated Realities

Workflow software has a deeply unglamorous problem: reality keeps changing. A customer support agent may know the refund policy, but then the customer changes their address, the order record has a missing field, the tool returns a cryptic error, and the next API call requires a schema nobody mentioned in the demo. A spreadsheet agent may know how to summarise a table, but the file path is wrong, the calendar has a conflicting event, and the “obvious” action fails because the world, in its charmingly vindictive way, is not a benchmark prompt. ...

November 6, 2025 · 18 min · Zelina

When Markets Dream: The Rise of Agentic AI Traders

Liquidity is boring until it vanishes. Most investors notice market makers only when the screen suddenly looks thin: fewer bids, wider spreads, worse execution, and the faint smell of panic priced into every click. A market maker’s job is not glamorous. It quotes buy and sell prices, earns the spread, manages inventory, and tries not to become the proud owner of too much of the wrong asset at the wrong moment. Finance, as usual, rewards the person who stands calmly in the middle of everyone else’s urgency. ...

November 5, 2025 · 15 min · Zelina