
Recursive Minds: How ReCAP Turns LLMs into Self-Correcting Planners

A stuck workflow rarely looks intelligent. It looks like a support agent asking for the same invoice twice, a coding agent editing the wrong file for the third time, or an operations bot patiently repeating an invalid action because, apparently, persistence is cheaper than understanding. This is the unglamorous failure mode of many LLM agents. They do not collapse because they cannot produce a plan. They collapse because the plan becomes stale, buried, or locally contradicted by new observations. The agent remembers the latest step and forgets the job. ...

November 2, 2025 · 12 min · Zelina

The Missing Metric: Measuring Agentic Potential Before It’s Too Late

Procurement teams love a leaderboard. It is tidy, numeric, comparable, and therefore dangerously comforting. A model scores well on MMLU, looks respectable on GSM8K, passes a coding benchmark, and suddenly someone in a meeting says it is “agent-ready.” Lovely. By that logic, a person who passes a written driving test should be handed the keys to a forklift in a crowded warehouse. ...

November 2, 2025 · 15 min · Zelina

When Agents Learn to Test Themselves: TDFlow and the Future of Software Engineering

A bug report is not a specification. A bug report says something is wrong. A test says exactly what must fail. That difference is the centre of TDFlow, a test-driven agentic workflow for repository-scale software repair. The paper’s central move is not to make the coding agent more charismatic, more autonomous, or more burdened with inspirational tool access. Mercifully. It does almost the opposite: it narrows the agent’s world until the task becomes executable. ...

November 2, 2025 · 15 min · Zelina

Agents That Build Agents: The ALITA-G Revolution

A good employee does not only finish the task. A good employee leaves behind a better way to do it next time. Most enterprise AI agents do not. They solve a ticket, answer a question, call a tool, browse a page, generate a report, and then politely forget the operational trick that made the task work. The transcript may be logged. The result may be saved. But the capability itself usually evaporates into the great corporate compost heap of “learnings”. Very nourishing. Not especially executable. ...

November 1, 2025 · 15 min · Zelina

Agents, Automata, and the Memory of Thought

A booking agent is not dangerous because it can “reason.” It is dangerous because it can remember the wrong thing, forget the right thing, loop politely forever, or book the flight before the human has actually confirmed. The philosophy department may enjoy debating whether this counts as intention. The operations team has a simpler question: can we know, before deployment, what behaviours this system can produce? ...

November 1, 2025 · 15 min · Zelina

The Benchmark Awakens: AstaBench and the New Standard for Agentic Science

Procurement meetings have a habit of turning AI agents into theatre. A vendor shows a polished research assistant. It finds papers, writes a summary, cites sources, maybe generates a small experiment plan. Everyone nods. Someone says “agentic workflow.” Someone else says “autonomous discovery.” A budget appears. The machine is declared practically scientific, which is convenient, because the machine itself has not yet been asked to survive the boring parts of science: retrieval under controlled conditions, code execution, data analysis, experimental reproduction, hypothesis testing, and the small matter of completing all required steps without wandering into the digital bushes. ...

October 31, 2025 · 13 min · Zelina

From Field Notes to Farm Operating Intelligence

A high-value commercial farm redesigned daily crop, irrigation, pest, harvest, labor, and buyer-delivery coordination around a reviewed AI operations brief instead of fragmented messages and manager memory.

October 30, 2025 · 8 min · Vox

Beyond Utility: When LLM Agents Start Dreaming Their Own Tasks

A task list is usually where enterprise automation becomes reassuringly boring. Someone defines the work. The system executes it. A dashboard turns green, or, in more honest organisations, amber with an explanation. The point is not mystery. The point is control. The paper behind this article, LLM Agents Beyond Utility: An Open-Ended Perspective, asks what happens when that tidy arrangement is disturbed: what if the agent does not merely complete tasks, but proposes them? What if it can remember what it has done, inspect its environment, write notes to itself, and continue across runs? ...

October 23, 2025 · 15 min · Zelina

Blueprints of Agency: Compositional Machines and the New Architecture of Intelligence

A prototype begins innocently enough: a product team wants a small machine, a vehicle, a tool, a fixture, perhaps a mechanism that throws something across a room because medieval engineering apparently never left the group chat. The modern AI pitch says the agent can design it. Give it parts, constraints, and a goal; let it reason; let it test; let it improve. ...

October 23, 2025 · 14 min · Zelina

When Lateral Beats Linear: How LToT Rethinks the Tree of Thought

Budget is easy to approve when the system still fails anyway. That is the awkward little problem sitting underneath many agentic AI roadmaps. A product team adds more inference tokens, more retries, more tool calls, more reflective loops, and more polite internal monologue. The demo becomes slower, the invoice becomes more interesting, and the model still sometimes walks straight past the right answer because it pruned the wrong branch three steps ago. Progress, apparently. ...

October 21, 2025 · 13 min · Zelina