Agentic AI

From DAGs to Swarms: The Quiet Revolution of Agentic Workflows

Queue. That is still the hidden operating model of much modern science. Queue for the instrument. Queue for the simulation. Queue for the data transfer. Queue for a human to inspect the result, change the parameters, approve the next run, and remind three systems with incompatible interfaces that they are supposed to be part of the same experiment. The glamour version is “AI for discovery.” The operational version is a researcher quietly becoming a logistics coordinator with a PhD. ...

Sandboxes & Ladders: How to Build a Steerable Agent Economy

Budgets are where autonomy becomes real. A chatbot can be annoying. An agent with a procurement account, API access, calendar authority, cloud credits, and a habit of negotiating with other agents is something else entirely. At that point, we are no longer discussing “workflow automation” in the tidy enterprise sense. We are discussing economic actors: software systems that request resources, trade off priorities, outsource tasks, pay for services, and generate consequences faster than the compliance department can ask for a meeting. ...

From Branch Chats to an AI Operating Loop: Restaurant Operations Agents for a Multi-Branch Food Business

A multi-branch restaurant group used specialized AI agents to turn scattered branch data into reviewed decisions on demand, inventory, staffing, menus, and customer service.

Repo, Meet Your Agent: Turning GitHub into a Workforce with EnvX

Repositories are where useful software goes to become someone else’s setup problem. Every company has lived some version of this. A team finds a promising GitHub repository. The README looks confident. The demo works on the author’s laptop, naturally. Then the actual work begins: dependency pinning, missing model weights, obscure data formats, broken examples, undocumented entry points, and the strange ritual of reading three GitHub issues from 2022 to discover the one command that still works. ...

Tool Time, Any Time: Inside RLFactory’s Plug-and-Play RL for Multi-Turn Tool Use

Tool calls are where agent demos stop being cute. A chatbot can talk through a task all day. A working agent has to search, query, execute, verify, retry, and sometimes discover that the tool it politely called has returned a malformed answer after making everyone wait. That is the difference between “reasoning about work” and doing work. The former gives you fluent paragraphs. The latter gives you latency, interface contracts, timeout handling, reward ambiguity, and a suspicious number of JSON parsing errors. Glamorous, naturally. ...

Fault Lines & Safety Nets: How RAFFLES Finds the First Domino in Agent Failures

A failed agent run rarely fails politely. It does not raise its hand at step 4 and say, “Here is the causal error; please patch the planner.” It drifts. A web agent grabs the wrong source. A coding agent trusts a bad assumption. A verifier rubber-stamps a plausible-looking answer. Twenty steps later the final output is wrong, the dashboard says “failed,” and the team is left doing digital archaeology with a very expensive shovel. ...

Graph and Circumstance: Maestro Conducts Reliable AI Agents

A broken AI agent often looks deceptively close to working. It answers most questions. It calls the right tool sometimes. It follows the instruction until the conversation gets long, the retrieval query gets vague, or the arithmetic becomes just difficult enough for the model to start doing spreadsheet theatre. The usual repair is prompt editing. Add a stern sentence. Add a role. Add an example. Add “think step by step,” because apparently the machine needed a motivational poster. ...

Plan, Then Rewrite: Why Explicit Intent Wins in Agent Workflows

A user starts by asking for Italian restaurants, answers a few clarification questions, then changes their mind and asks for Mexican instead. A human hears the reversal. A planner may hear: pizza, pasta, Italian, Mexican, recommendations, and perhaps a vague invitation to overachieve. Naturally, it may then produce a plan with the confidence of a consultant who attended only half the meeting. ...

Pieces, Not Puzzles: How ArcMemo Turns LLM Reasoning into Reusable Skills

Tickets repeat. Spreadsheets repeat. Compliance reviews repeat. Code reviews repeat. Not exactly, of course. That would be merciful. They repeat with just enough variation to make last month’s solution almost useful and therefore mildly dangerous. This is where many enterprise “AI memory” systems become filing cabinets with delusions of competence. They store prior chats, snippets, tickets, documents, and summaries, then hope the next prompt will rhyme closely enough with something in the archive. Sometimes it does. Often it does not. The agent remembers the old puzzle, not the transferable piece. ...

Plan, Don’t Spam: The Goldilocks Rule for Test-Time Compute

A busy agent is not necessarily a thinking agent. Anyone who has watched an LLM agent narrate every tiny move knows the feeling. It reviews the goal. It drafts a plan. It revises the plan. It reconsiders the revision. Then, with exquisite deliberation, it clicks the wrong button. The transcript looks intelligent; the behaviour looks like a consultant trapped in a revolving door. ...