Agents That Build Agents: The ALITA-G Revolution

A good employee does not only finish the task. A good employee leaves behind a better way to do it next time. Most enterprise AI agents do not. They solve a ticket, answer a question, call a tool, browse a page, generate a report, and then politely forget the operational trick that made the task work. The transcript may be logged. The result may be saved. But the capability itself usually evaporates into the great corporate compost heap of “learnings”. Very nourishing. Not especially executable. ...

November 1, 2025 · 15 min · Zelina

Provenance, Not Prompts: How LLM Agents Turn Workflow Exhaust into Real-Time Intelligence

Logs are where teams go after the dashboard has already failed. A pipeline stalls. A model run produces nonsense. A compute job quietly burns budget on the wrong node. Someone opens three dashboards, two notebooks, and one ancient SQL snippet named final_debug_v3_really_final.sql. Then the archaeology begins. The paper “LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology” proposes a more interesting answer: do not ask an LLM to “understand the workflow” in the abstract. Give it live provenance metadata, a compact schema, query guidelines, and tools that execute structured queries on its behalf.[1] In other words, stop treating the model as a psychic dashboard. Treat it as a controlled interface to workflow exhaust. ...

October 1, 2025 · 17 min · Zelina

Tool Wars, Protocol Peace: What MCP‑AgentBench Really Measures

A procurement team does not buy an AI agent because it can recite the word “interoperability” with theatrical confidence. It buys the agent because the thing can use tools, collect data, combine results, and stop before it bankrupts the token budget. That is the useful way to read MCP-AgentBench, a new benchmark for evaluating language agents inside the Model Context Protocol ecosystem.[1] The paper is not just another leaderboard with a fresh coat of protocol paint. Its more interesting result is harsher: MCP gives agents a common integration layer, but it does not make them competent tool users. Compatibility is plumbing. Competence is orchestration. ...

September 19, 2025 · 14 min · Zelina

Tool Time, Any Time: Inside RLFactory’s Plug‑and‑Play RL for Multi‑Turn Tool Use

Tool calls are where agent demos stop being cute. A chatbot can talk through a task all day. A working agent has to search, query, execute, verify, retry, and sometimes discover that the tool it politely called has returned a malformed answer after making everyone wait. That is the difference between “reasoning about work” and doing work. The former gives you fluent paragraphs. The latter gives you latency, interface contracts, timeout handling, reward ambiguity, and a suspicious number of JSON parsing errors. Glamorous, naturally. ...

September 13, 2025 · 16 min · Zelina

From PDF to PI: Turning Papers into Productive Agents

Every R&D team has a shelf of papers that are theoretically useful and practically booby-trapped. The abstract is promising. The method is relevant. The results look transferable. Then reality arrives wearing a conda error message: the repository has three setup paths, two notebooks, one undocumented dependency, and a tutorial that assumes you already know the answer. The paper has been published. The method has not, in any serious operational sense, been delivered. ...

September 12, 2025 · 17 min · Zelina

Control Plane, Not Pain: How Agentic OS Turns Linux Scheduling into a Semantic Service

A scheduler is where elegant software abstractions go to meet the unpleasant fact that CPUs are finite. Most businesses do not care which runnable task receives a slice of time first. They care that builds finish faster, services stop coughing at the 99th percentile, batch jobs do not drag the whole estate into a swamp, and nobody has to summon a kernel engineer every time a workload changes shape. ...

September 4, 2025 · 14 min · Zelina

ReAct Without the Chaos: AgentScope 1.0 Turns Tools into Strategy

TL;DR for operators: AgentScope 1.0 is best read as a production-shaping framework for agentic applications, not as a victory lap over rival agent frameworks. Alibaba’s paper describes a developer-centric stack that rebuilds agents around four core abstractions (message, model, memory, and tool), then places a ReAct-style reasoning-and-action loop on top of them.[1] ...

August 25, 2025 · 17 min · Zelina

USB‑C for Agents, Stress‑Tested: What MCP‑Universe Really Reveals

TL;DR for operators: MCP-Universe is useful because it punctures a very convenient belief: once an LLM is connected to tools through MCP, the agent is basically “integrated” and therefore close to production-ready. The paper says: adorable, but no.[1] The benchmark tests agents against real MCP servers rather than toy APIs. It covers 231 tasks across Location Navigation, Repository Management, Financial Analysis, 3D Design, Browser Automation, and Web Searching. It uses 11 MCP servers, 133 tools, and 84 execution-based evaluators, including dynamic evaluators that retrieve live ground truth for time-sensitive tasks. ...

August 23, 2025 · 18 min · Zelina

Agents on the Wire: Protocols, Memory, and Guardrails for Real-World Agentic AI

TL;DR for operators: An agent demo usually fails in production for boring reasons. Not because the model suddenly forgot how to reason. Because the agent cannot reliably discover another agent, remember the right state, expose a stable contract, validate risky outputs, or execute generated code without turning the server into an involuntary escape room. ...

August 18, 2025 · 17 min · Zelina

Agents Under Siege: How LLM Workflows Invite a New Breed of Cyber Threats

TL;DR for operators: A support agent reads a customer email. It checks a CRM record. It calls a refund API. It writes a note into long-term memory. It asks another agent to verify policy. Somewhere in that chain, a malicious instruction hides inside a message, document, issue tracker entry, retrieved snippet, schema, or tool response. The model does not need to become “evil”. It only needs to be helpful in the wrong direction. ...

July 1, 2025 · 16 min · Zelina