Org Charts for Robots: What AgentArch Really Tells Us About Enterprise AI

If you’ve ever tried turning a clever chatbot into a reliable employee, you already know the pain: great demos, shaky delivery. AgentArch, a new enterprise-focused benchmark from ServiceNow, is the first study I’ve seen that tests combinations of agent design choices—single vs multi‑agent, ReAct vs function-calling, summary vs complete memory, and optional “thinking tools”—across two realistic workflows: a simple PTO process and a gnarly customer‑request router. The result is a cold shower for one‑size‑fits‑all playbooks—and a practical map for building systems that actually ship. ...

September 20, 2025 · 4 min · Zelina

ReAct Without the Chaos: AgentScope 1.0 Turns Tools into Strategy

Thesis: AgentScope 1.0 is less a toolkit and more a discipline for agentic software. By pinning everything to ReAct loops, unifying “message–model–memory–tool,” and adding group-wise tool provisioning, it addresses the real failure mode of agents in production: tool sprawl without control. The evaluation/Studio/runtime trio then turns prototypes into shippable services.

What’s actually new (and why it matters)

1) A crisp core: Message → Model → Memory → Tool

Most frameworks blur these into ad‑hoc objects; AgentScope forces a clean, composable boundary: ...

August 25, 2025 · 4 min · Zelina

Who Sees What, Who Pays the Cost? Teaching Agents to See Through Others’ Eyes

TL;DR

A new study probes whether you can teach perspective‑taking to ReAct‑style LLM agents by feeding them structured examples distilled from a symbolic planner: optimal goal paths (G‑type), information‑seeking paths (E‑type), and local contrastive decisions (L‑type). The punchline: agents became decent at common‑ground filtering (what the other party can see) but remained brittle at imagining occluded space and pricing the cost of asking vs. exploring. In business terms, they’re good at “don’t recommend what the customer can’t see,” but still bad at “should I go find out more before I act, and is it worth it?” ...

August 23, 2025 · 5 min · Zelina

The Memory Advantage: When AI Agents Learn from the Past

What if your AI agent could remember the last time it made a mistake and plan better this time?

From Reaction to Reflection: Why Memory Matters

Most language model agents today operate like goldfish: brilliant at reasoning in the moment, but forgetful. Whether navigating virtual environments, answering complex questions, or composing multi-step strategies, they often repeat past mistakes simply because they lack a memory of past episodes. That’s where the paper “Agentic Episodic Control” by Zhihan Xiong et al. comes in, introducing a critical upgrade to today’s LLM agents: a modular episodic memory system inspired by human cognition. Instead of treating each prompt as a blank slate, this framework allows agents to recall, adapt, and refine prior reasoning paths without retraining the underlying model. ...

June 3, 2025 · 3 min

Plans Before Action: What XAgent Can Learn from Pre-Act's Cognitive Blueprint

If ReAct was a spark, Pre-Act is a blueprint. In the paper Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents, Mrinal Rawat et al. challenge the single-step cognitive paradigm of ReAct, offering instead a roadmap for how agents should plan, reason, and act, especially when tool use and workflow coherence matter.

What Is ReAct? A Quick Primer

The ReAct framework (short for Reasoning and Acting) is a prompting strategy that allows an LLM to alternate between thinking and doing in a loop. Each iteration follows this pattern: ...

May 18, 2025 · 4 min