In the fast-evolving landscape of agentic AI, one critical limitation persists: most frameworks can think or act, but rarely both in a fluid, self-directed manner. They follow rigid ReAct-like loops—plan, call, observe—resembling a robot that obeys instructions without ever truly reflecting on its strategy. The recent paper “DeepAgent: A General Reasoning Agent with Scalable Toolsets” from Renmin University and Xiaohongshu proposes an ambitious leap beyond this boundary. It envisions an agent that thinks deeply, acts freely, and remembers wisely.
## From Structured Workflows to Unified Reasoning
Traditional agents like ReAct or Plan-and-Solve execute reasoning in iterative, pre-scripted steps. DeepAgent replaces this brittle choreography with a continuous, unified reasoning stream. Instead of separating thinking and acting, the model dynamically discovers, selects, and invokes tools within the same thought process. In effect, DeepAgent eliminates the need for a human-designed workflow — the agent itself decides when to think, when to act, and what to remember.
This is more than convenience. It’s an architectural reimagination: a shift from rule-following to strategy-forming. By integrating tool retrieval into the reasoning flow, DeepAgent behaves less like a prompt-bound chatbot and more like an adaptive researcher exploring an unfamiliar domain.
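To make the idea concrete, here is a minimal sketch of such a unified reasoning stream, where tool retrieval and invocation happen inside a single generation loop rather than in a fixed plan-call-observe cycle. All names here (`unified_reasoning`, `llm_generate`, `tool_index`, the step schema) are illustrative assumptions, not the paper's actual API:

```python
# Hypothetical sketch: the model emits free-form thought steps and,
# whenever it decides it needs a tool, either a retrieval request or a
# structured tool call that is resolved in-line, within the same stream.

def unified_reasoning(task, tool_index, llm_generate, execute, max_steps=20):
    """One continuous stream: think -> (maybe) retrieve/call -> resume."""
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm_generate(context)  # returns a thought, request, call, or answer
        context.append(step["text"])
        if step["kind"] == "tool_request":
            # The agent itself asks for candidate tools mid-thought.
            candidates = tool_index.retrieve(step["query"], k=5)
            context.append(f"Retrieved tools: {[t.name for t in candidates]}")
        elif step["kind"] == "tool_call":
            result = execute(step["tool"], step["args"])
            context.append(f"Observation: {result}")
        elif step["kind"] == "final":
            return step["answer"]
    return None
```

The key design point is that there is no outer planner: retrieval, invocation, and reflection are all moves the model can make at any token of its own reasoning.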
| Paradigm | Workflow Style | Tool Handling | Memory Management | Example Systems |
|---|---|---|---|---|
| ReAct / Plan-and-Solve | Predefined loops | Fixed toolset | Linear log | GPT-4 ReAct, CodeAct |
| Deep Research Agents | Limited autonomy | Restricted to a few tools (e.g., web search, code) | Partial summarization | Search-o1, DeepResearcher |
| DeepAgent | Fully autonomous | Dynamic tool retrieval and use | Brain-inspired folding (episodic, working, tool) | DeepAgent (this paper) |
## Memory Folding: The Agent’s Cognitive Breathing
One of DeepAgent’s most elegant innovations is its Autonomous Memory Folding mechanism — a process akin to how humans consolidate experience. When the agent hits a dead end or finishes a subtask, it doesn’t simply stack more tokens into an ever-growing context. Instead, it compresses its past thoughts into structured, schema-based memories:
- Episodic Memory: summaries of key events and decisions.
- Working Memory: immediate goals and challenges.
- Tool Memory: history of tool usage, parameters, and success rates.
These folds allow the agent to “take a breath,” clearing cognitive clutter while retaining strategic awareness. In long-horizon environments like GAIA or ALFWorld, this design reduces error accumulation and makes reflection computationally viable. The agent essentially remembers like a brain, not like a buffer.
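A rough data-structure sketch of a fold might look as follows. The summarization here is a trivial rule-based stand-in for the auxiliary summarizer an agent would actually use, and the field names and transcript format are assumptions for illustration:

```python
# Illustrative sketch of memory folding: when a subtask ends, the raw
# transcript is compressed into three schema-based slots (episodic,
# working, tool) instead of being carried forward verbatim.

from dataclasses import dataclass, field

@dataclass
class FoldedMemory:
    episodic: list = field(default_factory=list)   # key events and decisions
    working: str = ""                              # immediate goal / blockers
    tool: dict = field(default_factory=dict)       # per-tool usage statistics

def fold(memory, transcript, current_goal):
    """Compress a raw transcript into structured memory, then clear it."""
    # Episodic: keep only lines marked as decisions or outcomes.
    memory.episodic += [l for l in transcript if l.startswith(("DECIDE", "RESULT"))]
    # Working: overwrite with the immediate goal rather than appending.
    memory.working = current_goal
    # Tool: aggregate call counts and success counts per tool.
    for line in transcript:
        if line.startswith("CALL"):
            _, name, status = line.split()
            stats = memory.tool.setdefault(name, {"calls": 0, "ok": 0})
            stats["calls"] += 1
            stats["ok"] += status == "ok"
    transcript.clear()  # the verbose context is released
    return memory
```

The payoff is that context growth becomes sublinear in the number of steps: only the distilled schema survives each fold, which is what makes long-horizon reflection computationally viable.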
## Teaching Tool Mastery via Reinforcement Learning
To make the agent learn these complex patterns of discovery and memory, the authors introduce Tool Policy Optimization (ToolPO)—an end-to-end reinforcement learning strategy. Rather than relying on pre-scripted demonstrations, DeepAgent trains using LLM-simulated APIs, drastically reducing the instability and cost of real-world API calls.
Crucially, ToolPO assigns fine-grained rewards not just for task success, but for how the agent invokes tools. This credit assignment—what the authors call the tool-call advantage—ensures the model doesn’t merely guess actions that happen to work, but learns which specific invocations yield meaningful progress.
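A small numeric sketch of this fine-grained credit idea: on top of a trajectory-level advantage derived from the final task reward, the tokens that issued a successful tool call receive an additional local advantage. The exact formulation and weighting belong to the paper; the function below is an assumption-laden illustration, not the actual ToolPO objective:

```python
# Sketch of localized tool-call credit: every token shares the global
# task advantage, and tokens inside each tool-call span get extra
# advantage proportional to that call's own reward.

def toolpo_advantages(n_tokens, task_advantage, call_spans, call_rewards, w=0.5):
    """Per-token advantage = global task advantage + localized call credit.

    call_spans:   list of (start, end) token index ranges of each tool call
    call_rewards: per-call reward (e.g. 1.0 if the call succeeded, else 0.0)
    w:            weight on the local tool-call term (an assumed knob)
    """
    adv = [task_advantage] * n_tokens
    for (start, end), r in zip(call_spans, call_rewards):
        for i in range(start, end):
            adv[i] += w * r  # credit only the tokens that issued the call
    return adv
```

Under this scheme, a trajectory that succeeds despite a botched tool call still rewards the outcome, but only the well-formed, successful invocations get the extra per-token credit, which is what steers the policy toward reliable tool use rather than lucky guesses.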
The results speak volumes: across eight major benchmarks including ToolBench, WebShop, GAIA, and Humanity’s Last Exam, DeepAgent consistently outperforms all 32B baselines, even approaching GPT-4o and DeepSeek-R1 levels on some tasks. More tellingly, its robustness holds even in open-set scenarios where the agent must retrieve tools it has never seen before.
## The Broader Implication: Toward Autonomous Cognition
The intellectual significance of DeepAgent lies less in its individual modules and more in its cognitive completeness. By unifying reasoning, memory, and action, it moves toward what we might call agentic coherence—the ability to sustain thought, adapt behavior, and learn from experience without external scaffolding.
In practical terms, this framework hints at a future where research assistants, robotic controllers, and enterprise automation systems can operate on vast, changing tool ecosystems—no longer confined to a handful of preapproved APIs. The agent becomes a self-managing ecosystem of reasoning and exploration, capable of discovering new utilities and integrating them on demand.
As we transition from “large language” to “large reasoning” models, DeepAgent demonstrates that true intelligence is not only about scale or data, but about autonomy in decision, adaptability in execution, and memory in reflection.
Cognaptus: Automate the Present, Incubate the Future