Opening — Why this matters now
Agentic systems are quietly hitting a ceiling.
As tasks stretch across longer horizons—debugging real codebases, navigating terminals, or stitching together multi-hop web reasoning—the dominant design patterns start to fray. Fixed workflows ossify. Multi-agent chats drown in coordination overhead. Context windows bloat, then rot.
AORCHESTRA enters this moment with a subtle but decisive shift: stop treating sub-agents as identities, and start treating them as configurations.
Background — From collaboration to orchestration
Early multi-agent systems borrowed heavily from human metaphors: roles, teams, conversations. Frameworks like planner–worker or role-based MAS improved decomposition, but at a cost—rigidity. Once the task drifted off the happy path, agents either over-shared context or missed critical information.
The more recent sub-agent-as-tools paradigm fixed part of this problem by isolating context. Yet most implementations collapsed into two extremes:
- Context-isolated threads: good for avoiding context rot, bad at specialization.
- Static specialist roles: capable, but brittle, incomplete, and expensive to engineer.
AORCHESTRA rejects both.
Analysis — Agents as four-tuples, not personas
The paper’s core abstraction is deceptively simple. Any agent—main or sub—is defined as:
\[ \Phi = (\text{Instruction}, \text{Context}, \text{Tools}, \text{Model}) \]
This four-tuple reframes what an agent is:
| Dimension | What it controls | Why it matters |
|---|---|---|
| Instruction | Objective & success criteria | Prevents vague or drifting goals |
| Context | Curated working memory | Avoids both starvation and overload |
| Tools | Action space | Explicit capability boundaries |
| Model | Cognitive engine | Enables cost–performance routing |
Sub-agents are no longer predefined entities. They are instantiated on demand, purpose-built for a single subtask, then discarded.
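To make the four-tuple concrete, here is a minimal sketch of an agent expressed as a plain configuration record. The names `AgentSpec` and `spawn_and_discard`, and the model identifier `cheap-model`, are illustrative assumptions rather than the paper's API. The executor is a stand-in that only formats a string where a real one would call a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """One agent configuration: (Instruction, Context, Tools, Model)."""
    instruction: str                 # objective and success criteria
    context: tuple[str, ...] = ()    # curated working memory, not the full history
    tools: tuple[str, ...] = ()      # explicit action space
    model: str = "small-model"       # hypothetical model identifier


def spawn_and_discard(spec: AgentSpec) -> str:
    """Stand-in executor (assumption): a real one would run spec.model with spec.tools."""
    return f"[{spec.model}] did: {spec.instruction} (tools={list(spec.tools)})"


# A sub-agent is just a spec built for one subtask; nothing persists afterwards.
spec = AgentSpec(
    instruction="Find the failing test and report its name",
    context=("repo uses pytest", "CI failed on module utils"),
    tools=("shell",),
    model="cheap-model",
)
result = spawn_and_discard(spec)
print(result)
```

Making the record immutable is a design choice in this sketch: once a spec is handed to an executor, it cannot drift, which keeps each delegation easy to log and replay.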
Crucially, the orchestrator itself never executes environment actions. It only does two things:
- Delegate(Φ): spawn a tailored executor
- Finish(y): terminate with an answer
Execution and orchestration are cleanly separated.
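The two-action loop can be sketched as follows. This is a toy under stated assumptions, not the paper's implementation: `orchestrate`, `toy_policy`, `toy_executor`, and the `max_steps` cap are hypothetical names, and the policy is hard-coded where the real system would use a language model.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class AgentSpec:
    instruction: str
    context: tuple[str, ...]
    tools: tuple[str, ...]
    model: str


@dataclass(frozen=True)
class Delegate:
    spec: AgentSpec      # spawn an executor configured by this four-tuple


@dataclass(frozen=True)
class Finish:
    answer: str          # terminate with an answer


Action = Union[Delegate, Finish]
Policy = Callable[[str, list[str]], Action]   # (task, results so far) -> next action
Executor = Callable[[AgentSpec], str]         # runs one sub-agent to completion


def orchestrate(task: str, policy: Policy, execute: Executor, max_steps: int = 8) -> str:
    """The orchestrator never touches the environment; it only picks actions."""
    results: list[str] = []
    for _ in range(max_steps):
        action = policy(task, results)
        if isinstance(action, Finish):
            return action.answer
        results.append(execute(action.spec))  # all environment work happens here
    raise RuntimeError("step budget exhausted without Finish")


def toy_policy(task: str, results: list[str]) -> Action:
    # Hard-coded stand-in for an LLM orchestrator: delegate once, then finish.
    if not results:
        return Delegate(AgentSpec(f"Compute: {task}", (), ("calculator",), "cheap-model"))
    return Finish(results[-1])


def toy_executor(spec: AgentSpec) -> str:
    return "4" if spec.instruction == "Compute: 2 + 2" else "unknown"


answer = orchestrate("2 + 2", toy_policy, toy_executor)
print(answer)
```

Because the orchestrator only sees sub-agent results and never raw environment output, its own context grows by one entry per delegation in this sketch.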
Findings — Performance, but also control
Across three notoriously unforgiving benchmarks—GAIA, Terminal-Bench 2.0, and SWE-Bench-Verified—AORCHESTRA consistently outperforms established agent frameworks.
Headline result
When paired with Gemini-3-Flash, AORCHESTRA delivers a 16.28% average pass@1 improvement over the strongest baseline.
Why the gains are real
The paper’s ablations reveal where the lift comes from:
- Context is curated, not inherited. Passing all history hurts. Passing none fails. Selecting only task-relevant context wins decisively.
- Orchestration is learnable. Even a weaker 8B orchestrator beats single-agent ReAct. Fine-tuning improves decomposition quality further, at a predictable cost increase.
- Cost-aware model routing works. In-context optimization pushes the system onto a clear Pareto frontier: higher accuracy at lower average spend.
- Sub-agents are truly plug-and-play. Swap ReAct-style or SWE-style executors underneath the same orchestrator and the gains persist.
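Two of these findings, curated context and cost-aware routing, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the word-overlap relevance score, the `MODELS` table with its costs and capability scores, and the difficulty threshold rule. The paper's orchestrator makes these choices with a language model and in-context optimization, not with the heuristics below.

```python
def curate_context(history: list[str], subtask: str, k: int = 2) -> list[str]:
    """Keep up to k history entries that share the most words with the subtask."""
    want = set(subtask.lower().split())
    scored = [(len(want & set(h.lower().split())), i, h) for i, h in enumerate(history)]
    relevant = [t for t in scored if t[0] > 0]            # drop unrelated entries
    top = sorted(relevant, key=lambda t: (-t[0], t[1]))[:k]
    return [h for _, _, h in sorted(top, key=lambda t: t[1])]  # restore original order


# Hypothetical model table: (name, cost per call, capability score in [0, 1]).
MODELS = [("small", 1.0, 0.4), ("medium", 4.0, 0.7), ("large", 15.0, 0.95)]


def route_model(difficulty: float) -> str:
    """Pick the cheapest model whose capability covers the estimated difficulty."""
    for name, _cost, capability in sorted(MODELS, key=lambda m: m[1]):
        if capability >= difficulty:
            return name
    return max(MODELS, key=lambda m: m[2])[0]  # nothing qualifies: use the most capable


history = [
    "user prefers tabs",
    "test_utils.py fails on parse_date",
    "weather is nice",
    "parse_date expects ISO strings",
]
curated = curate_context(history, "fix parse_date in utils")
print(curated)
print(route_model(0.3), route_model(0.6), route_model(0.99))
```

The point of the sketch is the shape of the decision: the sub-agent receives a filtered slice of history and a model chosen per subtask, rather than inheriting everything from the parent.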
Implications — What this changes for builders
AORCHESTRA’s real contribution isn’t raw accuracy. It’s design discipline.
- Agents become ephemeral executors, not long-lived conversational entities.
- Capability selection becomes explicit and auditable.
- Cost control is no longer an afterthought—it’s part of the policy.
For businesses deploying agentic systems, this architecture maps far better onto operational reality: budgets, logs, retries, failure analysis, and modular upgrades.
Conclusion — Conductors, not conversations
AORCHESTRA suggests a future where the most valuable intelligence in an agentic system isn’t inside the sub-agents at all—it’s in the orchestrator.
When agents are treated as recipes instead of roles, coordination stops being a liability and becomes a lever.
Cognaptus: Automate the Present, Incubate the Future.