Opening — Why this matters now

Agentic systems are quietly hitting a ceiling.

As tasks stretch across longer horizons—debugging real codebases, navigating terminals, or stitching together multi-hop web reasoning—the dominant design patterns start to fray. Fixed workflows ossify. Multi-agent chats drown in coordination overhead. Context windows bloat, then rot.

AORCHESTRA enters this moment with a subtle but decisive shift: stop treating sub-agents as identities, and start treating them as configurations.

Background — From collaboration to orchestration

Early multi-agent systems borrowed heavily from human metaphors: roles, teams, conversations. Frameworks such as planner–worker or role-based multi-agent systems (MAS) improved decomposition, but at the cost of rigidity. Once the task drifted off the happy path, agents either over-shared context or missed critical information.

The more recent sub-agent-as-tools paradigm fixed part of this problem by isolating context. Yet most implementations collapsed into two extremes:

  • Context-isolated threads: good for avoiding context rot, bad at specialization.
  • Static specialist roles: capable, but brittle, incomplete, and expensive to engineer.

AORCHESTRA rejects both.

Analysis — Agents as four-tuples, not personas

The paper’s core abstraction is deceptively simple. Any agent—main or sub—is defined as:

\[ \Phi = (\text{Instruction}, \text{Context}, \text{Tools}, \text{Model}) \]

This four-tuple reframes what an agent is:

| Dimension   | What it controls             | Why it matters                       |
|-------------|------------------------------|--------------------------------------|
| Instruction | Objective & success criteria | Prevents vague or drifting goals     |
| Context     | Curated working memory       | Avoids both starvation and overload  |
| Tools       | Action space                 | Explicit capability boundaries       |
| Model       | Cognitive engine             | Enables cost–performance routing     |

Sub-agents are no longer predefined entities. They are instantiated on demand, purpose-built for a single subtask, then discarded.
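
To make the four-tuple concrete, here is a minimal Python sketch of an agent as a plain configuration record. The class name `AgentSpec`, its field names, and the model labels `small-model` and `large-model` are illustrative assumptions, not names taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical record for the four-tuple (Instruction, Context, Tools, Model)."""
    instruction: str           # objective and success criteria
    context: Tuple[str, ...]   # curated working memory, not the full history
    tools: Tuple[str, ...]     # names of the actions this agent may take
    model: str                 # which model runs this agent


# A sub-agent is just a value built for one subtask.
spec = AgentSpec(
    instruction="Find the failing test and report its name.",
    context=("Repo uses pytest.", "CI failed on the last commit."),
    tools=("shell",),
    model="small-model",  # placeholder label, not a real model name
)

# Re-routing the same subtask to a stronger model changes a single field.
retry = replace(spec, model="large-model")
```

Because the record is immutable and has no identity beyond its fields, "a different agent" simply means "a different value", which is the shift from personas to configurations described above.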

Crucially, the orchestrator itself never executes environment actions. It only does two things:

  • Delegate(Φ) — spawn a tailored executor
  • Finish(y) — terminate with an answer

Execution and orchestration are cleanly separated.
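
The two-action interface can be sketched as a small loop. This is an illustration under stated assumptions, not the paper's implementation: `orchestrate`, `Policy`, `Executor`, `toy_policy`, and `toy_executor` are hypothetical names, and in a real system the policy would be a language model rather than a hand-written function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class AgentSpec:
    instruction: str
    context: Tuple[str, ...]
    tools: Tuple[str, ...]
    model: str


@dataclass(frozen=True)
class Delegate:
    spec: AgentSpec   # configuration for a one-off executor


@dataclass(frozen=True)
class Finish:
    answer: str       # final answer y


Action = Union[Delegate, Finish]
Policy = Callable[[str, List[str]], Action]   # (task, reports so far) -> next action
Executor = Callable[[AgentSpec], str]         # runs one sub-agent, returns its report


def orchestrate(task: str, policy: Policy, run_subagent: Executor,
                max_steps: int = 8) -> str:
    """The orchestrator only chooses actions; it never touches the environment."""
    reports: List[str] = []
    for _ in range(max_steps):
        action = policy(task, reports)
        if isinstance(action, Finish):
            return action.answer
        # Delegate: an executor is built from the spec, run once, then dropped.
        reports.append(run_subagent(action.spec))
    raise RuntimeError("step limit reached before Finish")


def toy_policy(task: str, reports: List[str]) -> Action:
    # Stand-in for a model: delegate once, then finish with the report.
    if not reports:
        return Delegate(AgentSpec(f"Solve: {task}", (), ("calculator",), "small-model"))
    return Finish(reports[-1])


def toy_executor(spec: AgentSpec) -> str:
    # Stand-in for a real sub-agent run.
    return f"[{spec.model}] done: {spec.instruction}"


result = orchestrate("2 + 2", toy_policy, toy_executor)
```

Only `run_subagent` ever acts on the environment, so swapping the executor does not require changing the loop.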

Findings — Performance, but also control

Across three notoriously unforgiving benchmarks—GAIA, Terminal-Bench 2.0, and SWE-Bench-Verified—AORCHESTRA consistently outperforms established agent frameworks.

Headline result

When paired with Gemini-3-Flash, AORCHESTRA delivers a 16.28% average pass@1 improvement over the strongest baseline.

Why the gains are real

The paper’s ablations reveal where the lift comes from:

  1. Context is curated, not inherited.
     Passing all history hurts. Passing none fails. Selecting only task-relevant context wins decisively.

  2. Orchestration is learnable.
     Even a weaker 8B orchestrator beats single-agent ReAct. Fine-tuning improves decomposition quality further, at a predictable cost increase.

  3. Cost-aware model routing works.
     In-context optimization pushes the system onto a clear Pareto frontier: higher accuracy at lower average spend.

  4. Sub-agents are truly plug-and-play.
     Swap ReAct-style or SWE-style executors underneath the same orchestrator and the gains persist.
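
For readers less familiar with the term in point 3, a configuration sits on the Pareto frontier when no other configuration is both cheaper and more accurate. The sketch below shows one way to compute that set. The `Run` type, the configuration names, and every number in `runs` are invented for illustration and are not results from the paper.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str
    cost: float      # average spend per task, arbitrary units
    accuracy: float  # fraction of tasks solved


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Keep runs that no cheaper-or-equal run matches or beats on accuracy."""
    frontier: List[Run] = []
    best_accuracy = float("-inf")
    # Cheapest first; at equal cost, most accurate first.
    for run in sorted(runs, key=lambda r: (r.cost, -r.accuracy)):
        if run.accuracy > best_accuracy:
            frontier.append(run)
            best_accuracy = run.accuracy
    return frontier


# Made-up numbers, for illustration only.
runs = [
    Run("always-large", 1.00, 0.70),
    Run("always-small", 0.20, 0.50),
    Run("routed", 0.55, 0.72),
    Run("random-mix", 0.60, 0.58),
]

frontier_names = [r.name for r in pareto_frontier(runs)]
```

In this toy data the routed configuration dominates the always-large one: it costs less and scores higher. That is the shape of result a cost-aware router is aiming for.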

Implications — What this changes for builders

AORCHESTRA’s real contribution isn’t raw accuracy. It’s design discipline.

  • Agents become ephemeral executors, not long-lived conversational entities.
  • Capability selection becomes explicit and auditable.
  • Cost control is no longer an afterthought—it’s part of the policy.

For businesses deploying agentic systems, this architecture maps far better onto operational reality: budgets, logs, retries, failure analysis, and modular upgrades.
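
One way to picture how this maps onto budgets and logs: if every delegation is a configuration, each one can be written to a ledger before and after it runs. The sketch below is a hypothetical illustration. `DelegationLedger`, its methods, and the cost figures are assumptions, not part of AORCHESTRA.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DelegationLedger:
    """Hypothetical audit log: one entry per delegated sub-agent run."""
    budget: float
    spent: float = 0.0
    entries: List[Dict] = field(default_factory=list)

    def can_afford(self, estimated_cost: float) -> bool:
        # Checked before delegating, so cost is part of the decision.
        return self.spent + estimated_cost <= self.budget

    def record(self, instruction: str, tools: List[str], model: str,
               cost: float, ok: bool) -> None:
        self.spent += cost
        self.entries.append({
            "instruction": instruction,
            "tools": tools,
            "model": model,
            "cost": cost,
            "ok": ok,
        })

    def to_jsonl(self) -> str:
        # One JSON object per line, convenient for later failure analysis.
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)


ledger = DelegationLedger(budget=1.0)
# A failed cheap attempt followed by a retry on a stronger (placeholder) model.
ledger.record("Reproduce the bug", ["shell"], "small-model", 0.25, ok=False)
ledger.record("Reproduce the bug", ["shell"], "large-model", 0.5, ok=True)
```

Each entry records which tools and model were granted for which instruction, so a retry or a post-mortem can start from the log rather than from a chat transcript.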

Conclusion — Conductors, not conversations

AORCHESTRA suggests a future where the most valuable intelligence in an agentic system isn’t inside the sub-agents at all—it’s in the orchestrator.

When agents are treated as recipes instead of roles, coordination stops being a liability and becomes a lever.

Cognaptus: Automate the Present, Incubate the Future.