Conducting the Agents: Why AORCHESTRA Treats Sub-Agents as Recipes, Not Roles

Opening — Why this matters now

Agentic systems are quietly hitting a ceiling.

As tasks stretch across longer horizons—debugging real codebases, navigating terminals, or stitching together multi-hop web reasoning—the dominant design patterns start to fray. Fixed workflows ossify. Multi-agent chats drown in coordination overhead. Context windows bloat, then rot.

AORCHESTRA enters this moment with a subtle but decisive shift: stop treating sub-agents as identities, and start treating them as configurations.

Background — From collaboration to orchestration

Early multi-agent systems (MAS) borrowed heavily from human metaphors: roles, teams, conversations. Planner–worker and role-based MAS frameworks improved decomposition, but at the cost of rigidity. Once a task drifted off the happy path, agents either over-shared context or missed critical information.

The more recent sub-agent-as-tools paradigm fixed part of this problem by isolating context. Yet most implementations collapsed into two extremes:

Context-isolated threads: good for avoiding context rot, bad at specialization.
Static specialist roles: capable, but brittle, incomplete, and expensive to engineer.

AORCHESTRA rejects both.

Analysis — Agents as four-tuples, not personas

The paper’s core abstraction is deceptively simple. Any agent—main or sub—is defined as:

\[ \Phi = (\text{Instruction}, \text{Context}, \text{Tools}, \text{Model}) \]

This four-tuple reframes what an agent is:

Dimension	What it controls	Why it matters
Instruction	Objective & success criteria	Prevents vague or drifting goals
Context	Curated working memory	Avoids both starvation and overload
Tools	Action space	Explicit capability boundaries
Model	Cognitive engine	Enables cost–performance routing

Sub-agents are no longer predefined entities. They are instantiated on demand, purpose-built for a single subtask, then discarded.
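To make the "recipe, not role" idea concrete, here is a minimal sketch of the four-tuple as a plain data structure. The class name `AgentSpec`, the field names, the tool names, and the model identifiers are all hypothetical and chosen for illustration; the paper defines the tuple, not this code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentSpec:
    """One agent configuration: (Instruction, Context, Tools, Model)."""
    instruction: str                # objective and success criteria
    context: Tuple[str, ...] = ()   # curated working memory, not the full history
    tools: Tuple[str, ...] = ()     # names of the actions the executor may take
    model: str = "small-model"      # hypothetical model identifier


# Two subtasks, two recipes. Nothing here is a predefined "role".
searcher = AgentSpec(
    instruction="Find the release year of library X; answer with a 4-digit year.",
    context=("The user asked about library X.",),
    tools=("web_search",),
    model="small-model",
)
patcher = AgentSpec(
    instruction="Fix the failing test; done when the test suite passes.",
    context=("The traceback points at the I/O module.",),
    tools=("read_file", "edit_file", "run_tests"),
    model="large-model",
)
```

Making the spec immutable reflects the use-once framing: a new subtask gets a new spec rather than a mutated old one.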

Crucially, the orchestrator itself never executes environment actions. It does only two things:

Delegate(Φ) — spawn a tailored executor
Finish(y) — terminate with an answer

Execution and orchestration are cleanly separated.
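The separation can be sketched as a loop whose only possible outputs are the two actions above. This is an illustrative sketch, not the paper's implementation: `orchestrate`, `toy_policy`, `toy_executor`, and the step limit are assumptions, and in a real system the policy would be a language model rather than a hand-written function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class AgentSpec:
    instruction: str
    context: Tuple[str, ...]
    tools: Tuple[str, ...]
    model: str


@dataclass(frozen=True)
class Delegate:
    spec: AgentSpec   # spawn a tailored executor


@dataclass(frozen=True)
class Finish:
    answer: str       # terminate with an answer


Action = Union[Delegate, Finish]


def orchestrate(
    task: str,
    policy: Callable[[str, List[str]], Action],
    run_executor: Callable[[AgentSpec], str],
    max_steps: int = 8,   # assumed safety limit
) -> str:
    """The orchestrator never touches the environment; executors do."""
    results: List[str] = []
    for _ in range(max_steps):
        action = policy(task, results)
        if isinstance(action, Finish):
            return action.answer
        # Delegate: run a fresh executor and keep only what it returns.
        results.append(run_executor(action.spec))
    raise RuntimeError("step limit reached without Finish")


def toy_policy(task: str, results: List[str]) -> Action:
    # Hypothetical policy: delegate once, then finish with that result.
    if not results:
        spec = AgentSpec(f"Solve: {task}", (), ("calculator",), "small-model")
        return Delegate(spec)
    return Finish(results[-1])


def toy_executor(spec: AgentSpec) -> str:
    # Stand-in for a real executor that would act in the environment.
    return f"[{spec.model}] done: {spec.instruction}"


answer = orchestrate("add 2 and 3", toy_policy, toy_executor)
```

Because the executor is passed in as a function, the same loop works with any executor that accepts a spec and returns a result.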

Findings — Performance, but also control

Across three notoriously unforgiving benchmarks—GAIA, Terminal-Bench 2.0, and SWE-Bench-Verified—AORCHESTRA consistently outperforms established agent frameworks.

Headline result

When paired with Gemini-3-Flash, AORCHESTRA delivers a 16.28% average pass@1 improvement over the strongest baseline.

Why the gains are real

The paper’s ablations reveal where the lift comes from:

Context is curated, not inherited: Passing all history hurts. Passing none fails. Selecting only task-relevant context wins decisively.
Orchestration is learnable: Even a weaker 8B orchestrator beats single-agent ReAct. Fine-tuning improves decomposition quality further, at a predictable cost increase.
Cost-aware model routing works: In-context optimization pushes the system onto a clear Pareto frontier: higher accuracy at lower average spend.
Sub-agents are truly plug-and-play: Swap ReAct-style or SWE-style executors underneath the same orchestrator and the gains persist.
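The first finding, curated rather than inherited context, can be illustrated with a toy selector. The keyword-overlap scoring below is a deliberately crude stand-in and an assumption of this sketch; the source does not say how the orchestrator selects context, and the function names and sample history are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set, used as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def curate_context(history: List[str], subtask: str, k: int = 2) -> List[str]:
    """Keep up to k history entries that share the most words with the subtask."""
    goal = tokens(subtask)
    scored = [(len(goal & tokens(entry)), i, entry) for i, entry in enumerate(history)]
    relevant = [item for item in scored if item[0] > 0]
    top = sorted(relevant, key=lambda item: (-item[0], item[1]))[:k]
    # Return the survivors in their original chronological order.
    return [entry for _, _, entry in sorted(top, key=lambda item: item[1])]


history = [
    "User wants the failing parser test fixed.",
    "Listed repository files: parser.py, lexer.py, README.",
    "Weather lookup for an unrelated earlier question.",
    "Traceback: parser.py raises ValueError on empty input.",
]
subtask = "Fix the ValueError raised in parser.py on empty input"

full = history                              # inherit everything
empty: List[str] = []                       # inherit nothing
curated = curate_context(history, subtask)  # pass only what the subtask needs
```

Here `curated` keeps the original request and the traceback and drops the unrelated weather entry, which is the middle ground between the two failure modes above.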

Implications — What this changes for builders

AORCHESTRA’s real contribution isn’t raw accuracy. It’s design discipline.

Agents become ephemeral executors, not long-lived conversational entities.
Capability selection becomes explicit and auditable.
Cost control is no longer an afterthought—it’s part of the policy.
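One way to picture cost as part of the policy is a small ledger that both constrains model choice and records every delegation. Everything in this sketch is assumed for illustration: the prices, the model names, the `Budget` class, and the simple fallback rule, which is far cruder than the in-context optimization the paper describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical per-delegation prices in arbitrary units.
PRICES: Dict[str, float] = {"small-model": 1.0, "large-model": 5.0}


@dataclass
class Budget:
    limit: float
    spent: float = 0.0
    log: List[dict] = field(default_factory=list)

    def choose_model(self, hard: bool) -> str:
        """Prefer the large model for hard subtasks, if it still fits the budget."""
        wanted = "large-model" if hard else "small-model"
        if self.spent + PRICES[wanted] > self.limit:
            wanted = "small-model"   # fall back to the cheaper engine
        if self.spent + PRICES[wanted] > self.limit:
            raise RuntimeError("budget exhausted")
        return wanted

    def record(self, subtask: str, model: str, tools: List[str]) -> None:
        """Charge the delegation and keep an auditable trace of what was granted."""
        cost = PRICES[model]
        self.spent += cost
        self.log.append({"subtask": subtask, "model": model, "tools": tools, "cost": cost})


budget = Budget(limit=7.0)
plan = [("locate bug", False), ("write patch", True), ("review patch", True)]
for subtask, hard in plan:
    model = budget.choose_model(hard)
    budget.record(subtask, model, tools=["read_file"])
```

With a limit of 7 units, the third subtask is routed to the small model because the large one no longer fits, and the log shows which model and tools each delegation received.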

For businesses deploying agentic systems, this architecture maps far better onto operational reality: budgets, logs, retries, failure analysis, and modular upgrades.

Conclusion — Conductors, not conversations

AORCHESTRA suggests a future where the most valuable intelligence in an agentic system isn’t inside the sub-agents at all—it’s in the orchestrator.

When agents are treated as recipes instead of roles, coordination stops being a liability and becomes a lever.

Cognaptus: Automate the Present, Incubate the Future.
