Multi‑agent LLMs work great on paper and go sideways in practice. We over‑select experts, flood the channel with verbose thoughts, and then pray a meta‑LLM can stitch it all together. OSC (Orchestrating Cognitive Synergy) proposes a missing middle: a learned orchestration layer that constantly models what each agent knows, spots “cognitive gaps,” and then tells agents how to talk—what to say, to whom, and at what level of detail—before the aggregator votes.

The big idea

Most frameworks optimize who speaks (dynamic expert selection) and how to combine outputs (aggregation). OSC optimizes the conversation itself. It introduces:

  • Collaborator Knowledge Models (CKM): Each agent maintains a latent state for every teammate (what they understand, confidence, assumptions), updated per message.
  • Learned cognitive gap analysis: A neural module compares “what I know” vs. “what you likely know” to identify actionable gaps that merit communication.
  • A communication policy (πcomm) trained with PPO: Instead of free‑form talk, agents sample a structured action—objective (clarify, propose, critique), targets (which teammates), and style (detail level, confidence tone). A generative LLM only verbalizes that action.

The result: fewer redundant words, faster consensus, and better final answers because the mid‑process is no longer a black box.
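
To make those three pieces concrete, here's a minimal sketch of the data structures involved. It is illustrative rather than the paper's implementation: the class names, fields, and the prompt at the end are assumptions.

```python
# Illustrative sketch (not the paper's implementation): names and shapes are assumptions.
from dataclasses import dataclass, field
from enum import Enum

class Objective(Enum):
    CLARIFY = "clarify"
    PROPOSE = "propose"
    CRITIQUE = "critique"

@dataclass
class TeammateModel:
    """One CKM entry: an agent's latent view of a single teammate, updated per message."""
    agent_id: str
    understanding: list[float] = field(default_factory=list)  # latent belief state
    confidence: float = 0.5                                    # how sure the teammate seems
    assumptions: list[str] = field(default_factory=list)      # inferred working assumptions

@dataclass
class CommAction:
    """Structured action sampled from the communication policy (pi_comm)."""
    objective: Objective   # clarify, propose, or critique
    targets: list[str]     # which teammates the message addresses
    detail_level: str      # e.g. "pointed_question" vs. "worked_example"
    tone: str              # confidence tone, e.g. "tentative" vs. "assertive"

# The base LLM only verbalizes the action, e.g.:
# prompt = f"{action.objective.value} to {', '.join(action.targets)} at {action.detail_level} detail"
```

The point is that the policy's output is a small, typed object; the LLM's job shrinks to rendering it as a short, targeted message.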

What changes in practice

  • Rounds and tokens drop without sacrificing depth. The system learns when to ask a pointed question vs. when to provide a worked example.
  • Conflict gets handled on purpose. OSC pushes agents to attack the right discrepancies early (method mismatch, timeline assumptions, objective misalignment), not just talk longer.
  • The aggregator gets higher‑quality ingredients. Each agent’s final response is the product of deliberate, gap‑closing exchanges rather than parallel monologues.

Numbers that matter

Below are the headline deltas reported by OSC versus strong multi‑agent baselines:

| Metric | OSC | Next best | Notes |
|---|---|---|---|
| AlpacaEval 2.0 LC win rate | 81.4% | 77.9% (KABB) | Highest among compared multi‑agent setups |
| MT‑Bench (avg) | 9.94 | 9.65 (KABB) | Strong on both first and second turns |
| Avg rounds | 4.6 | 4.9 (TalkHier) | Fewer back‑and‑forth cycles |
| Avg tokens | 3.31k | 3.52k (TalkHier) | Less chatter, more signal |
| Redundancy | 14.2% | 15.3% (TalkHier) | Tighter messaging |
| Conflict resolution | 89.5% | 85.8% (TalkHier) | Gaps closed earlier |
| Info density | 84.5% | 81.9% (TalkHier) | More task‑relevant content per token |

Scalability sweet spot. With 6 agents, OSC peaks (81.4% LC win rate) before coordination overhead creeps in with 8–10 agents. It’s a nice reminder that “more agents” isn’t a monotonic path to quality.

How OSC works (in one loop)

  1. Initialize CKM for each teammate.
  2. For each round (≈3–5):
    • Update CKMs from new messages.
    • Compute cognitive gaps between my internal plan and your inferred plan.
    • Sample a structured action via πcomm (objective, targets, style).
    • Let a base LLM verbalize that action concisely.
  3. After N rounds: Each agent drafts a refined answer → aggregator synthesizes → reward propagates back to CKM, gap analysis, and πcomm.
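
Put together, the loop compresses into a sketch like the one below. Every object and method here (`init_ckm`, `update_ckm`, `cognitive_gaps`, `policy.sample`, `llm.verbalize`, `llm.aggregate`) is a hypothetical stand-in for the learned modules described above, not an actual API.

```python
def run_osc_episode(agents, policy, llm, n_rounds=4):
    """Sketch of one OSC episode; all helper methods are hypothetical stand-ins."""
    # 1. Initialize a CKM for each teammate.
    for agent in agents:
        agent.init_ckm(peers=[a.id for a in agents if a is not agent])

    transcript = []
    # 2. A few structured communication rounds.
    for _ in range(n_rounds):
        for agent in agents:
            agent.update_ckm(transcript)                    # refresh beliefs about teammates
            gaps = agent.cognitive_gaps()                   # my plan vs. your inferred plan
            if not gaps:
                continue                                    # nothing worth saying this round
            action = policy.sample(agent.state, gaps)       # objective, targets, style
            message = llm.verbalize(agent.persona, action)  # concise, targeted utterance
            transcript.append((agent.id, action, message))

    # 3. Draft, aggregate, and (during training) propagate the reward back
    #    into the CKM, the gap analyzer, and the policy.
    drafts = [agent.draft_answer(transcript) for agent in agents]
    return llm.aggregate(drafts), transcript
```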

Why this is more than “better prompts”

Prompting gets you format discipline; OSC gives you interaction discipline. By learning which gaps matter and how to close them, the system reduces token spend and raises accuracy. It’s the difference between “debate and hope” and “triage then treat.”

Design critiques & open questions

  • Reward shaping dependence. Intrinsic rewards (e.g., measured gap reduction) help, but they add design surface (a toy sketch follows this list). Can we learn reliable proxies from downstream consistency checks instead?
  • CKM model drift at scale. With 8–10 agents, CKM updates lag and misjudge states. A sparse attention or teammate salience prior could keep modeling cost sublinear in team size.
  • Policy leakage to style. If πcomm overfits to surface tone cues, it may miss deep plan inconsistencies. Regularize via plan graphs or typed intents, not just text embeddings.
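
As a toy illustration of the reward-shaping concern, a shaped reward might blend task quality, an intrinsic gap-closure bonus, and a token cost. The terms and weights below are invented for this post, not taken from the paper.

```python
def shaped_reward(task_score, gap_before, gap_after, tokens_used,
                  w_gap=0.3, w_cost=1e-3):
    """Hypothetical reward blend: task quality + intrinsic gap-closure bonus - token cost.
    Each extra term (and each weight) is design surface someone has to justify and tune."""
    gap_closure = max(0.0, gap_before - gap_after)  # did this exchange actually shrink the gap?
    return task_score + w_gap * gap_closure - w_cost * tokens_used
```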

Build notes for practitioners

  • Start with 4–6 complementary experts. Beyond that, enforce speaker budgets and gap thresholds to keep CKM stable.
  • Make actions first‑class. Represent objectives/targets/styles explicitly in logs; this simplifies analytics and failure triage (see the sketch after this list).
  • Instrument “gap closure.” Define lightweight probes (consistency on subgoals, confidence convergence) to replace manual conversation audits.
  • Price–performance knobs. Limit rounds, cap tokens per message, and down‑weight low‑value objectives (e.g., unfocused critiques) in the reward.
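
Here's a minimal sketch of "actions as first-class log records" plus a lightweight gap-closure probe. The schema, the file name, and the assumption that `action` carries objective/targets/style fields (as in the earlier sketch) are all illustrative.

```python
import json
import time

def log_action(agent_id, action, gap_before, gap_after, path="osc_actions.jsonl"):
    """Append one structured-action record per message so failure triage becomes a query,
    not a transcript re-read. Field names are illustrative, not a fixed schema."""
    record = {
        "ts": time.time(),
        "agent": agent_id,
        "objective": action.objective.value,
        "targets": action.targets,
        "detail_level": action.detail_level,
        "gap_before": gap_before,
        "gap_after": gap_after,
        "gap_closed": max(0.0, gap_before - gap_after),  # the lightweight "gap closure" probe
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```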

Where this fits in the Cognaptus stack

For enterprise workflows (procurement analysis, policy reviews, tech due diligence), OSC‑style orchestration is the layer that turns “committee‑style LLMs” into compound intelligence: fewer meetings between agents, more decisions per minute. We’d integrate it between agent routing and report synthesis, exposing structured actions and gap metrics to dashboards for QA.


Cognaptus: Automate the Present, Incubate the Future