Opening — Why this matters now
Multi-agent LLM systems are no longer a novelty. They debate, plan, critique, simulate markets, and increasingly make decisions that look uncomfortably close to judgment. Yet as these systems scale, something quietly fragile sits underneath them: who talks to whom, and when.
Most multi-agent frameworks still assume that communication is cheap, static, and benign. In practice, it is none of those. Agents drift, hallucinate, fatigue, or—worse—become adversarial while sounding perfectly reasonable. When that happens, fixed communication graphs turn from coordination tools into liability multipliers.
This paper introduces TodyComm, a framework that treats communication itself as a first-class decision problem—learned, adaptive, and optimized directly for task success.
Background — The hidden rigidity in multi-agent systems
The last two years have seen an explosion of LLM-based multi-agent systems: debate-style reasoning, role-based collaboration, graph-structured agent swarms. Nearly all of them share a common assumption: the communication topology is fixed at inference time.
Some methods prune agents or edges during training. Others learn edge weights over predefined graphs. A few attempt adversary detection. But structurally, most systems still behave as if the social network of agents were frozen.
That assumption breaks down in three very real scenarios:
- Task progression — different stages require different information flows.
- Bandwidth constraints — communication is not free; tokens cost money and latency.
- Dynamic unreliability — agents can become misleading mid-conversation without changing roles or tone.
Static graphs cannot react. At best, they degrade gracefully. At worst, they amplify confident nonsense.
Analysis — What TodyComm actually changes
TodyComm reframes multi-round multi-agent collaboration as a Markov Decision Process, where the actions are not words, but communication structures.
At each round, the system decides:
- Which agents are allowed to participate
- How information flows between them
- Who gets a voice in the final decision
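To make the "actions are structures" framing concrete, here is a minimal sketch of what a per-round action in this communication MDP might look like. The class and field names are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field


@dataclass
class CommAction:
    """One action in the communication MDP: the policy chooses
    structure, not words. Names here are illustrative, not the
    paper's actual data model."""
    participants: set[str]                  # which agents speak this round
    edges: dict[str, list[str]]             # receiver -> list of senders
    deciders: set[str] = field(default_factory=set)  # who votes at the end
```

A round's policy output is then a single `CommAction`; the LLM calls themselves happen only along the edges it specifies.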
The key mechanism is behavior-driven credit assignment.
Credit as a control signal
Each agent receives a per-round credit score reflecting its inferred reliability and contribution, learned from:
- Its own answers and analysis
- How consistent it remains over time
- How its outputs align or conflict with neighbors
A gated recurrent network accumulates this behavioral history across rounds. Credits are not labels; they are beliefs, updated online.
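A gated recurrent update of this kind can be sketched in a few lines. The feature names and weights below are illustrative assumptions (the paper's actual network operates on learned representations, not hand-picked scalars), but the core mechanism is the same: a gate decides how much this round's evidence overrides accumulated belief.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update_credit(prev_credit: float, features: list[float],
                  w_gate: list[float], w_cand: list[float]) -> float:
    """One gated recurrent update of an agent's credit score.

    features: behavioral signals from this round, e.g.
      [answer_confidence, self_consistency, neighbor_agreement]
    (hypothetical names, not from the paper).
    """
    # Update gate: how much should new evidence override history?
    z = sigmoid(sum(w * f for w, f in zip(w_gate, features)))
    # Candidate credit implied by this round's behavior alone.
    cand = sigmoid(sum(w * f for w, f in zip(w_cand, features)))
    # Convex mix keeps credit in (0, 1): a belief, updated online.
    return (1.0 - z) * prev_credit + z * cand
```

Because the output is a convex combination of values in (0, 1), credit never saturates irreversibly: an agent that starts behaving well again can recover.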
From credits to graphs
Instead of directly learning graphs (a combinatorial nightmare), TodyComm:
- Samples agent participation based on credit scores
- Constructs directed acyclic communication graphs by prioritizing high-credit agents
- Enforces optional in-degree and out-degree budgets
The result is a round-adaptive communication topology that can both densify among reliable agents and quietly isolate bad ones.
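The sampling-then-wiring procedure above can be sketched as follows. The wiring rule (edges flow from higher-credit to lower-credit participants) is one simple way to guarantee acyclicity; the function name, parameters, and exact rule are assumptions for illustration, not the paper's algorithm:

```python
import random


def build_round_graph(credits: dict[str, float], k: int = 3,
                      max_in: int = 2) -> dict[str, list[str]]:
    """Sketch: sample k participants weighted by credit, then wire a
    DAG mapping each receiver to its senders, under an in-degree budget."""
    # 1. Weighted sampling without replacement: higher credit -> more
    #    likely to participate; low-credit agents are quietly sidelined.
    pool = list(credits)
    participants = []
    for _ in range(min(k, len(pool))):
        pick = random.choices(pool, weights=[credits[a] for a in pool], k=1)[0]
        participants.append(pick)
        pool.remove(pick)
    # 2. Rank by credit; edges only point from higher- to lower-credit
    #    agents, so the graph is acyclic by construction.
    ranked = sorted(participants, key=lambda a: credits[a], reverse=True)
    edges: dict[str, list[str]] = {}
    for i, receiver in enumerate(ranked):
        # In-degree budget: listen only to the top-credit predecessors.
        edges[receiver] = ranked[:i][:max_in]
    return edges
```

Note how both failure modes from the text fall out of one mechanism: reliable agents densify their connections, while an agent whose credit collapses simply stops being sampled.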
Decision-making is not exempt
After the final interaction round, TodyComm performs one more credit update and builds a decision graph. Only selected agents vote, and their influence is weighted by learned credibility—not by majority.
Communication and aggregation are optimized jointly using policy-gradient reinforcement learning, with task utility as the only reward.
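The credit-weighted decision step can be sketched like this. The hard threshold and the function name are illustrative assumptions; the point is that influence scales with credibility rather than headcount:

```python
from collections import defaultdict


def weighted_decision(votes: dict[str, str], credits: dict[str, float],
                      threshold: float = 0.5) -> str:
    """Sketch: only agents above a credit threshold vote, and each vote
    is weighted by credit. Assumes at least one agent clears the bar."""
    tally: dict[str, float] = defaultdict(float)
    for agent, answer in votes.items():
        if credits.get(agent, 0.0) >= threshold:  # silenced agents don't vote
            tally[answer] += credits[agent]       # influence scales with credit
    return max(tally, key=lambda ans: tally[ans])
```

A quick example shows why this is not majority rule: three agents vote "B" and one votes "A", yet the single high-credit agent wins.

```python
votes = {"a": "A", "b": "B", "c": "B", "d": "B"}
credits = {"a": 0.95, "b": 0.45, "c": 0.52, "d": 0.40}
weighted_decision(votes, credits)  # → "A"
```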
Findings — What the experiments show
Across five benchmarks (commonsense, math, science, and medical QA), TodyComm was evaluated under dynamically adversarial settings where agents could become misleading mid-task.
Robustness under pressure
When adversarial agents exceeded 50% of the pool, most baselines collapsed. TodyComm did not.
| Adversarial share of agent pool | Result vs. best baseline |
|---|---|
| Below 50% | Comparable or better |
| Exactly 50% | Large improvement |
| Above 50% | Decisive advantage |
The system learned to stop listening before being persuaded.
Token efficiency
Despite dynamic adaptation, TodyComm remained competitive on token usage—often matching or outperforming pruning-based baselines. Under degree budgets, it reduced tokens and improved accuracy, suggesting that less communication can be better communication.
Adversary detection as a byproduct
Without explicit labels, participation decisions implicitly identified unreliable agents with over 85% accuracy on average. Detection emerged naturally from task optimization, not from separate supervision.
Implications — Why this matters beyond benchmarks
TodyComm points to a broader shift in agentic AI design:
- Coordination beats cognition once base models are strong enough
- Dynamic trust matters more than static roles
- Silencing an agent can be as valuable as improving one
For business systems, this reframes risk. Failures may not come from bad models, but from letting the wrong internal voice dominate at the wrong time.
For governance and safety, it suggests an alternative to brittle filtering: learn who to listen to by consequences, not by rules.
For system builders, it hints that future agent frameworks will look less like chat rooms and more like adaptive organizations.
Conclusion — Communication is the real policy
TodyComm does not make agents smarter. It makes them more selective.
By treating communication as a learned, task-driven control problem, it exposes a quiet truth about multi-agent systems: intelligence does not fail first—coordination does.
As agent swarms grow larger and more autonomous, the ability to dynamically reshape who influences whom may prove more important than the next model upgrade.
Cognaptus: Automate the Present, Incubate the Future.