Opening — Why this matters now
Multi-agent LLM systems are no longer a novelty. They debate, plan, critique, simulate markets, and increasingly make decisions that look uncomfortably close to judgment. Yet as these systems scale, something quietly fragile sits underneath them: who talks to whom, and when.
Most multi-agent frameworks still assume that communication is cheap, static, and benign. In practice, it is none of those. Agents drift, hallucinate, fatigue, or—worse—become adversarial while sounding perfectly reasonable. When that happens, fixed communication graphs turn from coordination tools into liability multipliers.
This paper introduces TodyComm, a framework that treats communication itself as a first-class decision problem—learned, adaptive, and optimized directly for task success.
Background — The hidden rigidity in multi-agent systems
The last two years have seen an explosion of LLM-based multi-agent systems: debate-style reasoning, role-based collaboration, graph-structured agent swarms. Nearly all of them share a common assumption: the communication topology is fixed at inference time.
Some methods prune agents or edges during training. Others learn edge weights over predefined graphs. A few attempt adversary detection. But structurally, most systems still behave as if the social network of agents were frozen.
That assumption breaks down in three very real scenarios:
- Task progression — different stages require different information flows.
- Bandwidth constraints — communication is not free; tokens cost money and latency.
- Dynamic unreliability — agents can become misleading mid-conversation without changing roles or tone.
Static graphs cannot react. At best, they degrade gracefully. At worst, they amplify confident nonsense.
Analysis — What TodyComm actually changes
TodyComm reframes multi-round multi-agent collaboration as a Markov Decision Process, where the actions are not words, but communication structures.
At each round, the system decides:
- Which agents are allowed to participate
- How information flows between them
- Who gets a voice in the final decision
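To make the "actions are structures" framing concrete, here is a minimal sketch of what a per-round action in this communication MDP might look like. The class and field names are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field


@dataclass
class CommAction:
    """One action in the communication MDP: the policy chooses
    structure, not words. Names here are illustrative, not the
    paper's actual data model."""
    participants: set[str]                  # which agents speak this round
    edges: dict[str, list[str]]             # receiver -> list of senders
    deciders: set[str] = field(default_factory=set)  # who votes at the end
```

A round's policy output is then a single `CommAction`; the LLM calls themselves happen only along the edges it specifies.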
The key mechanism is behavior-driven credit assignment.
Credit as a control signal
Each agent receives a per-round credit score reflecting its inferred reliability and contribution, learned from:
- Its own answers and analysis
- How consistent it remains over time
- How its outputs align or conflict with neighbors
A gated recurrent network accumulates this behavioral history across rounds. Credits are not labels; they are beliefs, updated online.
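A gated recurrent update of this kind can be sketched in a few lines. The feature names and weights below are illustrative assumptions (the paper's actual network operates on learned representations, not hand-picked scalars), but the core mechanism is the same: a gate decides how much this round's evidence overrides accumulated belief.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update_credit(prev_credit: float, features: list[float],
                  w_gate: list[float], w_cand: list[float]) -> float:
    """One gated recurrent update of an agent's credit score.

    features: behavioral signals from this round, e.g.
      [answer_confidence, self_consistency, neighbor_agreement]
    (hypothetical names, not from the paper).
    """
    # Update gate: how much should new evidence override history?
    z = sigmoid(sum(w * f for w, f in zip(w_gate, features)))
    # Candidate credit implied by this round's behavior alone.
    cand = sigmoid(sum(w * f for w, f in zip(w_cand, features)))
    # Convex mix keeps credit in (0, 1): a belief, updated online.
    return (1.0 - z) * prev_credit + z * cand
```

Because the output is a convex combination of values in (0, 1), credit never saturates irreversibly: an agent that starts behaving well again can recover.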
From credits to graphs
Instead of directly learning graphs (a combinatorial nightmare), TodyComm:
- Samples agent participation based on credit scores
- Constructs directed acyclic communication graphs by prioritizing high-credit agents
- Enforces optional in-degree and out-degree budgets
The result is a round-adaptive communication topology that can both densify among reliable agents and quietly isolate bad ones.
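The sampling-then-wiring procedure above can be sketched as follows. The wiring rule (edges flow from higher-credit to lower-credit participants) is one simple way to guarantee acyclicity; the function name, parameters, and exact rule are assumptions for illustration, not the paper's algorithm:

```python
import random


def build_round_graph(credits: dict[str, float], k: int = 3,
                      max_in: int = 2) -> dict[str, list[str]]:
    """Sketch: sample k participants weighted by credit, then wire a
    DAG mapping each receiver to its senders, under an in-degree budget."""
    # 1. Weighted sampling without replacement: higher credit -> more
    #    likely to participate; low-credit agents are quietly sidelined.
    pool = list(credits)
    participants = []
    for _ in range(min(k, len(pool))):
        pick = random.choices(pool, weights=[credits[a] for a in pool], k=1)[0]
        participants.append(pick)
        pool.remove(pick)
    # 2. Rank by credit; edges only point from higher- to lower-credit
    #    agents, so the graph is acyclic by construction.
    ranked = sorted(participants, key=lambda a: credits[a], reverse=True)
    edges: dict[str, list[str]] = {}
    for i, receiver in enumerate(ranked):
        # In-degree budget: listen only to the top-credit predecessors.
        edges[receiver] = ranked[:i][:max_in]
    return edges
```

Note how both failure modes from the text fall out of one mechanism: reliable agents densify their connections, while an agent whose credit collapses simply stops being sampled.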
Decision-making is not exempt
After the final interaction round, TodyComm performs one more credit update and builds a decision graph. Only selected agents vote, and their influence is weighted by learned credibility—not by majority.
Communication and aggregation are optimized jointly using policy-gradient reinforcement learning, with task utility as the only reward.
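The credit-weighted decision step can be sketched like this. The hard threshold and the function name are illustrative assumptions; the point is that influence scales with credibility rather than headcount:

```python
from collections import defaultdict


def weighted_decision(votes: dict[str, str], credits: dict[str, float],
                      threshold: float = 0.5) -> str:
    """Sketch: only agents above a credit threshold vote, and each vote
    is weighted by credit. Assumes at least one agent clears the bar."""
    tally: dict[str, float] = defaultdict(float)
    for agent, answer in votes.items():
        if credits.get(agent, 0.0) >= threshold:  # silenced agents don't vote
            tally[answer] += credits[agent]       # influence scales with credit
    return max(tally, key=lambda ans: tally[ans])
```

A quick example shows why this is not majority rule: three agents vote "B" and one votes "A", yet the single high-credit agent wins.

```python
votes = {"a": "A", "b": "B", "c": "B", "d": "B"}
credits = {"a": 0.95, "b": 0.45, "c": 0.52, "d": 0.40}
weighted_decision(votes, credits)  # → "A"
```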
Findings — What the experiments show
Across five benchmarks (commonsense, math, science, and medical QA), TodyComm was evaluated under dynamically adversarial settings where agents could become misleading mid-task.
Robustness under pressure
When adversarial agents exceeded 50% of the pool, most baselines collapsed. TodyComm did not.
| Adversarial share of agent pool | Result vs. best baseline |
|---|---|
| Below 50% | Comparable or better |
| Exactly 50% | Large improvement |
| Above 50% | Decisive advantage |
The system learned to stop listening before being persuaded.
Token efficiency
Despite dynamic adaptation, TodyComm remained competitive on token usage—often matching or outperforming pruning-based baselines. Under degree budgets, it reduced tokens and improved accuracy, suggesting that less communication can be better communication.
Adversary detection as a byproduct
Without explicit labels, participation decisions implicitly identified unreliable agents with over 85% accuracy on average. Detection emerged naturally from task optimization, not from separate supervision.
Implications — Why this matters beyond benchmarks
TodyComm points to a broader shift in agentic AI design:
- Coordination beats cognition once base models are strong enough
- Dynamic trust matters more than static roles
- Silencing an agent can be as valuable as improving one
For business systems, this reframes risk. Failures may not come from bad models, but from letting the wrong internal voice dominate at the wrong time.
For governance and safety, it suggests an alternative to brittle filtering: learn who to listen to by consequences, not by rules.
For system builders, it hints that future agent frameworks will look less like chat rooms and more like adaptive organizations.
Conclusion — Communication is the real policy
TodyComm does not make agents smarter. It makes them more selective.
By treating communication as a learned, task-driven control problem, it exposes a quiet truth about multi-agent systems: intelligence does not fail first—coordination does.
As agent swarms grow larger and more autonomous, the ability to dynamically reshape who influences whom may prove more important than the next model upgrade.
Cognaptus: Automate the Present, Incubate the Future.