Trust No One, Train Together: Zero-Trust Federated Learning Grows Teeth

A factory can know exactly which machine submitted a model update and still train on a lie.

The device may possess a valid cryptographic identity. Its software may have booted from an approved configuration. Its network connection may be encrypted. None of that proves that the update it sends is harmless—or that the resulting intrusion-detection model will recognize an attack crafted specifically to deceive it.

This distinction is the useful core of Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems, which proposes a layered framework called ZTA-FL for collaborative industrial intrusion detection.¹ The paper combines device attestation, SHAP-based filtering of suspicious model updates, and on-device adversarial training inside an edge–fog–cloud architecture.

The headline results are strong. On the Edge-IIoTset benchmark, the system reports 97.8% clean-data accuracy, 93.2% accuracy under label flipping with 30% of clients compromised, 89.3% accuracy under FGSM evasion at $\epsilon = 0.1$, and an 8.7% backdoor attack-success rate. It also reports a 34% reduction in per-round communication volume, largely through 8-bit quantization.

But the more transferable lesson is not that combining three defences produces a larger number. It is that the defences answer three different security questions:

Is this participant genuinely the device it claims to be?
Is the model update behaving like an honest update?
Will the resulting model remain useful when attackers manipulate its inputs?

Calling all three questions “zero trust” is convenient. Treating them as the same question is how security architectures acquire impressive diagrams and disappointing incident reports.

Federated learning creates three separate trust problems

Federated learning allows multiple devices or sites to train a shared model without sending their raw local data to a central repository. For industrial operators, the appeal is straightforward: factories, gateways, sensors, and regional facilities can improve a shared intrusion-detection system while keeping sensitive network data local.

The arrangement also turns every participating device into a potential contributor to the model’s failure.

A compromised device can impersonate another participant or create additional fake identities. It can submit a poisoned update designed to corrupt the shared model. Even when every training participant behaves honestly, an attacker can later feed carefully manipulated traffic into the deployed model to evade detection.

These attacks occur at different points in the learning process and require different controls.

Security question	Representative threat	ZTA-FL control	What the control cannot establish
Who sent the update?	Impersonation, replay, Sybil participation	TPM-based attestation and trust records	Whether an authenticated device has been compromised after boot
What does the update do?	Label flipping, gradient manipulation, backdoor injection	SHAP-stability filtering and weighted aggregation	Whether future inputs will evade the final model
How does the model behave under hostile input?	FGSM and PGD evasion attacks	On-device adversarial training	Whether participating clients are honest

This division matters because one of the most tempting misconceptions about secure federated learning is that reliable authentication largely solves the trust problem.

The paper’s own ablation study shows otherwise.

Attestation proves identity, not innocence

Before each training round, an edge device in ZTA-FL creates an attestation token containing its identity hash, timestamp, Platform Configuration Register measurement, and a random nonce. The token is signed using a Trusted Platform Module key and encrypted for the fog node.

The fog node checks the signature, token freshness, platform measurement, and the device’s existing trust score. Devices that repeatedly fail checks or submit suspicious updates can be quarantined. Legitimate firmware changes can be pre-registered through signed manifests, reducing the chance that an approved update is mistaken for compromise.
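The token exchange described above can be sketched in a few lines. This is a minimal illustration, not the paper's protocol: an HMAC with a shared demo key stands in for the TPM signature, encryption for the fog node is omitted, and the names `make_token`, `verify_token`, `MAX_AGE_SECONDS`, and `MIN_TRUST` are hypothetical, as are their values.

```python
import hashlib
import hmac
import json
import secrets
import time

# Assumptions for illustration only: a real device would sign with a
# TPM-resident key, and the token would also be encrypted for the fog node.
DEVICE_KEY = b"demo-key-not-a-real-tpm-key"
APPROVED_PCR = hashlib.sha256(b"approved-firmware-v1").hexdigest()
MAX_AGE_SECONDS = 30.0   # hypothetical freshness window
MIN_TRUST = 0.5          # hypothetical trust-score floor


def make_token(device_id, pcr, key, now=None):
    """Build a signed token holding identity hash, timestamp, PCR value, and nonce."""
    payload = {
        "id_hash": hashlib.sha256(device_id.encode()).hexdigest(),
        "timestamp": time.time() if now is None else now,
        "pcr": pcr,
        "nonce": secrets.token_hex(16),
    }
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_token(token, key, seen_nonces, trust_score, now=None):
    """Check signature, freshness, nonce reuse, platform measurement, and trust score."""
    now = time.time() if now is None else now
    payload = token["payload"]
    body = json.dumps(payload, sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return False, "bad signature"
    if abs(now - payload["timestamp"]) > MAX_AGE_SECONDS:
        return False, "stale token"
    if payload["nonce"] in seen_nonces:
        return False, "replayed nonce"
    if payload["pcr"] != APPROVED_PCR:
        return False, "unexpected platform measurement"
    if trust_score < MIN_TRUST:
        return False, "trust score too low"
    seen_nonces.add(payload["nonce"])
    return True, "ok"


seen = set()
token = make_token("plc-07", APPROVED_PCR, DEVICE_KEY, now=1000.0)
print(verify_token(token, DEVICE_KEY, seen, trust_score=0.9, now=1001.0))  # (True, 'ok')
print(verify_token(token, DEVICE_KEY, seen, trust_score=0.9, now=1002.0))  # (False, 'replayed nonce')
```

Every check in the sketch concerns the sender and its platform state. Nothing in it inspects the model update that follows.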

This is a sensible control. It makes impersonation and replay attacks harder, gives the system a persistent record of device behaviour, and provides a basis for excluding participants before they influence aggregation.

It is also almost irrelevant to the model’s robustness once a legitimately authenticated device begins submitting malicious updates.

The ablation results make the boundary unusually clear:

Configuration	Clean accuracy	Poisoned accuracy	Adversarial accuracy
Baseline federated learning	94.2%	67.8%	71.2%
Baseline plus attestation	94.3%	68.1%	71.4%
Baseline plus SHAP aggregation	97.1%	91.4%	72.8%
Baseline plus adversarial training	96.8%	72.3%	87.6%
Full ZTA-FL	97.8%	93.2%	89.3%

Attestation raises poisoned accuracy from 67.8% to 68.1%. It raises adversarial accuracy from 71.2% to 71.4%. Those are not the improvements of a defence that detects malicious learning behaviour. They are the improvements of a defence performing a different job.
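The same reading can be applied to every row of the ablation table by subtracting the baseline. The short script below does only that arithmetic on the figures reported above; the function name `gains_over_baseline` is hypothetical.

```python
# Ablation figures as reported in the table above:
# (clean accuracy, poisoned accuracy, adversarial accuracy), in percent.
ABLATION = {
    "Baseline federated learning": (94.2, 67.8, 71.2),
    "Baseline plus attestation": (94.3, 68.1, 71.4),
    "Baseline plus SHAP aggregation": (97.1, 91.4, 72.8),
    "Baseline plus adversarial training": (96.8, 72.3, 87.6),
    "Full ZTA-FL": (97.8, 93.2, 89.3),
}


def gains_over_baseline(table, baseline="Baseline federated learning"):
    """Return percentage-point gains of each configuration over the baseline row."""
    reference = table[baseline]
    return {
        name: tuple(round(value - ref, 1) for value, ref in zip(row, reference))
        for name, row in table.items()
        if name != baseline
    }


for name, (clean, poisoned, adversarial) in gains_over_baseline(ABLATION).items():
    print(f"{name:36s} clean {clean:+5.1f}  poisoned {poisoned:+5.1f}  adversarial {adversarial:+5.1f}")
```

Attestation moves no column by more than 0.3 points. SHAP aggregation adds 23.6 points of poisoned accuracy, and adversarial training adds 16.4 points of adversarial accuracy.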

That does not make attestation unimportant. Without reliable identity, attackers can amplify their influence by adding fake participants, replaying credentials, or repeatedly rejoining after exclusion. Attestation limits access to the learning process and makes enforcement possible.

It is the security guard checking badges at the entrance. Necessary, certainly. But nobody should ask the guard to inspect every spreadsheet carried into the building.

The paper reports a false rejection rate of 0.003%, average emulated verification time of 4.2 milliseconds per agent, and complete detection of the tested impersonation and replay attacks. However, the experiments use a software TPM emulator rather than physical TPM hardware. The reported identity-control performance is therefore evidence for the protocol design and emulated implementation, not yet confirmation of deployment behavior across real industrial hardware.

SHAP stability asks whether an update changes the model’s behaviour

After attestation, the harder problem begins.

Industrial participants rarely possess identically distributed data. A programmable logic controller, an MQTT gateway, and a supervisory control system observe different traffic, protocols, attack patterns, and operating conditions. Even two similar factories may produce substantially different local datasets.

This heterogeneity makes conventional anomaly detection difficult. An update that looks different from the majority may be malicious. It may also be a perfectly honest contribution from a device with unusual but legitimate data.

Distance-based aggregation methods can confuse difference with dishonesty.

ZTA-FL’s central technical idea is to evaluate how each participant’s update changes the model’s feature attributions. At the fog layer, the system computes GradientSHAP values on shared validation data and compares an agent’s current feature-importance pattern with its previous pattern.

The comparison is primarily temporal rather than cross-sectional. Instead of asking, “Does this device resemble all the other devices?”, the system asks, “Has this device suddenly changed what its model considers important?”

That is a better question under non-IID data.

A gateway that consistently emphasizes one group of features may look unusual compared with other participants while remaining internally stable. A poisoned update can appear statistically plausible yet sharply alter which features drive the model’s decisions. SHAP stability is intended to expose that behavioural shift.

The aggregation mechanism filters agents whose stability scores fall below a threshold. Updates that pass are weighted using a combination of stability, validation accuracy, and local dataset size. The framework also maintains trust scores across rounds and can revert from a deteriorating aggregate model.
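A toy version of the filter-then-weight step might look like the following. Several choices here are assumptions made for illustration rather than details taken from the paper: stability is measured as cosine similarity between an agent's previous and current attribution vectors, the weight is the simple product of stability, validation accuracy, and sample count, and the 0.6 threshold is arbitrary. In the real system the attribution vectors would come from GradientSHAP on shared validation data; here they are plain lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def aggregate(agents, threshold=0.6):
    """Drop temporally unstable agents, then average the remaining updates by weight."""
    kept = []
    for agent in agents:
        # Temporal comparison: each agent is judged against its own history.
        stability = cosine_similarity(agent["prev_attr"], agent["curr_attr"])
        if stability >= threshold:
            weight = stability * agent["val_acc"] * agent["n_samples"]
            kept.append((agent, weight))
    total = sum(weight for _, weight in kept)
    if total == 0:
        return None, []
    dim = len(kept[0][0]["update"])
    merged = [sum(w * a["update"][i] for a, w in kept) / total for i in range(dim)]
    return merged, [a["name"] for a, _ in kept]


agents = [
    {"name": "gateway", "prev_attr": [0.70, 0.20, 0.10], "curr_attr": [0.68, 0.22, 0.10],
     "val_acc": 0.96, "n_samples": 1200, "update": [1.0, 0.0]},
    {"name": "plc", "prev_attr": [0.10, 0.10, 0.80], "curr_attr": [0.12, 0.09, 0.79],
     "val_acc": 0.94, "n_samples": 800, "update": [0.0, 1.0]},
    {"name": "poisoned", "prev_attr": [0.60, 0.30, 0.10], "curr_attr": [0.05, 0.15, 0.80],
     "val_acc": 0.95, "n_samples": 1000, "update": [10.0, 10.0]},
]
merged, kept = aggregate(agents)
print(kept)  # ['gateway', 'plc']
```

The gateway and the PLC emphasize entirely different features, yet both pass because each is consistent with its own history. The third agent is filtered because its attribution pattern shifts sharply between rounds, even though its validation accuracy looks ordinary.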

The paper’s theoretical argument formalizes a conditional version of this intuition: when a poisoned update is large enough to cause a detectable change in feature attribution relative to the variation among honest stability scores, the malicious update should fall below the filtering threshold with high probability.

That is useful theoretical support, but it should not be inflated into a universal guarantee. The mechanism works when effective poisoning creates a sufficiently visible attribution shift. An attacker capable of changing the model while keeping SHAP values stable attacks the premise directly.

The authors test such a SHAP-aware adaptive attack. Constraining the attacker to preserve SHAP stability reduces the achieved accuracy degradation from 32.1% to 12.3%. In other words, hiding the behavioural footprint makes the attack less effective, but not harmless.

The defence imposes a cost on the attacker. It does not abolish the attacker.

Why behavioural filtering performs most of the poisoning defence

Under a 30% compromised-client rate on Edge-IIoTset, ZTA-FL maintains 93.2% accuracy against label flipping and 91.7% against gradient manipulation.

The comparative results clarify the magnitude:

Method	Label-flipping accuracy at 30% compromised clients	Gradient-manipulation accuracy at 30% compromised clients	Backdoor attack-success rate
FedAvg	67.8%	71.3%	87.3%
Krum	82.4%	83.9%	45.2%
FLTrust	89.4%	88.2%	15.3%
FLAME	90.1%	89.7%	12.8%
ZTA-FL	93.2%	91.7%	8.7%

These are the paper’s main poisoning and backdoor comparisons, not incidental robustness checks. They support the claim that combining behavioural attribution analysis with conventional validation signals improves resilience under the tested non-IID attack settings.

The ablation study then explains why. Adding SHAP aggregation to the baseline increases poisoned accuracy from 67.8% to 91.4%. Adding attestation alone barely moves it. Adding adversarial training alone improves it to 72.3%.

Most of the poisoning protection therefore comes from examining submitted updates, not from authenticating their senders or hardening the final model against hostile inputs.

The SHAP visualization offers supporting diagnostic evidence. Honest agents have a reported mean stability score of 0.89, while malicious agents average 0.42, with the selected threshold separating the two populations in the presented experiment. This visualization helps explain the mechanism. It does not establish that every attack or industrial distribution will separate so neatly.

That distinction is worth preserving. A graph showing two cleanly divided populations is comforting. Attackers are under no obligation to remain visually cooperative.

Adversarial training protects the model after aggregation

Poisoning attacks corrupt the learning process. Evasion attacks target the trained model during use.

ZTA-FL addresses evasion by having edge devices generate adversarial examples locally using FGSM or PGD and include them during training. Each device uses a mixture of 70% clean data and 30% adversarial data. Because the adversarial examples are generated on-device, raw local records do not need to be centralized.
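The local mixing step can be illustrated with one-step FGSM, which perturbs each input by $\epsilon$ in the direction of the sign of the loss gradient. The sketch below is a stand-in under stated assumptions: it uses a two-feature logistic regression rather than the paper's intrusion-detection network, and `fgsm`, `mixed_batch`, and the toy data are hypothetical. Only the 70/30 split and the choice of FGSM come from the description above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(x, y, weights, bias, epsilon):
    """One-step FGSM for logistic regression: x + epsilon * sign(dLoss/dx)."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    # For cross-entropy loss, the gradient with respect to the input is (p - y) * w.
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


def mixed_batch(samples, weights, bias, epsilon=0.1, adv_fraction=0.3, seed=0):
    """Replace a fixed fraction of samples with FGSM-perturbed copies, keeping labels."""
    rng = random.Random(seed)
    n_adv = round(adv_fraction * len(samples))
    adv_indices = set(rng.sample(range(len(samples)), n_adv))
    return [
        (fgsm(x, y, weights, bias, epsilon), y) if i in adv_indices else (x, y)
        for i, (x, y) in enumerate(samples)
    ]


weights, bias = [1.5, -2.0], 0.1
samples = [([0.1 * i, 0.05 * i], i % 2) for i in range(10)]
batch = mixed_batch(samples, weights, bias)
changed = sum(1 for (x, _), (xb, _) in zip(samples, batch) if x != xb)
print(changed)  # 3
```

The perturbation is computed from the device's own samples and the current model parameters, which is why the step needs no raw data to leave the device. PGD would repeat the same gradient-sign step several times with projection back into the $\epsilon$ ball.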

Again, the ablation study reveals the division of labour. Adversarial training raises adversarial accuracy from 71.2% to 87.6%, while SHAP aggregation alone raises it only to 72.8%. The full framework reaches 89.3%.

The broader adversarial evaluation reports:

Attack setting	FedAvg	Adversarial-FL	ZTA-FL
FGSM, $\epsilon = 0.1$	71.2%	79.6%	89.3%
PGD-7, $\epsilon = 0.1$	67.4%	75.2%	86.8%
PGD-20, $\epsilon = 0.1$	63.9%	71.8%	84.7%

These tests are the main evidence for evasion robustness under the selected gradient-based attacks. They show that the combined system performs substantially better than ordinary federated learning and the paper’s adversarial-FL baseline.

They do not prove resistance to every adaptive evasion method, every perturbation budget, or attacks operating outside the benchmark’s feature representation. They also do not show that every edge device can afford the additional training burden.

Adversarial training is the most expensive of the three controls. That cost becomes important when the architecture moves from benchmark hardware to battery-powered sensors, older gateways, and devices already struggling with their assigned industrial duties.

A sensor that misses its operational deadline because it is busy hardening a neural network has achieved an unconventional form of security.

The full architecture works because its components fail differently

The paper describes the combined result as beneficial synergy. The more precise interpretation is operational complementarity.

Attestation limits who can participate and supports quarantine. SHAP aggregation limits the influence of suspicious updates. Adversarial training improves the final model’s resistance to manipulated inputs. No component reliably substitutes for either of the others.

This becomes clearer when the system is viewed as a sequence of gates:

Edge device
    |
    | 1. Can the participant prove identity, freshness, and platform state?
    v
Attestation gate
    |
    | 2. Does the submitted update preserve credible behavioural stability?
    v
SHAP-weighted fog aggregation
    |
    | 3. Has local training prepared the model for hostile inputs?
    v
Cloud coordination and redistributed global model

The hierarchical edge–fog–cloud design is not merely an architecture diagram arranged to reassure reviewers. It assigns each control to a practical location.

Edge devices retain data and perform local adversarial training. Fog nodes handle attestation checks and computationally heavier SHAP analysis close to the devices. The cloud coordinates global aggregation without individually processing every raw local update.

This distribution also creates opportunities for selective deployment. A resource-constrained sensor may rely on gateway-level attestation. A fog node can examine regional update behaviour. More capable edge devices can perform full adversarial training.

For businesses, that modularity is more useful than a binary decision between adopting the entire framework or doing nothing.

How to read the paper’s experimental evidence

The paper contains several types of tests. They answer different questions and should not be bundled into a single claim that the framework is simply “secure.”

Test	Likely purpose	What it supports	What it does not prove
Clean performance across Edge-IIoTset, CIC-IDS2017, and UNSW-NB15	Main performance evidence	The framework retains strong benchmark accuracy across three datasets	Real-world detection quality after deployment drift
Poisoning, gradient manipulation, and backdoor comparisons	Main robustness evidence and comparison with prior work	ZTA-FL outperforms tested baselines under specified attacks	Robustness against all adaptive or colluding attackers
Component ablation	Mechanism attribution	Different layers provide different forms of protection	That the same contribution split holds in every environment
SHAP-aware constrained attack	Robustness test against an adaptive adversary	Evading SHAP detection reduces attack effectiveness	That SHAP-aware attacks are fully neutralized
Slow-poisoning and collusion analysis	Failure characterization and exploratory extension	Identifies conditions where the defence weakens	A completed mitigation for those failures
TPM emulator and Raspberry Pi setup	Implementation detail	The protocol can be exercised in a reproducible test environment	Hardware-validated industrial deployment
Selective adversarial-training optimisation	Deployment sensitivity test	Compute overhead may be reduced with some robustness loss	That the reduced-cost setting suits every device class

The clean-data results are consistently strong: 97.8% accuracy on Edge-IIoTset, 96.4% on CIC-IDS2017, and 95.2% on UNSW-NB15. These results matter because security mechanisms that severely reduce ordinary performance are rarely deployed for long.

The poisoning and evasion experiments then show that robustness is not purchased by abandoning clean accuracy. However, the experiments remain benchmark evaluations using predefined preprocessing, attack families, and simulated client distributions.

The distinction is simple: the paper shows that the architecture deserves further engineering attention. It does not show that an operator can install it next quarter and treat the resulting system as a certified industrial security control.

Communication becomes cheaper while computation becomes much more expensive

ZTA-FL reports a 34% reduction in communication per round, from 48.5 MB for FedAvg to 32.1 MB, largely through 8-bit model quantization. In bandwidth-constrained industrial networks, that is commercially relevant.
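For readers unfamiliar with the technique, 8-bit quantization replaces each floating-point parameter with one of 256 evenly spaced levels plus a shared offset and step size. The sketch below is a generic min-max scheme, assumed for illustration; the paper's exact quantizer is not described here, and `quantize_8bit` and `dequantize_8bit` are hypothetical names. The last line simply checks the reported reduction from the two per-round figures.

```python
def quantize_8bit(values):
    """Map floats onto 256 evenly spaced levels between the minimum and maximum."""
    low, high = min(values), max(values)
    scale = (high - low) / 255 or 1.0  # fall back to 1.0 when all values are equal
    codes = bytes(min(255, max(0, round((v - low) / scale))) for v in values)
    return codes, low, scale


def dequantize_8bit(codes, low, scale):
    """Recover approximate floats from the one-byte codes."""
    return [low + code * scale for code in codes]


weights = [-0.82, -0.11, 0.0, 0.37, 0.93]
codes, low, scale = quantize_8bit(weights)
restored = dequantize_8bit(codes, low, scale)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(len(codes), worst <= scale / 2 + 1e-12)  # 5 True

# Reported per-round volumes: 48.5 MB for FedAvg, 32.1 MB for ZTA-FL.
print(f"{(48.5 - 32.1) / 48.5:.1%}")  # 33.8%
```

Each parameter costs one byte on the wire, and the reconstruction error is bounded by half a quantization step.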

The framework also converges faster in the reported experiments. On clean data, it reaches convergence after 42 rounds compared with 58 for FedAvg. Under attack, it converges after 48 rounds, compared with 67 for FLTrust and 87 for Krum.

Then the bill arrives.

A ZTA-FL round takes 28.2 seconds, a 97% increase over baseline federated learning. The paper attributes much of this overhead to adversarial training, which adds 6.9 seconds, and SHAP computation, which adds 3.1 seconds. Attestation adds only 0.4 seconds.

This cost profile again reinforces the mechanism-first interpretation. Cryptographic verification is not the main performance obstacle. Behavioural analysis and robust training are.

The authors report that selective adversarial training can reduce the overhead increase from 97% to 43%, with a 2.1 percentage-point reduction in robustness. That is a useful sensitivity result because it begins to expose the operational trade-off rather than pretending that every device should receive maximum hardening.
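The figures above imply a few numbers the text does not state directly. The arithmetic below derives them, and every derived value is an inference rather than a reported result: it assumes the 97% and 43% increases are measured against the same baseline round time.

```python
ROUND_SECONDS = 28.2          # reported ZTA-FL round time
FULL_OVERHEAD = 0.97          # reported increase over baseline
SELECTIVE_OVERHEAD = 0.43     # reported increase with selective adversarial training
COMPONENTS = {                # reported per-component additions, in seconds
    "adversarial training": 6.9,
    "SHAP computation": 3.1,
    "attestation": 0.4,
}

baseline = ROUND_SECONDS / (1 + FULL_OVERHEAD)      # implied baseline round time
overhead = ROUND_SECONDS - baseline                 # implied total added time
itemised = sum(COMPONENTS.values())                 # added time the text accounts for
unattributed = overhead - itemised                  # remainder not itemised above
selective_round = baseline * (1 + SELECTIVE_OVERHEAD)

print(f"implied baseline round: {baseline:.1f} s")          # 14.3 s
print(f"implied total overhead: {overhead:.1f} s")          # 13.9 s
print(f"itemised overhead:      {itemised:.1f} s")          # 10.4 s
print(f"not itemised:           {unattributed:.1f} s")      # 3.5 s
print(f"implied selective round: {selective_round:.1f} s")  # 20.5 s
```

On these assumptions, adversarial training and SHAP computation together account for about 10 of the roughly 14 added seconds, and attestation for less than half a second.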

The paper’s scalability claims should nevertheless be treated cautiously. It reports more than 96% accuracy with 1,000 agents and a 127-second round time, while an earlier architectural description states that the protocol can complete a round for 10,000 devices in less than 30 seconds. The second claim implies ten times as many devices in under a quarter of the time, and the two statements are difficult to reconcile without further implementation detail.

For deployment planning, the conservative interpretation is preferable: hierarchical aggregation and quantization appear promising for scale, but the reported throughput should be independently reproduced under the intended hardware, network topology, and security settings.

The business decision is where to place each control

The practical value of the paper is not a ready-made industrial product. It is a useful decomposition of security investment.

An operator considering collaborative intrusion detection can evaluate three controls separately:

Control	Operational decision	Potential value	Principal cost or boundary
Device and gateway attestation	Which participants require hardware-rooted identity and per-round verification?	Reduces impersonation, replay, and uncontrolled re-entry	Requires TPM, TrustZone, or gateway-level support; hardware tests remain pending
Behavioural update inspection	Which fog or regional nodes should compute attribution stability?	Detects suspicious model behaviour under heterogeneous data	Adds compute cost and may expose information through shared validation and SHAP outputs
Adversarial local training	Which devices and models require stronger evasion resistance?	Improves performance against hostile inputs	Substantial training overhead on constrained devices
Hierarchical aggregation and quantization	Where should updates be aggregated and compressed?	Reduces communication and localizes policy enforcement	Requires topology design and independently verified scale estimates

This framing changes the implementation question from “Should we deploy zero-trust federated learning?” to “Which trust failure is expensive enough to justify which control?”

A regional energy operator may prioritize attestation and fog-level update filtering because unauthorized or poisoned participants pose the largest systemic risk. A manufacturer with tightly managed gateways but exposure to evasive network traffic may place more value on selective adversarial training. A healthcare consortium may value update-behaviour analysis while being particularly concerned about SHAP-related privacy leakage.

Cognaptus infers that the highest-value near-term use is likely to be gateway- or fog-centred deployment rather than full hardening of every sensor. The paper’s own cost breakdown supports that interpretation: attestation is comparatively cheap, SHAP analysis can be centralized at fog nodes, and adversarial training can be applied selectively.

That is an inference from the architecture and reported costs, not a deployment result demonstrated by the paper.

Where the framework still breaks

The paper is unusually useful in documenting several failure modes rather than hiding them beneath an average accuracy figure.

Slow poisoning remains difficult to detect. An attacker making small changes over many rounds may preserve short-term SHAP stability. The authors report a 7.3% accuracy decline after 50 rounds in a slow-poisoning scenario. Cumulative attribution-drift tracking is proposed as a future mitigation, but the current method remains vulnerable.
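The gap between round-to-round stability and cumulative drift is easy to reproduce on synthetic data. The sketch below is a hypothetical illustration of the idea, not the authors' proposed mitigation: it uses cosine similarity as the stability measure, a 0.9 threshold for both checks, and a two-dimensional attribution vector that rotates by 0.02 radians per round.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def first_flags(history, step_threshold=0.9, cumulative_threshold=0.9):
    """Return the first round flagged by each check, or None if it never fires."""
    step_flag = cumulative_flag = None
    for t in range(1, len(history)):
        step = cosine_similarity(history[t - 1], history[t])       # versus previous round
        cumulative = cosine_similarity(history[0], history[t])     # versus first round
        if step_flag is None and step < step_threshold:
            step_flag = t
        if cumulative_flag is None and cumulative < cumulative_threshold:
            cumulative_flag = t
    return step_flag, cumulative_flag


# Synthetic attacker: attributions rotate slightly every round for 50 rounds.
history = [[math.cos(0.02 * t), math.sin(0.02 * t)] for t in range(51)]
print(first_flags(history))  # (None, 23)
```

The per-round check never fires because each step is tiny, while a comparison against the first round crosses the same threshold at round 23. A fixed reference has its own weakness, since honest devices also drift as their traffic changes.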

Colluding attackers can reshape the apparent normal distribution. When the compromised fraction rises to 40%, the reported accuracy falls to 78.4%, compared with 93.2% at 30%. SHAP stability is less useful when enough malicious agents cooperate to redefine what stable behaviour looks like.

Extreme heterogeneity creates false positives. When each agent observes a completely separate class distribution, honest SHAP patterns become more variable, producing an 8.2% false-positive filtering rate. Adaptive thresholds may help, but they also risk creating more room for attackers.

The attestation results are not yet hardware-validated. The experiment uses an IBM software TPM emulator. Physical TPMs and ARM TrustZone deployments are planned for later work.

SHAP introduces a privacy question of its own. Computing attributions on shared validation data may reveal information about feature distributions. Differential privacy is suggested as a possible mitigation, but the paper does not formally evaluate the leakage or the resulting utility trade-off.

The benchmark setting is not an industrial deployment. The evaluation uses three established intrusion-detection datasets, simulated non-IID distributions, a defined set of attacks, and a controlled hardware environment. Operational networks introduce firmware drift, missing data, unstable connections, changing threats, maintenance interruptions, and human operators who occasionally solve urgent problems by disabling inconvenient controls.

These limitations do not erase the paper’s contribution. They define its proper use.

ZTA-FL should be read as an integrated defence architecture with promising benchmark evidence and unusually explicit failure characterization. It should not yet be read as proof that explainable aggregation and TPM attestation have solved secure industrial federated learning.

Zero trust is a division of labour

The phrase “zero trust” often sounds like a demand to distrust everything equally. In practice, useful zero-trust systems are more discriminating. They identify what must be verified, what evidence is relevant, and which control is responsible.

ZTA-FL’s most important result is therefore visible in its ablation table rather than its highest accuracy score.

Attestation verifies the participant but barely changes poisoning or evasion performance on its own. SHAP-based aggregation performs most of the defence against malicious updates. Adversarial training performs most of the defence against hostile inputs. The complete system works because each layer addresses a failure the others cannot reliably see.

For industrial operators, the lesson is pleasantly unsentimental:

Knowing who submitted an update is not the same as knowing the update is safe. Knowing the update is safe is not the same as knowing the final model is robust. Collaborative learning becomes defensible only when those questions are answered separately—and repeatedly.

Trust no one, certainly. But more importantly, verify the right thing.

Cognaptus: Automate the Present, Incubate the Future.

¹ Samaresh Kumar Singh, Joyjit Roy, and Martin So, “Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems,” arXiv:2512.23809, https://arxiv.org/abs/2512.23809
