Rooms are stubborn.

A model trained in a conference room may behave confidently in a hotel room, badly in a bus, and mysteriously in a classroom. The Wi-Fi signal does not merely reflect “how many people are present.” It reflects furniture, wall geometry, transmitter placement, receiver hardware, movement patterns, and every other physical nuisance that refuses to fit neatly into a spreadsheet.

This is why Wi-Fi CSI crowd counting is more interesting than it first sounds. Channel State Information, or CSI, records how Wi-Fi signals behave across subcarriers between a transmitter and receiver. When people enter a space, their bodies disturb the signal. With enough labeled data, a model can learn to estimate occupancy from those disturbances. That is attractive for offices, buses, classrooms, hotels, retail stores, and public facilities because it avoids cameras, does not require people to carry devices, and can potentially reuse wireless infrastructure.

The catch is boring, brutal, and therefore important: every site is different.

The paper “FedAPA: Federated Learning with Adaptive Prototype Aggregation Toward Heterogeneous Wi-Fi CSI-based Crowd Counting” proposes a personalized federated learning method for this problem.[1] Its central idea is not “use federated learning.” That would be the obvious summary, and also the less useful one. The sharper idea is this: in heterogeneous Wi-Fi sensing, collaboration should happen through compact class-level representations, and those representations should be aggregated selectively, not averaged like soup.

FedAPA’s contribution is best understood as a mechanism. It turns each client’s data into prototypes, weights peer prototypes by similarity, pads missing classes for label-skewed clients, and trains local models with a warm-up schedule that gradually introduces prototype contrastive learning. The result is a system where each site keeps its own model, but still borrows structured knowledge from similar sites.

That distinction matters. Federated learning is often sold as if privacy-preserving collaboration were the hard part and aggregation were an implementation detail. FedAPA reminds us that aggregation is the product.

The real enemy is not privacy alone; it is mismatched experience

In a centralized version of Wi-Fi crowd counting, every site could send raw CSI data to a cloud server. That creates obvious privacy, security, bandwidth, and ownership problems. The paper notes that CSI is high-dimensional and continuously generated, so uploading raw CSI streams can become a communication burden, not just a governance headache.

Federated learning (FL) seems like the cleaner answer: keep raw data local, train collaboratively, and share updates. But standard FL, especially FedAvg-style model averaging, assumes that combining client updates produces a useful shared model. In Wi-Fi sensing, that assumption can age poorly.

Consider three forms of heterogeneity in the paper’s setup:

| Heterogeneity type | What it means in Wi-Fi sensing | Why naive FL struggles |
|---|---|---|
| Feature skew | Different rooms and layouts produce different CSI patterns for the same occupancy count | The same label can correspond to different signal geometry |
| Label skew | Some clients may not observe all crowd-count labels | A site may lack examples for certain occupancy levels |
| Model heterogeneity | Devices may have different computational capacity and therefore different model architectures | Full model averaging becomes awkward or impossible when models differ |

This is the reader misconception worth killing early: federated learning does not automatically make sensing scalable. If the clients are too different, blind aggregation can cause negative transfer. A bus should not have to learn occupancy from a hotel room with equal enthusiasm. Very democratic, very wrong.

FedAPA addresses this by changing what gets shared.

A prototype is the smallest useful gossip a site can send

FedAPA does not exchange raw CSI, gradients, or full model parameters. Each client trains a local encoder and classifier. The encoder maps a CSI sample into an embedding space. For each class label (in this paper, the number of people present, ranging up to 20), the client computes a prototype: the mean embedding of that client’s samples for that class.

A prototype is not a full dataset. It is not the model. It is a compact summary of what “class 7 people” or “class 12 people” looks like in that client’s representation space.
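A minimal sketch of that prototype step, assuming embeddings have already been produced by some local encoder. The function name `class_prototypes` and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Return ({label: mean embedding}, {label: sample count}) for one client."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    prototypes, counts = {}, {}
    for c in np.unique(labels):
        mask = labels == c
        prototypes[int(c)] = embeddings[mask].mean(axis=0)  # class mean
        counts[int(c)] = int(mask.sum())
    return prototypes, counts

# Toy data (hypothetical): four 2-D embeddings, labels = people present.
emb = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
lab = np.array([0, 0, 2, 2])
protos, counts = class_prototypes(emb, lab)
```

Note that a class the client never observed (class 1 here) simply has no entry, which is exactly the label-skew situation the method has to handle later.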

That small shift has several consequences.

First, prototypes are much cheaper to communicate than full CNN weights. Second, they can work across heterogeneous local architectures because the collaboration occurs in the representation space rather than through parameter averaging. Third, prototypes make similarity measurable: if two clients have similar class prototypes, they may be useful collaborators for that class.

The mechanism can be summarized like this:

| Step | FedAPA action | Operational meaning |
|---|---|---|
| Local representation | Each client maps CSI segments into embeddings | Every site learns its own signal language |
| Class prototype | Each client averages embeddings by crowd-count label | Each site sends compact class summaries |
| Similarity weighting | Server compares client prototypes using cosine similarity | Similar sites receive more influence |
| Personalized return | Each client receives its own aggregated prototype set | There is no single universal “global” model |
| Local retraining | Client uses classification plus prototype contrastive learning | The model learns labels and a better representation space |

This is the first business lesson. In distributed sensing, the valuable asset may not be the full model. It may be a lightweight representation of what each class looks like under local conditions.

That matters if the deployment target is not a research lab but a portfolio of messy buildings, rooms, vehicles, and devices. Nobody wants to ship a massive model update every few minutes just because one conference room got rearranged. Nobody sane, anyway.

Adaptive aggregation is the paper’s actual engine

The core FedAPA move is adaptive prototype aggregation.

Older prototype-based FL methods often average prototypes uniformly. If five clients have a prototype for class $c$, the server averages them and sends back one global prototype. That is easy. It is also suspiciously optimistic.

FedAPA instead computes pairwise cosine similarity between client prototypes for each class. The server then uses a softmax weighting scheme controlled by a temperature parameter. Similar clients receive higher aggregation weights; dissimilar clients matter less. Each client gets a personalized class prototype rather than the same global average.

So FedAPA is not saying “everyone contributes.” It is saying “everyone may contribute, but not equally, and not to everyone.”
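The weighting can be sketched in a few lines. This is an illustration under stated assumptions, not the paper’s exact formula: the function name `adaptive_aggregate`, the temperature values, and the choice to include each client’s own prototype in its softmax are all assumptions made for the example.

```python
import numpy as np

def adaptive_aggregate(client_protos, temperature=0.5):
    """Similarity-weighted aggregation of ONE class's prototypes.

    client_protos: array of shape (num_clients, dim), one row per client.
    Returns (personalized, weights); row i of `personalized` is the
    aggregated prototype that would be sent back to client i.
    """
    protos = np.asarray(client_protos, dtype=float)
    norms = np.linalg.norm(protos, axis=1, keepdims=True)
    unit = protos / np.maximum(norms, 1e-12)       # guard against zero vectors
    similarity = unit @ unit.T                     # pairwise cosine similarity
    logits = similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ protos, weights

# Hypothetical prototypes for one class: two similar rooms and one outlier.
protos = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
personalized, weights = adaptive_aggregate(protos, temperature=0.1)
```

In this sketch a lower temperature sharpens the weights toward the most similar peers, while a very high temperature drifts back toward uniform averaging, which is the behavior the older methods hard-code.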

That is the part worth lingering on.

In a multi-site business deployment, the question is not whether the system can train from many locations. The question is whether the system can identify which locations should learn from one another. A classroom at peak density may share useful structure with another classroom. A bus with moving passengers and different signal reflections may be less useful to a hotel room. Treating all environments as peers is not collaboration. It is a conference call with no agenda.

FedAPA’s aggregation mechanism turns cross-site learning into a similarity graph, even if the paper does not frame it in exactly those product terms. Each client’s collaboration neighborhood emerges from prototype similarity. That is the paper’s most transferable design idea.

Padding missing classes is not a footnote; it is how the method handles label skew

Real deployments do not give every site every label. A small office may rarely observe twenty people. A bus may see many high-density states. A hotel room may mostly see low occupancy. If prototype learning assumes every client has every class, the method becomes fragile in exactly the situations where it is supposed to help.

FedAPA includes a prototype padding mechanism for missing classes. When a client lacks a class, the server constructs a pseudo-prototype for that class by averaging the corresponding prototypes from other clients, weighted by sample counts. These padded prototypes help create a more complete class structure for contrastive learning.
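As a rough sketch of that padding step, assuming the server holds each client’s prototypes and per-class sample counts in plain dictionaries. The names `pad_prototypes`, `all_protos`, and `all_counts`, and the toy clients, are hypothetical.

```python
import numpy as np

def pad_prototypes(all_protos, all_counts, client_id, num_classes):
    """Fill a client's missing classes with count-weighted peer averages.

    all_protos[k][c] is client k's prototype for class c;
    all_counts[k][c] is how many samples produced it.
    """
    padded = dict(all_protos[client_id])
    for c in range(num_classes):
        if c in padded:
            continue  # the client observed this class itself
        donors = [k for k in all_protos if k != client_id and c in all_protos[k]]
        if not donors:
            continue  # no peer has seen this class either
        w = np.array([all_counts[k][c] for k in donors], dtype=float)
        stacked = np.stack([np.asarray(all_protos[k][c], dtype=float) for k in donors])
        padded[c] = (w[:, None] * stacked).sum(axis=0) / w.sum()
    return padded

# Toy federation: client "A" never observed class 1.
all_protos = {
    "A": {0: np.array([1.0, 1.0])},
    "B": {0: np.array([1.0, 0.0]), 1: np.array([2.0, 0.0])},
    "C": {1: np.array([0.0, 2.0])},
}
all_counts = {"A": {0: 20}, "B": {0: 5, 1: 30}, "C": {1: 10}}
padded_a = pad_prototypes(all_protos, all_counts, "A", num_classes=2)
```

Here client B contributes three times as much as client C to the pseudo-prototype because it has three times the samples for that class; client A’s own class 0 prototype is left untouched.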

This does not magically solve label scarcity. The authors explicitly discuss scarce labels as a limitation. But padding changes the local learning problem: a client without examples of a class can still receive a representation anchor for that class. In contrastive learning terms, those anchors can function as informative negatives and help organize the feature space.

For business interpretation, this is subtle but important. FedAPA is not only improving average benchmark performance. It is reducing the damage caused by uneven operational exposure. A site that has never seen a certain crowd level is not left completely blind. It receives a cautious representation-level hint from the federation.

That hint is not a substitute for real labels. But in deployment, hints are often cheaper than annotations.

The warm-up schedule prevents representation learning from arriving too early and ruining the party

FedAPA’s local objective combines cross-entropy classification with two prototype-based contrastive terms. One aligns local embeddings with the personalized global prototypes. The other uses the uploaded prototype sets from other clients to encourage inter-client representation learning.
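For intuition, a prototype contrastive term is commonly written in an InfoNCE-like form: pull an embedding toward the prototype of its own class and push it away from the others. The sketch below shows that generic form only. It is not claimed to match the paper’s exact loss, and the function name and the temperature `tau` are assumptions.

```python
import numpy as np

def prototype_contrastive_loss(z, label, prototypes, tau=0.5):
    """Generic InfoNCE-style loss of one embedding against class prototypes.

    z: embedding vector; label: its class; prototypes: {class: vector}.
    """
    classes = sorted(prototypes)
    protos = np.stack([np.asarray(prototypes[c], dtype=float) for c in classes])
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z)
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = protos @ z / tau                 # cosine similarity / temperature
    logits = logits - logits.max()            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return float(-log_prob[classes.index(label)])

# Hypothetical prototypes for two classes and one embedding close to class 0.
prototypes = {0: [1.0, 0.0], 1: [0.0, 1.0]}
z = [0.9, 0.1]
loss_correct = prototype_contrastive_loss(z, 0, prototypes)
loss_wrong = prototype_contrastive_loss(z, 1, prototypes)
```

The loss is small when the embedding sits near its own class prototype and large when it sits near another class’s prototype, which is the pressure that reshapes the representation space.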

The paper does not apply these representation losses at full strength from the beginning. It uses a warm-up coefficient that starts near zero and gradually increases, with the default warm-up length set to 50 rounds.

The intuition is practical. Early in training, the model should first learn basic classification boundaries. If prototype contrastive pressure dominates too early, the representation space may be organized around unstable or immature prototypes. After the classifier has learned enough structure, contrastive learning can reshape the embedding space more productively.
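A warm-up schedule of this kind might look like the sketch below. The paper says only that the coefficient starts near zero and increases gradually, with a default length of 50 rounds; the linear ramp, the single shared coefficient for both contrastive terms, and the function names are assumptions made for illustration.

```python
def warmup_coefficient(round_idx, warmup_rounds=50, max_coeff=1.0):
    """Linear ramp from 0 to max_coeff over warmup_rounds, then constant.

    The linear shape is an assumption; only the 50-round default
    comes from the paper.
    """
    if warmup_rounds <= 0:
        return max_coeff
    return max_coeff * min(1.0, round_idx / warmup_rounds)

def local_objective(ce_loss, align_loss, inter_client_loss, round_idx):
    """Cross-entropy plus warm-up-scaled prototype contrastive terms."""
    lam = warmup_coefficient(round_idx)
    return ce_loss + lam * (align_loss + inter_client_loss)

# Early rounds are almost pure classification; later rounds add alignment.
early = local_objective(1.0, 2.0, 3.0, round_idx=0)
late = local_objective(1.0, 2.0, 3.0, round_idx=50)
```

At round 0 the objective reduces to cross-entropy alone, and by the end of the warm-up both contrastive terms count at full weight.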

This is not merely a training trick. It is a governance principle for distributed learning systems: do not force alignment before clients know what they are aligning.

FedAPA’s sensitivity tests support this point. The authors compare different warm-up lengths and find that performance improves from 25 to 50 rounds but slightly declines at 100 rounds. Too short means contrastive learning arrives too aggressively. Too long means the model delays useful representation alignment. The paper also compares the warm-up coefficient with static coefficients and reports that static settings perform worse.

That test should be read as a sensitivity and ablation result, not as a universal law that “50 rounds is optimal.” It says the curriculum matters in this setup. It does not say every building, router, occupancy range, or training budget should copy the same schedule.

The main results show improvement, but the ablations explain why

The headline numbers are strong. FedAPA is evaluated on a real-world Wi-Fi CSI crowd-counting dataset across six environments, including living room, classroom, bus, conference room, and hotel room settings. The paper evaluates both statistical heterogeneity and model architecture heterogeneity. It reports accuracy, F1, and mean absolute error (MAE), averaging results over the last five communication rounds.
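Because the labels are head counts, accuracy and MAE answer different questions: accuracy asks whether the count was exactly right, MAE asks how far off it was. A small sketch of both, plus the last-five-rounds averaging, using made-up numbers. F1 is omitted because the averaging convention is not stated here, and the helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted count is exactly right."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap, in people, between predicted and true count."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def last_k_average(per_round_values, k=5):
    """Average a metric over the final k communication rounds."""
    tail = per_round_values[-k:]
    return sum(tail) / len(tail)

# Hypothetical predictions: two exact hits, one near miss, one large miss.
y_true = [3, 5, 7, 7]
y_pred = [3, 4, 7, 10]
acc = accuracy(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

# Hypothetical per-round accuracies; only the last five are averaged.
final_acc = last_k_average([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], k=5)
```

The near miss and the large miss cost the same in accuracy but very different amounts in MAE, which is why a method can look similar on one metric and much worse on the other.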

The main comparison is against local training, FedAvg, WiFederated, and FedCaring.

| Setting | Method | Accuracy (%) | F1 (%) | MAE ↓ |
|---|---|---|---|---|
| Statistical data heterogeneity | Local | 72.06 | 71.32 | 0.69 |
| Statistical data heterogeneity | FedAvg | 59.46 | 38.26 | 2.01 |
| Statistical data heterogeneity | WiFederated | 77.57 | 76.91 | 0.57 |
| Statistical data heterogeneity | FedCaring | 77.60 | 69.18 | 0.62 |
| Statistical data heterogeneity | FedAPA | 87.25 | 85.91 | 0.23 |
| Model architecture heterogeneity | Local | 46.81 | 39.45 | 1.26 |
| Model architecture heterogeneity | FedAvg | 46.16 | 37.51 | 2.03 |
| Model architecture heterogeneity | WiFederated | 54.94 | 47.80 | 1.34 |
| Model architecture heterogeneity | FedCaring | 70.00 | 68.43 | 0.77 |
| Model architecture heterogeneity | FedAPA | 80.31 | 78.94 | 0.48 |

The key interpretation is not just that FedAPA wins. It is where the baselines fail.

FedAvg performs poorly under both heterogeneous settings, especially in F1 and MAE. That reinforces the paper’s central diagnosis: simple model averaging is a bad fit when local distributions and device architectures diverge. WiFederated improves over FedAvg through fine-tuning. FedCaring improves model heterogeneity performance through weighted model aggregation. But FedAPA still does better, suggesting that prototype-level, similarity-aware collaboration captures something that coarse model-level weighting misses.

The ablation table is even more useful because it isolates the mechanism.

| Variant | Statistical Acc. | Statistical F1 | Statistical MAE ↓ | Model Acc. | Model F1 | Model MAE ↓ | Likely purpose |
|---|---|---|---|---|---|---|---|
| Global average prototype | 74.02 | 73.59 | 0.71 | 54.75 | 47.89 | 1.62 | Ablation against uniform prototype averaging |
| Global average prototype with client prototypes | 82.03 | 81.47 | 0.38 | 69.41 | 67.20 | 0.77 | Ablation testing the value of involving peer prototypes in local contrastive learning |
| FedAPA | 87.25 | 85.91 | 0.23 | 80.31 | 78.94 | 0.48 | Full method with similarity-aware adaptive aggregation |

This is the stronger evidence for reading the paper mechanism-first. If we only cite the main benchmark table, FedAPA looks like another “new method beats baselines” paper. Charming, but familiar. The ablation tells us why: uniform prototype averaging helps only so much; adding peer prototypes helps more; adaptive similarity-aware aggregation gives the full gain.

In other words, the paper is not merely validating prototypes. It is validating selective prototype collaboration.

The communication result is small in kilobytes and large in product meaning

FedAPA’s communication advantage is unusually clear. With the LargeConvNet4 model, the paper reports per-client communication cost per round as follows:

| Method | Communication per round per client |
|---|---|
| FedAvg | 3,710 KB |
| FedCaring | 3,710 KB |
| WiFederated | 3,710 KB |
| FedAPA | 150.53 KB |

That is a reported 95.94% reduction in communication overhead.
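The percentage follows directly from the two table values, as this short check shows. Only the two payload sizes come from the paper; the variable names are illustrative.

```python
# Per-client, per-round payloads reported for the LargeConvNet4 setup.
full_model_kb = 3710.0   # FedAvg, FedCaring, WiFederated
prototype_kb = 150.53    # FedAPA

reduction = 1.0 - prototype_kb / full_model_kb
ratio = full_model_kb / prototype_kb

print(f"reduction: {reduction:.2%}")   # reduction: 95.94%
print(f"ratio: {ratio:.1f}x smaller")  # ratio: 24.6x smaller
```

Put differently, a full-model method sends roughly 24.6 times as much data per client per round as the prototype exchange does in this setup.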

This result should be treated as operational evidence, not merely a compression statistic. In edge sensing deployments, communication cost is not an academic nuisance. It affects bandwidth, update frequency, device battery, network contention, and how tolerable the system is to the people who also expect the Wi-Fi network to provide, inconveniently, Wi-Fi.

The business relevance is therefore not “FedAPA is cheaper to train.” That is too narrow.

The more useful interpretation is this: prototype exchange makes federated sensing closer to a deployment model where many sites can participate without constant heavy model synchronization. For offices, transport hubs, retail stores, classrooms, and hotels, that means occupancy analytics may be updated collaboratively while keeping raw sensing data local and limiting network burden.

The word “may” is doing work here. The paper’s result is based on its experimental setup, model choices, and full participation assumptions. It does not prove that every production network will enjoy the same communication economics under packet loss, intermittent devices, or security constraints. But it does show a plausible design path: communicate compact semantic summaries, not everything the model knows.

The convergence analysis supports plausibility, not deployment certainty

FedAPA includes a convergence analysis under standard nonconvex optimization assumptions: smoothness, lower bounded objectives, bounded stochastic gradients, bounded and Lipschitz representations, Lipschitz personalized aggregation, and a structured warm-up schedule. It assumes full client participation.

The analysis separates several forces: stochastic variance, prototype-refresh effects, schedule-change effects, and prototype-coupling terms. After warm-up, the paper derives a stationarity bound in which the time-averaged gradient norm is controlled by initial gap, variance, and prototype coupling. The practical interpretation is that FedAPA behaves like a nonconvex stochastic method with extra constants introduced by prototype aggregation and warm-up.

This is useful, but it should not be oversold.

The convergence section is not a production guarantee that FedAPA will remain stable in a live building network with changing router placement, unlabeled data, partial participation, or adversarial clients. Its purpose is narrower: to show that the algorithm is mathematically coherent under explicit assumptions and that the extra machinery does not turn training into uncontrolled representation soup.

That is still valuable. In business terms, the theory reduces “interesting hack” risk. It does not eliminate deployment risk.

What the paper directly shows, what Cognaptus infers, and what remains uncertain

A disciplined reading separates the result from the business extrapolation.

| Layer | Statement | Status |
|---|---|---|
| Direct paper result | FedAPA outperforms local training, FedAvg, WiFederated, and FedCaring on the reported Wi-Fi CSI crowd-counting experiments | Directly shown in the paper’s evaluation |
| Direct paper result | Similarity-aware prototype aggregation outperforms global average prototype aggregation in both statistical and model heterogeneity settings | Directly shown by ablation |
| Direct paper result | Prototype exchange greatly reduces per-round communication relative to full-model methods in the LargeConvNet4 setup | Directly shown in the communication comparison |
| Cognaptus inference | Multi-site occupancy analytics may benefit more from selective representation sharing than from one-size-fits-all model averaging | Reasonable inference from mechanism and results |
| Cognaptus inference | Prototype-based FL may be attractive for resource-constrained sensing deployments where bandwidth is a real operational constraint | Reasonable inference, especially from the communication table |
| Still uncertain | Performance under partial participation, scarce labels, changing environments, and production device constraints | Not fully resolved by the paper |

The likely enterprise use case is not “replace every camera with Wi-Fi tomorrow.” That would be the kind of sentence that gets written by a slide deck trying to raise money before lunch.

A more careful use case is multi-site occupancy sensing where camera deployment is undesirable, raw data transfer is costly, and local environments differ enough that centralized or uniform FL training becomes brittle. In that setting, FedAPA offers a design pattern: let each site stay local, but share class-level representation summaries with similar peers.

The practical value is personalized collaboration, not generic federation

FedAPA’s strongest lesson travels beyond Wi-Fi crowd counting.

Many physical-world AI deployments have the same structure: multiple sites, different local conditions, uneven labels, constrained devices, and privacy-sensitive data. Think warehouses, hospitals, schools, transport facilities, retail locations, industrial sensing systems, or smart buildings. The naive dream is one global model. The more realistic dream is a federation of local models that know when to listen to each other.

FedAPA’s architecture points toward that second dream.

The design principle is simple:

Do not average clients. Compare what they know, then decide who should influence whom.

For a business deploying edge AI, that principle changes how one evaluates system architecture. The question is not only “Can we train without centralizing raw data?” It becomes:

| Architecture question | Why FedAPA makes it visible |
|---|---|
| What is the smallest useful unit of shared knowledge? | FedAPA uses class prototypes rather than raw data or full model parameters |
| How do we prevent harmful cross-site transfer? | FedAPA weights prototypes by client similarity |
| How do we support clients with missing labels? | FedAPA pads absent class prototypes using information from other clients |
| How do we avoid unstable early alignment? | FedAPA warms up contrastive representation learning |
| How do we measure deployment cost? | FedAPA reports communication cost, not only accuracy |

That is a more serious architecture conversation than “we use federated learning.” Good. The phrase had become a little too comfortable.

Boundaries: where the result should not be stretched

The paper is careful enough to discuss limitations, and they matter for business adoption.

First, FedAPA assumes supervised learning. In real Wi-Fi sensing deployments, labeled CSI data can be scarce. People do not naturally annotate the room every time occupancy changes. A production system may need semi-supervised, weakly supervised, or self-supervised extensions before it becomes economical.

Second, the experiments use full client participation. Real devices may disconnect, sleep, fail, or skip training rounds. The paper discusses partial participation as future work and suggests that client selection should consider prototype diversity and device capability. That is a reasonable direction, but it is not yet validated here.

Third, FedAPA reduces communication but adds local computation through prototype-based contrastive losses. For powerful edge servers, that may be fine. For small embedded devices, the trade-off requires measurement.

Fourth, the task is crowd counting from Wi-Fi CSI across six environments. That is meaningful, but it is not every sensing task. The method’s business relevance is strongest where class-level prototypes are meaningful, labels are available, and local environments are related enough for selective sharing to help.

Finally, privacy is improved by not sharing raw CSI, but “not raw data” is not the same thing as a complete privacy proof. Prototype leakage, malicious clients, and security hardening are separate questions. Less exposure is good. It is not magic dust.

FedAPA is a Wi-Fi CSI crowd-counting paper, but its more interesting message is about federated learning design.

The paper shows that in heterogeneous sensing environments, the system should not ask every client to converge toward the same model or receive the same global representation. It should let clients share compact class prototypes, compare similarity, pad missing knowledge, and gradually align representations after local classification has stabilized.

That is why the mechanism matters more than the headline. FedAPA is not just “federated learning for Wi-Fi sensing.” It is a reminder that collaboration has structure. Similar clients should matter more. Missing classes need anchors. Representation learning should be paced. Communication should be small enough that the network can still do its day job.

For businesses, the practical lesson is equally plain: scalable physical-world AI will not be built by pretending every site is the same. It will be built by systems that know which differences matter, which similarities are useful, and when to borrow knowledge without importing someone else’s noise.

Signal, prototype, repeat. Not glamorous. Very deployable.

Cognaptus: Automate the Present, Incubate the Future.


1. Jingtao Guo, Yuyi Mao, and Ivan Wang-Hei Ho, “FedAPA: Federated Learning with Adaptive Prototype Aggregation Toward Heterogeneous Wi-Fi CSI-based Crowd Counting,” arXiv:2511.21048, 2025, https://arxiv.org/abs/2511.21048