Signal, Prototype, Repeat: Why Adaptive Aggregation May Be Wi-Fi Sensing’s Missing Link

Rooms are stubborn.

A model trained in a conference room may behave confidently in a hotel room, badly in a bus, and mysteriously in a classroom. The Wi-Fi signal does not merely reflect “how many people are present.” It reflects furniture, wall geometry, transmitter placement, receiver hardware, movement patterns, and every other physical nuisance that refuses to fit neatly into a spreadsheet.

This is why Wi-Fi CSI crowd counting is more interesting than it first sounds. Channel State Information, or CSI, records how Wi-Fi signals behave across subcarriers between a transmitter and receiver. When people enter a space, their bodies disturb the signal. With enough labeled data, a model can learn to estimate occupancy from those disturbances. That is attractive for offices, buses, classrooms, hotels, retail stores, and public facilities because it avoids cameras, does not require people to carry devices, and can potentially reuse wireless infrastructure.

The catch is boring, brutal, and therefore important: every site is different.

The paper “FedAPA: Federated Learning with Adaptive Prototype Aggregation Toward Heterogeneous Wi-Fi CSI-based Crowd Counting” proposes a personalized federated learning method for this problem.¹ Its central idea is not “use federated learning.” That would be the obvious summary, and also the less useful one. The sharper idea is this: in heterogeneous Wi-Fi sensing, collaboration should happen through compact class-level representations, and those representations should be aggregated selectively, not averaged like soup.

FedAPA’s contribution is best understood as a mechanism. It turns each client’s data into prototypes, weights peer prototypes by similarity, pads missing classes for label-skewed clients, and trains local models with a warm-up schedule that gradually introduces prototype contrastive learning. The result is a system where each site keeps its own model, but still borrows structured knowledge from similar sites.

That distinction matters. Federated learning is often sold as if privacy-preserving collaboration were the hard part and aggregation were an implementation detail. FedAPA reminds us that aggregation is the product.

The real enemy is not privacy alone; it is mismatched experience

In a centralized version of Wi-Fi crowd counting, every site could send raw CSI data to a cloud server. That creates obvious privacy, security, bandwidth, and ownership problems. The paper notes that CSI is high-dimensional and continuously generated, so uploading raw CSI streams can become a communication burden, not just a governance headache.

Federated learning seems like the cleaner answer: keep raw data local, train collaboratively, and share updates. But standard FL, especially FedAvg-style model averaging, assumes that combining client updates produces a useful shared model. In Wi-Fi sensing, that assumption can age poorly.
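To make the FedAvg assumption concrete, here is a minimal sketch of sample-weighted parameter averaging in plain Python. The function name `fedavg`, the flat-list parameter layout, and the toy numbers are illustrative assumptions, not code from the paper.

```python
def fedavg(client_params, client_sizes):
    """Sample-weighted average of flat parameter lists (illustrative only)."""
    lengths = {len(p) for p in client_params}
    if len(lengths) != 1:
        # Different architectures give different parameter shapes,
        # so there is nothing well-defined to average.
        raise ValueError("FedAvg needs identically shaped parameters")
    dim = lengths.pop()
    total = sum(client_sizes)
    return [
        sum(n * p[d] for p, n in zip(client_params, client_sizes)) / total
        for d in range(dim)
    ]

# Two hypothetical clients with the same architecture: averaging works.
shared = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# A client with a different architecture: averaging is undefined.
try:
    fedavg([[1.0, 2.0], [3.0, 4.0, 5.0]], [1, 3])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Even when the shapes do match, the average is only as useful as the assumption that every client is describing the same problem.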

Consider three forms of heterogeneity in the paper’s setup:

Heterogeneity type	What it means in Wi-Fi sensing	Why naive FL struggles
Feature skew	Different rooms and layouts produce different CSI patterns for the same occupancy count	The same label can correspond to different signal geometry
Label skew	Some clients may not observe all crowd-count labels	A site may lack examples for certain occupancy levels
Model heterogeneity	Devices may have different computational capacity and therefore different model architectures	Full model averaging becomes awkward or impossible when models differ

This is the reader misconception worth killing early: federated learning does not automatically make sensing scalable. If the clients are too different, blind aggregation can cause negative transfer. A bus should not have to learn occupancy from a hotel room with equal enthusiasm. Very democratic, very wrong.

FedAPA addresses this by changing what gets shared.

A prototype is the smallest useful gossip a site can send

FedAPA does not exchange raw CSI, gradients, or full model parameters. Each client trains a local encoder and classifier. The encoder maps a CSI sample into an embedding space. For each class label (in this paper, the number of people present, ranging up to 20), the client computes a prototype: the mean embedding for samples of that class.

A prototype is not a full dataset. It is not the model. It is a compact summary of what “class 7 people” or “class 12 people” looks like in that client’s representation space.
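As a rough illustration of "mean embedding per class," the sketch below computes prototypes from a handful of toy embeddings. The function name `class_prototypes`, the 2-D vectors, and the returned sample counts are assumptions made for the example; a real encoder would emit far larger embeddings.

```python
from collections import defaultdict

def class_prototypes(embeddings, labels):
    """Return ({label: mean embedding}, {label: sample count})."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    protos = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return protos, dict(counts)

# Toy data: two samples labeled "7 people", one labeled "12 people".
embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [7, 7, 12]
protos, counts = class_prototypes(embeddings, labels)
```

Here `protos[7]` is the midpoint of the two class-7 embeddings, and `counts` records how many samples stand behind each prototype, which is useful later if prototypes are combined with sample-count weights.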

That small shift has several consequences.

First, prototypes are much cheaper to communicate than full CNN weights. Second, they can work across heterogeneous local architectures because the collaboration occurs in the representation space rather than through parameter averaging. Third, prototypes make similarity measurable: if two clients have similar class prototypes, they may be useful collaborators for that class.

The mechanism can be summarized like this:

Step	FedAPA action	Operational meaning
Local representation	Each client maps CSI segments into embeddings	Every site learns its own signal language
Class prototype	Each client averages embeddings by crowd-count label	Each site sends compact class summaries
Similarity weighting	Server compares client prototypes using cosine similarity	Similar sites receive more influence
Personalized return	Each client receives its own aggregated prototype set	There is no single universal “global” model
Local retraining	Client uses classification plus prototype contrastive learning	The model learns labels and a better representation space

This is the first business lesson. In distributed sensing, the valuable asset may not be the full model. It may be a lightweight representation of what each class looks like under local conditions.

That matters if the deployment target is not a research lab but a portfolio of messy buildings, rooms, vehicles, and devices. Nobody wants to ship a massive model update every few minutes just because one conference room got rearranged. Nobody sane, anyway.

Adaptive aggregation is the paper’s actual engine

The core FedAPA move is adaptive prototype aggregation.

Older prototype-based FL methods often average prototypes uniformly. If five clients have a prototype for class $c$, the server averages them and sends back one global prototype. That is easy. It is also suspiciously optimistic.

FedAPA instead computes pairwise cosine similarity between client prototypes for each class. The server then uses a softmax weighting scheme controlled by a temperature parameter. Similar clients receive higher aggregation weights; dissimilar clients matter less. Each client gets a personalized class prototype rather than the same global average.

So FedAPA is not saying “everyone contributes.” It is saying “everyone may contribute, but not equally, and not to everyone.”
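A minimal sketch of that idea, for a single class, might look like the following. It follows the description above (cosine similarity, then a temperature-scaled softmax), but the function names, the choice to include each client's own prototype in the mix, the temperature value, and the toy site names are all assumptions rather than details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def aggregation_weights(client_protos, target, temperature):
    """Softmax over cosine similarities between `target` and every client."""
    ids = sorted(client_protos)
    scaled = [
        cosine(client_protos[target], client_protos[j]) / temperature
        for j in ids
    ]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {j: e / total for j, e in zip(ids, exps)}

def personalized_prototype(client_protos, target, temperature=0.2):
    """Similarity-weighted mix of all clients' prototypes for ONE class."""
    weights = aggregation_weights(client_protos, target, temperature)
    dim = len(client_protos[target])
    return [
        sum(w * client_protos[j][d] for j, w in weights.items())
        for d in range(dim)
    ]

# Hypothetical class-7 prototypes from three sites (toy 2-D embeddings).
class7 = {
    "classroom_a": [1.0, 0.1],
    "classroom_b": [0.9, 0.2],
    "bus": [-0.2, 1.0],
}
w = aggregation_weights(class7, "classroom_a", temperature=0.2)
for_classroom = personalized_prototype(class7, "classroom_a")
for_bus = personalized_prototype(class7, "bus")
```

In this toy setup the two classrooms weight each other heavily and largely ignore the bus, while the bus receives a prototype dominated by its own data. A lower temperature sharpens that selectivity; a higher one drifts back toward uniform averaging.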

That is the part worth lingering on.

In a multi-site business deployment, the question is not whether the system can train from many locations. The question is whether the system can identify which locations should learn from one another. A classroom at peak density may share useful structure with another classroom. A bus with moving passengers and different signal reflections may be less useful to a hotel room. Treating all environments as peers is not collaboration. It is a conference call with no agenda.

FedAPA’s aggregation mechanism turns cross-site learning into a similarity graph, even if the paper does not frame it in exactly those product terms. Each client’s collaboration neighborhood emerges from prototype similarity. That is the paper’s most transferable design idea.

Padding missing classes is not a footnote; it is how the method handles label skew

Real deployments do not give every site every label. A small office may rarely observe twenty people. A bus may see many high-density states. A hotel room may mostly see low occupancy. If prototype learning assumes every client has every class, the method becomes fragile in exactly the situations where it is supposed to help.

FedAPA includes a prototype padding mechanism for missing classes. When a client lacks a class, the server constructs a pseudo-prototype for that class by averaging the corresponding prototypes from other clients, weighted by sample counts. These padded prototypes help create a more complete class structure for contrastive learning.
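The padding step can be sketched as a sample-count-weighted average over the clients that did observe the missing class. The data layout, the function name `pad_missing_prototypes`, the integer class indexing, and the toy site names below are assumptions for illustration only.

```python
def pad_missing_prototypes(all_protos, all_counts, client_id, num_classes):
    """Return client_id's prototypes, with absent classes padded from peers.

    all_protos: {client: {class: vector}}
    all_counts: {client: {class: number of samples behind the prototype}}
    """
    padded = dict(all_protos[client_id])
    for c in range(num_classes):
        if c in padded:
            continue  # the client has its own prototype for this class
        donors = [
            (all_protos[j][c], all_counts[j][c])
            for j in all_protos
            if j != client_id and c in all_protos[j]
        ]
        if not donors:
            continue  # no client has seen this class, so nothing to pad
        total = sum(n for _, n in donors)
        dim = len(donors[0][0])
        padded[c] = [sum(n * p[d] for p, n in donors) / total for d in range(dim)]
    return padded

# Toy federation: the office has never observed class 1.
all_protos = {
    "office": {0: [1.0, 0.0]},
    "bus": {0: [0.0, 1.0], 1: [2.0, 2.0]},
    "hotel": {1: [4.0, 0.0]},
}
all_counts = {
    "office": {0: 10},
    "bus": {0: 5, 1: 30},
    "hotel": {1: 10},
}
office_padded = pad_missing_prototypes(all_protos, all_counts, "office", num_classes=3)
```

The office's pseudo-prototype for class 1 leans toward the bus because the bus contributed three times as many samples. Class 2, which nobody has observed, stays absent.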

This does not magically solve label scarcity. The authors explicitly discuss scarce labels as a limitation. But padding changes the local learning problem: a client without examples of a class can still receive a representation anchor for that class. In contrastive learning terms, those anchors can function as informative negatives and help organize the feature space.

For business interpretation, this is subtle but important. FedAPA is not only improving average benchmark performance. It is reducing the damage caused by uneven operational exposure. A site that has never seen a certain crowd level is not left completely blind. It receives a cautious representation-level hint from the federation.

That hint is not a substitute for real labels. But in deployment, hints are often cheaper than annotations.

The warm-up schedule prevents representation learning from arriving too early and ruining the party

FedAPA’s local objective combines cross-entropy classification with two prototype-based contrastive terms. One aligns local embeddings with the personalized global prototypes. The other uses the uploaded prototype sets from other clients to encourage inter-client representation learning.
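For intuition, a prototype contrastive term is often written in an InfoNCE-like form: pull the embedding toward the prototype of its own class and push it away from the others. The sketch below uses that common form with cosine similarity and a temperature; the paper's exact loss may differ, and the function names and toy values are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def prototype_contrastive_loss(embedding, label, prototypes, temperature=0.5):
    """InfoNCE-style loss: -log softmax of the true-class similarity."""
    logits = {c: cosine(embedding, p) / temperature for c, p in prototypes.items()}
    peak = max(logits.values())
    log_denominator = peak + math.log(
        sum(math.exp(v - peak) for v in logits.values())
    )
    return log_denominator - logits[label]

# Toy prototypes for three classes and one embedding close to class 0.
prototypes = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [-1.0, 0.0]}
embedding = [0.9, 0.1]
loss_if_class0 = prototype_contrastive_loss(embedding, 0, prototypes)
loss_if_class1 = prototype_contrastive_loss(embedding, 1, prototypes)
```

The loss is small when the embedding already sits near its own class prototype and larger when it sits near someone else's, which is the pressure that reshapes the representation space.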

The paper does not apply these representation losses at full strength from the beginning. It uses a warm-up coefficient that starts near zero and gradually increases, with the default warm-up length set to 50 rounds.
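One simple way to picture such a schedule is a linear ramp from 0 to 1 over the warm-up window. The ramp shape, the function names, and the decision to scale both contrastive terms by the same coefficient are assumptions here; only the 50-round default comes from the text above.

```python
def warmup_coefficient(round_index, warmup_rounds=50):
    """Linear ramp: 0 at round 0, 1 from `warmup_rounds` onward (assumed shape)."""
    if warmup_rounds <= 0:
        return 1.0
    return min(1.0, round_index / warmup_rounds)

def local_objective(ce_loss, global_proto_loss, peer_proto_loss,
                    round_index, warmup_rounds=50):
    """Cross-entropy plus warm-up-scaled prototype contrastive terms."""
    lam = warmup_coefficient(round_index, warmup_rounds)
    return ce_loss + lam * (global_proto_loss + peer_proto_loss)

# With fixed toy loss values, the objective shifts as training proceeds.
early = local_objective(1.0, 2.0, 3.0, round_index=0)    # classification only
midway = local_objective(1.0, 2.0, 3.0, round_index=25)  # half-strength contrastive
late = local_objective(1.0, 2.0, 3.0, round_index=50)    # full-strength contrastive
```

At round 0 the client optimizes classification alone; by the end of warm-up the contrastive terms carry their full weight.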

The intuition is practical. Early in training, the model should first learn basic classification boundaries. If prototype contrastive pressure dominates too early, the representation space may be organized around unstable or immature prototypes. After the classifier has learned enough structure, contrastive learning can reshape the embedding space more productively.

This is not merely a training trick. It is a governance principle for distributed learning systems: do not force alignment before clients know what they are aligning to.

FedAPA’s sensitivity tests support this point. The authors compare different warm-up lengths and find that performance improves from 25 to 50 rounds but slightly declines at 100 rounds. Too short means contrastive learning arrives too aggressively. Too long means the model delays useful representation alignment. The paper also compares the warm-up coefficient with static coefficients and reports that static settings perform worse.

That test should be read as a sensitivity and ablation result, not as a universal law that “50 rounds is optimal.” It says the curriculum matters in this setup. It does not say every building, router, occupancy range, or training budget should copy the same schedule.

The main results show improvement, but the ablations explain why

The headline numbers are strong. FedAPA is evaluated on a real-world Wi-Fi CSI crowd-counting dataset across six environments, including living room, classroom, bus, conference room, and hotel room settings. The paper evaluates both statistical heterogeneity and model architecture heterogeneity. It reports accuracy, F1, and mean absolute error (MAE), averaging results over the last five communication rounds.
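Because the labels are head counts, accuracy and MAE answer different questions: accuracy treats being off by one person and off by ten as equally wrong, while MAE measures how far off the estimate is. A small sketch with made-up predictions shows the difference, along with last-five-round averaging. All values and function names are illustrative, and F1 is omitted for brevity.

```python
def accuracy(y_true, y_pred):
    """Fraction of exactly correct head counts."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in people."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def last_k_average(per_round_values, k=5):
    """Average a metric over the final k communication rounds."""
    tail = per_round_values[-k:]
    return sum(tail) / len(tail)

# Made-up ground truth and predictions for four CSI segments.
y_true = [3, 7, 12, 20]
y_pred = [3, 8, 12, 17]
acc = accuracy(y_true, y_pred)             # 2 of 4 exactly right
mae = mean_absolute_error(y_true, y_pred)  # (0 + 1 + 0 + 3) / 4

# Made-up per-round accuracies; only the last five rounds are averaged.
reported = last_k_average([10, 80, 82, 84, 86, 88], k=5)
```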

The main comparison is against local training, FedAvg, WiFederated, and FedCaring.

Setting	Method	Accuracy	F1	MAE ↓
Statistical data heterogeneity	Local	72.06	71.32	0.69
Statistical data heterogeneity	FedAvg	59.46	38.26	2.01
Statistical data heterogeneity	WiFederated	77.57	76.91	0.57
Statistical data heterogeneity	FedCaring	77.60	69.18	0.62
Statistical data heterogeneity	FedAPA	87.25	85.91	0.23
Model architecture heterogeneity	Local	46.81	39.45	1.26
Model architecture heterogeneity	FedAvg	46.16	37.51	2.03
Model architecture heterogeneity	WiFederated	54.94	47.80	1.34
Model architecture heterogeneity	FedCaring	70.00	68.43	0.77
Model architecture heterogeneity	FedAPA	80.31	78.94	0.48

The key interpretation is not just that FedAPA wins. It is where the baselines fail.

FedAvg performs poorly under both heterogeneity settings, especially in F1 and MAE. That reinforces the paper’s central diagnosis: simple model averaging is a bad fit when local distributions and device architectures diverge. WiFederated improves over FedAvg through fine-tuning. FedCaring improves performance under model heterogeneity through weighted model aggregation. But FedAPA still does better, suggesting that prototype-level, similarity-aware collaboration captures something that coarse model-level weighting misses.

The ablation table is even more useful because it isolates the mechanism.

Variant	Statistical Acc.	Statistical F1	Statistical MAE ↓	Model Acc.	Model F1	Model MAE ↓	Likely purpose
Global average prototype	74.02	73.59	0.71	54.75	47.89	1.62	Ablation against uniform prototype averaging
Global average prototype with client prototypes	82.03	81.47	0.38	69.41	67.20	0.77	Ablation testing the value of involving peer prototypes in local contrastive learning
FedAPA	87.25	85.91	0.23	80.31	78.94	0.48	Full method with similarity-aware adaptive aggregation

This is the stronger evidence, and the reason this article leads with the mechanism. If we only cite the main benchmark table, FedAPA looks like another “new method beats baselines” paper. Charming, but familiar. The ablation tells us why: uniform prototype averaging helps only so much; adding peer prototypes helps more; adaptive similarity-aware aggregation gives the full gain.

In other words, the paper is not merely validating prototypes. It is validating selective prototype collaboration.

The communication result is small in kilobytes and large in product meaning

FedAPA’s communication advantage is unusually clear. With the LargeConvNet4 model, the paper reports per-client communication cost per round as follows:

Method	Communication per round per client
FedAvg	3,710 KB
FedCaring	3,710 KB
WiFederated	3,710 KB
FedAPA	150.53 KB

That is a reported 95.94% reduction in communication overhead.
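The percentage follows directly from the two figures in the table, as a quick check shows. The variable names are illustrative.

```python
full_model_kb = 3710.0  # per-round cost of the full-model methods in the table
prototype_kb = 150.53   # per-round cost reported for FedAPA

reduction = 1.0 - prototype_kb / full_model_kb
ratio = full_model_kb / prototype_kb

print(f"reduction: {reduction:.2%}")  # reduction: 95.94%
print(f"ratio: {ratio:.1f}x")         # ratio: 24.6x
```

Put differently, each full-model round costs roughly as much as 24 to 25 rounds of prototype exchange in this setup.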

This result should be treated as operational evidence, not merely a compression statistic. In edge sensing deployments, communication cost is not an academic nuisance. It affects bandwidth, update frequency, device battery, network contention, and how tolerable the system is to the people who also expect the Wi-Fi network to provide, inconveniently, Wi-Fi.

The business relevance is therefore not “FedAPA is cheaper to train.” That is too narrow.

The more useful interpretation is this: prototype exchange makes federated sensing closer to a deployment model where many sites can participate without constant heavy model synchronization. For offices, transport hubs, retail stores, classrooms, and hotels, that means occupancy analytics may be updated collaboratively while keeping raw sensing data local and limiting network burden.

The word “may” is doing work here. The paper’s result is based on its experimental setup, model choices, and full participation assumptions. It does not prove that every production network will enjoy the same communication economics under packet loss, intermittent devices, or security constraints. But it does show a plausible design path: communicate compact semantic summaries, not everything the model knows.

The convergence analysis supports plausibility, not deployment certainty

FedAPA includes a convergence analysis under standard nonconvex optimization assumptions: smoothness, lower-bounded objectives, bounded stochastic gradients, bounded and Lipschitz representations, Lipschitz personalized aggregation, and a structured warm-up schedule. It assumes full client participation.

The analysis separates several forces: stochastic variance, prototype-refresh effects, schedule-change effects, and prototype-coupling terms. After warm-up, the paper derives a stationarity bound in which the time-averaged gradient norm is controlled by initial gap, variance, and prototype coupling. The practical interpretation is that FedAPA behaves like a nonconvex stochastic method with extra constants introduced by prototype aggregation and warm-up.

This is useful, but it should not be oversold.

The convergence section is not a production guarantee that FedAPA will remain stable in a live building network with changing router placement, unlabeled data, partial participation, or adversarial clients. Its purpose is narrower: to show that the algorithm is mathematically coherent under explicit assumptions and that the extra machinery does not turn training into uncontrolled representation soup.

That is still valuable. In business terms, the theory reduces “interesting hack” risk. It does not eliminate deployment risk.

What the paper directly shows, what Cognaptus infers, and what remains uncertain

A disciplined reading separates the result from the business extrapolation.

Layer	Statement	Status
Direct paper result	FedAPA outperforms local training, FedAvg, WiFederated, and FedCaring on the reported Wi-Fi CSI crowd-counting experiments	Directly shown in the paper’s evaluation
Direct paper result	Similarity-aware prototype aggregation outperforms global average prototype aggregation in both statistical and model heterogeneity settings	Directly shown by ablation
Direct paper result	Prototype exchange greatly reduces per-round communication relative to full-model methods in the LargeConvNet4 setup	Directly shown in the communication comparison
Cognaptus inference	Multi-site occupancy analytics may benefit more from selective representation sharing than from one-size-fits-all model averaging	Reasonable inference from mechanism and results
Cognaptus inference	Prototype-based FL may be attractive for resource-constrained sensing deployments where bandwidth is a real operational constraint	Reasonable inference, especially from the communication table
Still uncertain	Performance under partial participation, scarce labels, changing environments, and production device constraints	Not fully resolved by the paper

The likely enterprise use case is not “replace every camera with Wi-Fi tomorrow.” That would be the kind of sentence that gets written by a slide deck trying to raise money before lunch.

A more careful use case is multi-site occupancy sensing where camera deployment is undesirable, raw data transfer is costly, and local environments differ enough that centralized or uniform FL training becomes brittle. In that setting, FedAPA offers a design pattern: let each site stay local, but share class-level representation summaries with similar peers.

The practical value is personalized collaboration, not generic federation

FedAPA’s strongest lesson travels beyond Wi-Fi crowd counting.

Many physical-world AI deployments have the same structure: multiple sites, different local conditions, uneven labels, constrained devices, and privacy-sensitive data. Think warehouses, hospitals, schools, transport facilities, retail locations, industrial sensing systems, or smart buildings. The naive dream is one global model. The more realistic dream is a federation of local models that know when to listen to each other.

FedAPA’s architecture points toward that second dream.

The design principle is simple:

Do not average clients. Compare what they know, then decide who should influence whom.

For a business deploying edge AI, that principle changes how one evaluates system architecture. The question is not only “Can we train without centralizing raw data?” It becomes:

Architecture question	Why FedAPA makes it visible
What is the smallest useful unit of shared knowledge?	FedAPA uses class prototypes rather than raw data or full model parameters
How do we prevent harmful cross-site transfer?	FedAPA weights prototypes by client similarity
How do we support clients with missing labels?	FedAPA pads absent class prototypes using information from other clients
How do we avoid unstable early alignment?	FedAPA warms up contrastive representation learning
How do we measure deployment cost?	FedAPA reports communication cost, not only accuracy

That is a more serious architecture conversation than “we use federated learning.” Good. The phrase had become a little too comfortable.

Boundaries: where the result should not be stretched

The paper is careful enough to discuss limitations, and they matter for business adoption.

First, FedAPA assumes supervised learning. In real Wi-Fi sensing deployments, labeled CSI data can be scarce. People do not naturally annotate the room every time occupancy changes. A production system may need semi-supervised, weakly supervised, or self-supervised extensions before it becomes economical.

Second, the experiments use full client participation. Real devices may disconnect, sleep, fail, or skip training rounds. The paper discusses partial participation as future work and suggests that client selection should consider prototype diversity and device capability. That is a reasonable direction, but it is not yet validated here.

Third, FedAPA reduces communication but adds local computation through prototype-based contrastive losses. For powerful edge servers, that may be fine. For small embedded devices, the trade-off requires measurement.

Fourth, the task is crowd counting from Wi-Fi CSI across six environments. That is meaningful, but it is not every sensing task. The method’s business relevance is strongest where class-level prototypes are meaningful, labels are available, and local environments are related enough for selective sharing to help.

Finally, privacy is improved by not sharing raw CSI, but “not raw data” is not the same thing as a complete privacy proof. Prototype leakage, malicious clients, and security hardening are separate questions. Less exposure is good. It is not magic dust.

The missing link is not Wi-Fi; it is selective learning

FedAPA is a Wi-Fi CSI crowd-counting paper, but its more interesting message is about federated learning design.

The paper shows that in heterogeneous sensing environments, the system should not ask every client to converge toward the same model or receive the same global representation. It should let clients share compact class prototypes, compare similarity, pad missing knowledge, and gradually align representations after local classification has stabilized.

That is why the mechanism matters more than the headline. FedAPA is not just “federated learning for Wi-Fi sensing.” It is a reminder that collaboration has structure. Similar clients should matter more. Missing classes need anchors. Representation learning should be paced. Communication should be small enough that the network can still do its day job.

For businesses, the practical lesson is equally plain: scalable physical-world AI will not be built by pretending every site is the same. It will be built by systems that know which differences matter, which similarities are useful, and when to borrow knowledge without importing someone else’s noise.

Signal, prototype, repeat. Not glamorous. Very deployable.

Cognaptus: Automate the Present, Incubate the Future.

¹ Jingtao Guo, Yuyi Mao, and Ivan Wang-Hei Ho, “FedAPA: Federated Learning with Adaptive Prototype Aggregation Toward Heterogeneous Wi-Fi CSI-based Crowd Counting,” arXiv:2511.21048, 2025, https://arxiv.org/abs/2511.21048.
