Edge AI has a habit of turning every efficiency problem into a hardware problem. Buy a better chip. Quantise the model. Move the workload closer to the sensor. Reduce the precision until the accuracy team starts twitching.
This paper takes a quieter route. It asks whether part of the energy problem comes not from the sensor, the chip, or even the whole network, but from the way the network is asked to speak.
In conventional spiking neural networks, the output layer usually behaves like a familiar classifier wearing a neuromorphic hat: one output neuron per class, then decide which class wins by spike count or spike timing. Cedrick Kinavuidi, Luca Peres, and Oliver Rhodes propose a different decoding interface: train the spiking neural network to output a binary hypervector directly, then classify by Hamming distance against known class hypervectors.[1] In other words, instead of asking the network to shout “class three” louder or earlier than the others, ask it to assemble a distributed signature.
That sounds abstract. It is not. The business version is simple: if edge systems are going to use event cameras and neuromorphic processors, then the output representation matters. A wasteful decoder can turn a theoretically efficient sensing stack into a tiny electricity bonfire. A better decoder will not magically make neuromorphic AI mainstream, because magic remains disappointingly hard to procure. But it can improve the trade-off among accuracy, latency, and energy where that trade-off is the actual product constraint.
The useful way to read this paper is not as “SNN-HDC beats everything.” It does not. On DvsGesture, the deeper rate-decoded model is both more accurate and faster. The better reading is comparative: what changes when the same spiking backbone is decoded by rate, latency, or hyperdimensional similarity?
That comparison is where the paper earns attention.
The decoder is the product interface, not a footnote
Spiking neural networks process information as events over time. Their promised advantage comes from sparse activity: if neurons do not spike, there is less work to do. This promise only cashes out on suitable neuromorphic hardware, and only if the model design does not force excessive spikes. Rate decoding has a habit of doing exactly that.
Rate decoding decides by counting output spikes over a time window. It is often accurate, but it encourages activity. Latency decoding decides by whichever output neuron spikes first. It can be faster and cheaper, but is typically more fragile because one early spike can carry too much decision weight. Population variants spread the decision across more neurons, but they still sit inside the same broad one-hot worldview: every class owns a slot.
SNN-HDC changes the slot. The output layer no longer has one neuron per class. It has $D$ output neurons, each corresponding to one dimension of a binary hypervector. Every dimension starts at 0. If the corresponding output neuron spikes during the sample, that dimension flips to 1. The resulting vector is compared with pre-generated class hypervectors using Hamming distance.
That shift matters because the output is no longer a direct class label. It is a distributed concept representation. A class is not “the neuron that fired most.” A class is “the known hypervector closest to the pattern of features observed.”
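The decoding step described above is small enough to sketch in plain Python. This is a toy illustration, not the authors' code: the function names, the two class labels, and the eight-dimensional vectors are all made up for readability (the selected model in the paper uses $D = 1024$).

```python
from typing import Dict, List, Set


def spikes_to_hypervector(spiking_neurons: Set[int], dim: int) -> List[int]:
    """Every dimension starts at 0 and flips to 1 if its neuron spiked at all."""
    return [1 if i in spiking_neurons else 0 for i in range(dim)]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(hv: List[int], class_hvs: Dict[str, List[int]]) -> str:
    """Return the class whose stored hypervector is closest in Hamming distance."""
    return min(class_hvs, key=lambda name: hamming(hv, class_hvs[name]))


# Hypothetical class hypervectors with D = 8, for illustration only.
class_hvs = {
    "wave": [1, 1, 0, 0, 1, 0, 1, 0],
    "clap": [0, 0, 1, 1, 0, 1, 0, 1],
}

# Output neurons 0, 1, 4 and 7 spiked at least once during the sample.
output = spikes_to_hypervector({0, 1, 4, 7}, dim=8)
prediction = classify(output, class_hvs)
```

Here the output differs from the "wave" vector in two positions and from the "clap" vector in six, so the sample is assigned to "wave". Note that how many times each neuron spiked never enters the calculation.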
The paper’s mechanism can be compressed into a small comparison:
| Decoding method | Output representation | Decision rule | Expected pressure on spikes | Main operational trade-off |
|---|---|---|---|---|
| Rate decoding | One neuron per class | Highest spike count | High | Strong accuracy, higher spike activity |
| Latency decoding | One neuron per class | Earliest spike | Lower than rate, but unstable | Lower activity, weaker accuracy |
| SNN-HDC | Binary hypervector | Smallest Hamming distance | Low if features only need to be present | Distributed representation, larger output layer |
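The two one-hot rows of the table reduce to very short decision rules. The sketch below assumes a toy data layout (a dictionary from class index to the list of output spike times in milliseconds), which is an illustration device rather than anything taken from the paper.

```python
from typing import Dict, List

# Hypothetical toy format: spike_times[c] holds the output spike times (ms)
# of the single neuron assigned to class c.
SpikeTrains = Dict[int, List[float]]


def rate_decode(spike_times: SpikeTrains) -> int:
    """Pick the class whose output neuron spiked most often in the window."""
    return max(spike_times, key=lambda c: len(spike_times[c]))


def latency_decode(spike_times: SpikeTrains) -> int:
    """Pick the class whose output neuron spiked first; silent neurons lose."""
    first = {c: min(ts) if ts else float("inf") for c, ts in spike_times.items()}
    return min(first, key=lambda c: first[c])


# Class 2 fires once, very early; class 0 fires often, but later.
example = {0: [40.0, 55.0, 61.0, 90.0], 1: [], 2: [12.0]}
rate_choice = rate_decode(example)        # decided by spike count
latency_choice = latency_decode(example)  # decided by the earliest spike
```

On this input the two rules disagree: rate decoding picks class 0, latency decoding picks class 2 on the strength of a single early spike. That is the fragility the table calls "unstable".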
The important design choice is not merely “use hypervectors.” Earlier SNN-HDC combinations such as SpikeHD and HyperSpike used SNNs as feature generators and then projected outputs into hypervectors through matrices. This paper removes that intermediate projection step. The SNN is trained directly to produce the hypervector. That avoids a two-stage “SNN first, HDC later” pipeline and removes matrix multiplication from the encoding path.
This is a very neuromorphic kind of elegance: fewer expensive operations, fewer inherited ANN habits, and fewer reasons to pretend that a one-hot softmax-shaped mind is biologically inevitable. A small mercy.
The experiment is built to compare decoders, not chase leaderboard glory
The authors are explicit that the goal is not to build the biggest model or claim state-of-the-art dominance across neuromorphic vision. The models are relatively small, and the architectures are chosen so that different decoding methods can be compared cleanly.
They test on two event-based vision datasets.
The first is DvsGesture, with eleven hand-gesture classes. The paper uses the Tonic train/test split, downsampled to $32 \times 32$, and only the first 1,500 ms of each sample. The second is SL-Animals-DVS, a sign-language animal-word dataset with nineteen classes, four lighting conditions, and leave-signers-out cross-validation. Again, inputs are downsampled to $32 \times 32$ and clipped to the first 1,500 ms.
That 1,500 ms window is not a detail to skip. It is part of the boundary. These are not open-ended streaming deployments. They are fixed-window benchmark experiments, albeit with one-millisecond frames to preserve temporal resolution better than heavily compressed frame sequences.
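For readers who want the preprocessing boundary made concrete, here is one plausible way to bin events into one-millisecond frames, clip at 1,500 ms, and downsample to $32 \times 32$. It is a sketch under stated assumptions: the event tuple layout, the 128-pixel native sensor size, and downsampling by integer division are all assumptions, and the paper may do any of these differently.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event layout: (timestamp in microseconds, x, y, polarity).
Event = Tuple[int, int, int, int]


def bin_events(
    events: Iterable[Event],
    window_ms: int = 1500,   # clip length stated in the article
    sensor_size: int = 128,  # assumed native resolution
    target_size: int = 32,   # downsampled resolution stated in the article
) -> Counter:
    """Count events per (ms_bin, polarity, y, x) cell, stored sparsely."""
    scale = sensor_size // target_size
    frames: Counter = Counter()
    for t_us, x, y, p in events:
        t_ms = t_us // 1000      # one-millisecond frames
        if t_ms >= window_ms:    # drop everything after the clip window
            continue
        frames[(t_ms, p, y // scale, x // scale)] += 1
    return frames


events = [
    (500, 0, 0, 1),
    (900, 3, 2, 1),          # same 1 ms bin and same 32x32 cell as the first event
    (1_200, 127, 127, 0),
    (1_500_000, 5, 5, 1),    # at 1,500 ms, so outside the window
]
frames = bin_events(events)
```

A sparse counter is used instead of a dense array because most cells in most milliseconds are empty, which is rather the point of an event camera.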
For each dataset, the authors compare five model types:
- SNN-HDC, which outputs a hypervector.
- Rate-1, a same-depth rate-decoded network with one-hot output.
- Rate-2, a deeper rate-decoded network that keeps a hidden layer the same size as SNN-HDC’s hypervector layer and adds a one-hot output layer after it.
- Latency-1, a same-depth latency-decoded network.
- Latency-2, a deeper latency-decoded network.
This comparison design is sensible because SNN-HDC changes the architecture. A hypervector output layer with 1,024 dimensions is not the same thing as an eleven-class one-hot output layer. Comparing only against a same-depth one-hot model would make SNN-HDC look better partly because it has a much larger output representation. Comparing against the deeper one-hot model asks the sharper question: is the gain coming from hyperdimensional decoding, or simply from adding capacity?
That question becomes important in the results.
The main evidence is a trade-off triangle: accuracy, energy, latency
On DvsGesture, the selected SNN-HDC model uses $D = 1024$ and $\beta = 0.9$, after exploratory tests over hypervector dimensionality and membrane decay rate. Accuracy improves as dimensionality increases, but the authors report no further accuracy improvement beyond 1,024 dimensions, while estimated energy keeps rising. That parameter sweep is best read as an implementation sensitivity test: it identifies the point where extra hypervector width stops buying enough accuracy to justify its energy cost.
The core DvsGesture comparison is more interesting than a single winner column.
| Model | Accuracy | Avg. spike count | Estimated energy | Latency |
|---|---|---|---|---|
| SNN-HDC | 96.59% | $1.53 \times 10^5$ | 2.54 mJ | 162 ms |
| Rate-1 | 95.08% | $4.63 \times 10^5$ | 4.39 mJ | 208 ms |
| Rate-2 | 97.35% | $5.13 \times 10^5$ | 9.31 mJ | 133 ms |
| Latency-1 | 79.17% | $2.66 \times 10^5$ | 3.14 mJ | 206 ms |
| Latency-2 | 85.23% | $3.93 \times 10^5$ | 7.37 mJ | 188 ms |
The headline is not “SNN-HDC wins DvsGesture.” It does not win every column. Rate-2 reaches 97.35% accuracy versus 96.59% for SNN-HDC, and Rate-2 latency is 133 ms versus 162 ms. Anyone selling this as a universal rate-decoding killer is doing the usual interpretive vandalism.
The real result is narrower and more useful. SNN-HDC uses the fewest spikes and the least estimated energy among the five comparable models. Its energy advantage ranges from 1.24× versus Latency-1 to 3.67× versus Rate-2. Its average firing rate is also lowest: 20.5 Hz, compared with 78.2 Hz for Rate-1, 68.2 Hz for Rate-2, 44.9 Hz for Latency-1, and 52.7 Hz for Latency-2.
That tells us something operational. SNN-HDC is not merely a more exotic output code. It changes the learning pressure inside the network. The rate-decoded model is trained to fire often enough for spike counts to separate classes. The SNN-HDC loss only needs desired “on” dimensions to fire at least once; additional spikes for those dimensions do not increase the useful signal. For dimensions meant to remain off, extra spikes are penalised. This makes sparsity a consequence of the representation, not just a regularisation wish.
That distinction is the paper’s strongest practical point. Energy efficiency in spiking systems is not only about pruning or hardware. It is also about what the loss function rewards the network for doing.
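The incentive can be made concrete with a deliberately crude scoring function. This is not the paper's loss, which is not reproduced in this article, and a trainable version would need a differentiable form that this sketch does not attempt. The function name and the `off_penalty` parameter are hypothetical. It only encodes the two rules described above: an "on" dimension is satisfied by one spike, and every spike on an "off" dimension costs something.

```python
from typing import Sequence


def presence_penalty(
    spike_counts: Sequence[int],
    target_bits: Sequence[int],
    off_penalty: float = 1.0,  # hypothetical weight on unwanted spikes
) -> float:
    """Toy score: lower is better. Not differentiable; illustration only."""
    total = 0.0
    for count, bit in zip(spike_counts, target_bits):
        if bit == 1:
            # An "on" dimension only needs to fire once; extra spikes add nothing.
            total += 1.0 if count == 0 else 0.0
        else:
            # An "off" dimension is penalised for every spike it emits.
            total += off_penalty * count
    return total / len(target_bits)


target = [1, 0, 0, 0]
one_spike = presence_penalty([1, 0, 0, 0], target)    # target met with one spike
many_spikes = presence_penalty([5, 0, 0, 0], target)  # repetition earns nothing extra
silent = presence_penalty([0, 0, 0, 0], target)       # missing "on" dimension
noisy = presence_penalty([1, 2, 0, 0], target)        # unwanted spikes on an "off" dimension
```

Under this scoring, firing five times on a required dimension is no better than firing once, so a network optimised against it has no reason to keep spiking. A count-based objective offers the opposite incentive.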
SL-Animals-DVS is the stronger generalisation check, but not a victory parade
The second dataset, SL-Animals-DVS, tests whether the pattern survives a different neuromorphic vision task and architecture. The authors keep the DvsGesture-selected $D = 1024$ and $\beta = 0.9$, which makes this more of a transfer-style robustness check than another full parameter hunt.
The result is encouraging, but again not leaderboard theatre.
| Model | Accuracy | Avg. spike count | Estimated energy | Latency |
|---|---|---|---|---|
| SNN-HDC | 74.13% | $6.86 \times 10^5$ | 3.61 mJ | 570 ms |
| Rate-1 | 70.68% | $1.85 \times 10^6$ | 8.21 mJ | 668 ms |
| Rate-2 | 72.74% | $1.71 \times 10^6$ | 7.56 mJ | 592 ms |
| Latency-1 | 47.34% | $9.31 \times 10^5$ | 4.97 mJ | 361 ms |
| Latency-2 | 45.12% | $1.08 \times 10^6$ | 5.43 mJ | 356 ms |
Here SNN-HDC has the highest accuracy among the five compared models and the lowest estimated energy. Energy reductions range from 1.38× (versus Latency-1) to 2.27× (versus Rate-1). It also beats both rate-decoded models on latency, though the latency-decoded models are faster. The catch is that the latency models are also much less accurate, so their speed is less useful unless the business case accepts a fairly dramatic accuracy penalty. Most do not, unless the task is disposable. Gesture recognition normally is not.
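The energy ratios quoted for both datasets are plain arithmetic on the two results tables, and they can be re-derived in a few lines. The dictionaries below simply copy the estimated-energy columns; the helper name is illustrative.

```python
from typing import Dict

# Estimated energy per sample (mJ), copied from the two results tables.
DVS_GESTURE_MJ = {
    "SNN-HDC": 2.54, "Rate-1": 4.39, "Rate-2": 9.31,
    "Latency-1": 3.14, "Latency-2": 7.37,
}
SL_ANIMALS_MJ = {
    "SNN-HDC": 3.61, "Rate-1": 8.21, "Rate-2": 7.56,
    "Latency-1": 4.97, "Latency-2": 5.43,
}


def energy_ratios(table: Dict[str, float], baseline: str = "SNN-HDC") -> Dict[str, float]:
    """How many times more estimated energy each model uses than the baseline."""
    base = table[baseline]
    return {name: round(mj / base, 2) for name, mj in table.items() if name != baseline}


for label, table in (("DvsGesture", DVS_GESTURE_MJ), ("SL-Animals-DVS", SL_ANIMALS_MJ)):
    ratios = energy_ratios(table)
    low, high = min(ratios.values()), max(ratios.values())
    print(f"{label}: {low:.2f}x to {high:.2f}x the SNN-HDC estimate")
```

The ranges come out as 1.24× to 3.67× on DvsGesture and 1.38× to 2.27× on SL-Animals-DVS, matching the figures quoted in the text.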
Against external models, SNN-HDC should be read carefully. On DvsGesture, it outperforms SpikeHD and HyperSpike, the prior SNN-HDC-style approaches listed by the authors, but it does not beat all published models. ACE-BET, an ANN, reports 98.88%; mMND, a rate-decoded SNN, reports 98.0%. On SL-Animals-DVS, EventRPG reports 91.59% and EvT reports 88.12%, both far above SNN-HDC’s 74.13%. The authors did not set out to beat those larger systems, and the paper’s architecture choices support that claim.
So the evidence is not: “This is the best neuromorphic vision model.”
The evidence is: “Within controlled comparisons among analogous spiking decoders, SNN-HDC consistently reduces spike activity and estimated energy, while remaining competitive on accuracy and sometimes improving it.”
That is a smaller sentence. It is also the more commercially useful one.
Unknown-class detection is not a side trick; it follows from the representation
The paper’s third contribution is threshold-based unknown-class detection. This is easy to underplay because it appears after the main accuracy-energy tables. It deserves more attention.
A one-hot classifier is structurally forced to choose among known classes. It can be calibrated, wrapped, or given rejection logic, but the output space itself says: pick from the menu. SNN-HDC outputs a hypervector and compares it with known class hypervectors. If the result is too far from every known class, the system can label it unknown.
The authors test this on DvsGesture by training SNN-HDC on only ten of the eleven classes, excluding the random gesture class. The trained model is then evaluated on all eleven classes. A threshold $\delta$ on normalised Hamming distance determines whether a sample is close enough to a known class. If the distance to the nearest class hypervector is below $\delta$, the sample is assigned to that class; otherwise, it is labelled unknown.
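The rejection rule is a one-line extension of nearest-hypervector classification. The sketch below is a minimal illustration, assuming toy eight-dimensional vectors and an arbitrary threshold of 0.3. Neither the vectors nor the threshold value come from the paper.

```python
from typing import Dict, List, Optional


def normalised_hamming(a: List[int], b: List[int]) -> float:
    """Fraction of positions where two binary vectors differ (0.0 to 1.0)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def classify_with_rejection(
    hv: List[int], class_hvs: Dict[str, List[int]], delta: float
) -> Optional[str]:
    """Nearest known class if it lies within delta, otherwise None (unknown)."""
    best = min(class_hvs, key=lambda name: normalised_hamming(hv, class_hvs[name]))
    if normalised_hamming(hv, class_hvs[best]) < delta:
        return best
    return None


# Hypothetical class hypervectors and threshold, for illustration only.
class_hvs = {
    "wave": [1, 1, 0, 0, 1, 0, 1, 0],
    "clap": [0, 0, 1, 1, 0, 1, 0, 1],
}
DELTA = 0.3

familiar = classify_with_rejection([1, 1, 0, 0, 1, 0, 0, 1], class_hvs, DELTA)
unfamiliar = classify_with_rejection([1, 0, 1, 0, 0, 1, 1, 0], class_hvs, DELTA)
```

The first sample sits at distance 0.25 from "wave" and is accepted. The second is at distance 0.5 from both stored vectors, so it is returned as `None` rather than forced into a class. Choosing $\delta$ is the real engineering problem, and nothing here settles it.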
The paper reports that the model identifies 100% of samples from the unseen class. That is the cleanest evidence for the rejection behaviour. It does not mean open-set recognition is solved. It means this representation gives the model a natural dissimilarity signal that can be thresholded.
For business use, that distinction matters. Edge AI systems often fail not because the known classes are hard, but because the real world keeps submitting things outside the training taxonomy. A warehouse gesture system sees a new hand motion. A driver-monitoring camera sees an occlusion pattern. A factory vision system sees a tool used in a non-standard way. The expensive failure is not always a wrong known-class label; sometimes it is the system being confidently wrong when it should have said, “I do not know what that is.”
SNN-HDC’s threshold mechanism is not an enterprise-grade anomaly detector yet. It is a reason to investigate one.
Why the energy claim is promising but not yet a procurement argument
The energy results are among the paper’s most attractive findings. They are also the easiest to over-sell.
The paper estimates energy using synaptic operations. A synaptic operation occurs when a source neuron sends a spike event through a non-zero synapse to a target neuron. The authors count average SOPs per test sample and multiply by 26 pJ per SOP, based on IBM TrueNorth. This is a standard kind of neuromorphic estimate, and it is useful for comparing models under a common assumption.
It is not the same as measuring wall-plug power on a deployed system.
This distinction should not deflate the result. It should locate it. The paper shows that SNN-HDC creates fewer spikes and fewer estimated synaptic-operation costs under its experimental setup. That is exactly the kind of mechanism one would want before moving to hardware measurement. But a product team still needs to test end-to-end power, memory access, routing overhead, sensor power, batching behaviour, and deployment latency on actual hardware.
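The estimate itself is a single multiplication, which is worth seeing written down precisely because it is so simple. The sketch uses the 26 pJ per SOP figure quoted above. The function names are illustrative, and the SOP count for SNN-HDC is back-calculated from the reported 2.54 mJ, not taken from the paper.

```python
PJ_PER_SOP = 26.0  # TrueNorth-based figure quoted in the article


def estimated_energy_mj(sops: float, pj_per_sop: float = PJ_PER_SOP) -> float:
    """Energy estimate in millijoules: SOP count times a fixed per-SOP cost."""
    return sops * pj_per_sop * 1e-12 * 1e3  # pJ -> J -> mJ


def implied_sops(energy_mj: float, pj_per_sop: float = PJ_PER_SOP) -> float:
    """Invert the estimate: how many SOPs a given millijoule figure implies."""
    return energy_mj * 1e-3 / (pj_per_sop * 1e-12)


round_number = estimated_energy_mj(1e8)  # a hypothetical 100 million SOPs per sample
snn_hdc_sops = implied_sops(2.54)        # back-calculated from the DvsGesture table
```

One hundred million SOPs works out to 2.6 mJ, and the 2.54 mJ reported for SNN-HDC on DvsGesture implies roughly $9.8 \times 10^7$ SOPs per sample. Everything a real chip does besides synaptic operations is, by construction, absent from this number.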
There is also a memory trade-off. When SNN-HDC simply replaces a one-hot output layer, it can require more parameters. The authors report that SNN-HDC uses 37.1× more parameters than the same-depth one-hot counterpart on DvsGesture and 1.88× more on SL-Animals-DVS. Against the deeper one-hot counterparts, however, SNN-HDC uses fewer parameters: 1.01× fewer on DvsGesture and 1.35× fewer on SL-Animals-DVS. So the parameter story depends on what baseline the buyer actually intended to deploy.
That is not a flaw. It is the normal irritation of engineering: there is no free lunch, only different invoices.
What this means for edge AI businesses
The practical market for this paper is not generic computer vision. It is event-based, low-power, latency-sensitive perception where the system sees change rather than frames. Gesture recognition, sign-language interfaces, robotics, wearable sensing, industrial monitoring, driver and operator monitoring, and always-on human-machine interfaces are plausible categories.
The paper directly shows three things.
First, SNN-HDC can be trained end-to-end so that the spiking network outputs binary hypervectors directly. That removes the need for a separate projection-matrix encoding stage after the SNN. The mechanism is cleaner than earlier SNN-HDC hybrids.
Second, in the tested neuromorphic vision settings, SNN-HDC reduces spike count and estimated energy relative to comparable rate and latency decoders. This is the main operational result.
Third, the hypervector distance signal supports threshold-based unknown-class detection in the DvsGesture experiment.
Cognaptus would infer four business implications, with boundaries attached.
| Paper result | Business interpretation | Boundary |
|---|---|---|
| Direct hypervector output avoids post-hoc projection matrices | Cleaner pipeline for neuromorphic inference, especially where multiplication-heavy stages are undesirable | Needs implementation on actual neuromorphic hardware to validate end-to-end gain |
| Lower spike count and estimated SOP energy | Potentially lower per-inference energy for event-camera edge workloads | Energy is estimated, not measured deployment power |
| Competitive accuracy with lower estimated energy | Useful for products where battery life and always-on sensing matter more than squeezing out the final percentage point | Does not beat all rate models or larger published systems |
| Hamming-distance threshold can reject unknowns | Natural route to open-set or anomaly-aware edge perception | Tested narrowly on one held-out DvsGesture class |
The most realistic adoption path is not replacing every edge vision model. It is using SNN-HDC as a candidate decoder when three conditions hold: the input is event-based, the hardware can exploit spike sparsity, and the product benefits from rejecting unfamiliar patterns instead of forcing every input into a known class.
That combination is not universal. But where it exists, it is valuable.
The comparison with rate decoding is the part to keep honest
The likely reader misconception is obvious: hyperdimensional decoding sounds brain-like, sparse, and robust, therefore it must be better than rate decoding. The paper does not prove that.
Rate decoding remains hard to beat on accuracy. On DvsGesture, Rate-2 reaches the highest accuracy and the lowest latency among the five internal models. It pays for that with substantially more spikes and estimated energy. That is a trade-off, not a defeat.
Latency decoding also behaves less neatly than a simple textbook story would suggest. It is supposed to be faster, but on DvsGesture the latency-decoded models are not faster than Rate-2 or SNN-HDC. On SL-Animals-DVS, they are faster, but their accuracy collapses relative to SNN-HDC and rate decoding. A decoding method optimised around “first spike wins” can be elegant in theory and brittle in practice. Theory, famously, does not have to ship products.
The real comparison is therefore not three methods standing on a podium. It is three ways of paying for a decision:
- Rate decoding pays with spikes.
- Latency decoding pays with accuracy instability.
- SNN-HDC pays with representation width and memory considerations, while reducing spike activity.
That is the correct mental model.
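One way to keep the comparison honest is to ask which models are dominated, meaning another model is at least as good on accuracy, estimated energy, and latency, and strictly better on at least one. The sketch below applies that test to the figures from the two results tables. The dominance check is an analysis added here, not something the paper reports.

```python
from typing import Dict, List, Tuple

# (accuracy %, estimated energy mJ, latency ms), copied from the results tables.
Row = Tuple[float, float, float]

DVS_GESTURE: Dict[str, Row] = {
    "SNN-HDC": (96.59, 2.54, 162), "Rate-1": (95.08, 4.39, 208),
    "Rate-2": (97.35, 9.31, 133), "Latency-1": (79.17, 3.14, 206),
    "Latency-2": (85.23, 7.37, 188),
}
SL_ANIMALS: Dict[str, Row] = {
    "SNN-HDC": (74.13, 3.61, 570), "Rate-1": (70.68, 8.21, 668),
    "Rate-2": (72.74, 7.56, 592), "Latency-1": (47.34, 4.97, 361),
    "Latency-2": (45.12, 5.43, 356),
}


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(models: Dict[str, Row]) -> List[str]:
    """Names of models that no other model dominates."""
    return [
        name for name, row in models.items()
        if not any(dominates(other, row) for o, other in models.items() if o != name)
    ]


dvs_front = pareto_front(DVS_GESTURE)
sl_front = pareto_front(SL_ANIMALS)
```

On DvsGesture only SNN-HDC and Rate-2 survive: Rate-2 buys accuracy and speed with energy, and the other three models are beaten by SNN-HDC on every axis. On SL-Animals-DVS the survivors are SNN-HDC and the two latency-decoded models, which stay on the front purely because they are faster.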
Where the paper stops
The paper gives a strong decoder-level argument, but it stops before several deployment questions.
The datasets are small by modern AI standards. DvsGesture and SL-Animals-DVS are useful neuromorphic benchmarks, but they are not proof of production robustness across messy edge environments. SL-Animals-DVS covers four lighting conditions, noisy samples among them, which helps, but this is still benchmark evidence.
The sample window is fixed at 1,500 ms. The authors explicitly motivate future work on dynamic hypervectors for continuous online streams. That future direction is important because many real edge systems do not receive neatly segmented gestures. They receive continuous behaviour with ambiguous starts and stops. SNN-HDC’s “build a hypervector over time” design seems compatible with streaming, but the paper does not fully demonstrate a reset-free continuous classifier.
The energy figures are estimates based on SOPs, not hardware measurements. They are valid as comparative modelling evidence, not as battery-life guarantees. Procurement teams may now resume breathing normally.
The approach also assumes binary hypervectors generated with roughly balanced dimensions and relies on Hamming-distance comparison. The authors themselves point to future variants that encode spike counts, inter-spike intervals, fewer on-bits, or dynamic trajectories through hyperspace. That matters because the current method deliberately ignores information in repeated spiking and precise spike timing for dimensions that are already on. This is a feature for efficiency, but it may be a limitation for tasks where timing carries class-critical information.
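The "roughly balanced" assumption matters because randomly drawn balanced hypervectors end up far apart from one another, which is what makes nearest-neighbour lookup and distance thresholds workable. The sketch below generates eleven such vectors at $D = 1024$ and measures their pairwise distances. The generation procedure (shuffling an exactly half-ones vector) and the seed are assumptions for illustration, since the article does not say how the authors draw theirs.

```python
import random
from itertools import combinations
from typing import List


def balanced_hypervector(dim: int, rng: random.Random) -> List[int]:
    """Binary vector with exactly half its bits set, in random positions."""
    bits = [1] * (dim // 2) + [0] * (dim - dim // 2)
    rng.shuffle(bits)
    return bits


D = 1024          # dimensionality selected in the paper
NUM_CLASSES = 11  # DvsGesture class count
rng = random.Random(0)  # fixed seed so the sketch is repeatable

class_hvs = [balanced_hypervector(D, rng) for _ in range(NUM_CLASSES)]

# Normalised Hamming distance between every pair of class hypervectors.
distances = [
    sum(x != y for x, y in zip(a, b)) / D
    for a, b in combinations(class_hvs, 2)
]
print(f"min {min(distances):.3f}, max {max(distances):.3f}")
```

At this width the pairwise distances cluster tightly around 0.5, so every class is about as far from every other class as a random vector would be. Sparser codes with fewer on-bits, one of the future variants mentioned above, would change that geometry.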
The useful lesson is representational discipline
This paper is valuable because it does not treat the output layer as an administrative formality. In many AI systems, the representation at the end is inherited from convenience: one-hot labels, softmax probabilities, cross-entropy, done. In spiking systems, that inheritance can be expensive. If the network is rewarded for producing many spikes, it should not surprise anyone when the model produces many spikes. Incentives work, even inside silicon.
SNN-HDC replaces that incentive with a different one. A class becomes a distributed binary signature. Presence is enough. Repetition earns no extra credit. Similarity can be measured without matrix multiplication. Unknowns can be rejected by distance rather than politely forced into the nearest wrong bucket.
For businesses building event-camera or neuromorphic edge systems, the conclusion is not “switch to SNN-HDC immediately.” The conclusion is more practical: benchmark the decoder, not just the backbone. The cheapest model may not be the smallest. The fastest model may not be the most useful. The most accurate model may be wasting energy to win a margin the product does not need.
SNN-HDC gives teams a new candidate in that design space: a decoder that spends representation width to save spikes.
That is not magic. It is better accounting.
Cognaptus: Automate the Present, Incubate the Future.
[1] Cedrick Kinavuidi, Luca Peres, and Oliver Rhodes, “Hyperdimensional Decoding of Spiking Neural Networks,” arXiv:2511.08558, 2025, https://arxiv.org/pdf/2511.08558.