
When Tokens Explode: The Hidden Geometry Behind Attention Sinks

Opening — Why this matters now: Large language models appear smooth from the outside: prompts go in, coherent text comes out. But internally, their numerical dynamics are anything but calm. In fact, inside many modern Transformers, certain tokens briefly explode into extreme values thousands of times larger than their neighbors. At the same time, a small set of tokens—often the very first token in a sequence—attracts an overwhelming share of attention from many heads. These are known as attention sinks. ...

March 6, 2026 · 5 min · Zelina

Whispering Feelings: When ASR Models Learn to Read Emotion

Opening — Why this matters now: As AI systems inch closer to everyday human interaction, emotion is no longer a "nice-to-have" signal. It is a prerequisite. Voice assistants, mental-health tools, call-center analytics, and social robots all face the same bottleneck: understanding not just what was said, but how it was said. Speech Emotion Recognition (SER) has promised this capability for years, yet progress has been throttled by small datasets, brittle features, and heavyweight models that struggle to scale. ...

February 6, 2026 · 4 min · Zelina

Gated Sparse Attention: Speed Without the Sink

Opening — Why this matters now: Long-context language models have crossed an uncomfortable threshold. Context windows now stretch to 128K tokens and beyond, yet the core attention mechanism still scales quadratically. The result is a growing mismatch between what models can theoretically ingest and what is economically and operationally feasible. At the same time, training instability — loss spikes, attention sinks, brittle gradients — continues to haunt large-scale runs. ...

January 24, 2026 · 4 min · Zelina