Opening — Why this matters now
Cosine similarity has enjoyed an unusually long reign. From TF‑IDF vectors to transformer embeddings, it remains the default lens through which we judge “semantic closeness.” Yet the more expressive our embedding models become, the more uncomfortable this default starts to feel. If modern representations are nonlinear, anisotropic, and structurally rich, why are we still evaluating them with a metric that only understands angles?
This paper makes a sharp, quietly subversive claim: cosine similarity is not wrong—but it is mathematically underpowered. And the fix does not require heuristics, reweighting tricks, or post‑hoc normalization. It requires revisiting the inequality cosine is built on.
Background — Cosine similarity and its hidden assumption
Cosine similarity rests on the Cauchy–Schwarz inequality, which bounds the dot product of two vectors by the product of their norms. Normalizing by that bound yields a scale‑invariant score driven entirely by angular alignment. A perfect cosine score implies linear dependence: one vector must be a scaled version of the other.
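To make the hidden assumption explicit, the bound and the score it induces are

\[
|u \cdot v| \;\le\; \lVert u\rVert\,\lVert v\rVert
\quad\Longrightarrow\quad
\cos(u, v) \;=\; \frac{u \cdot v}{\lVert u\rVert\,\lVert v\rVert} \;\in\; [-1, 1],
\]

with equality in the bound exactly when one vector is a scalar multiple of the other.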
That assumption made sense when embeddings were shallow and roughly isotropic. It is far less natural in today’s setting. Contextual and multimodal models routinely encode meaning through complex, monotonic—but not linear—relationships across dimensions. Two embeddings may agree strongly in relative structure while disagreeing in exact magnitudes. Cosine treats that as a flaw. Humans usually do not.
Analysis — A tighter bound, a wider notion of similarity
The core move of the paper is deceptively simple: Cauchy–Schwarz is not the tightest possible upper bound on a dot product. By invoking the Rearrangement Inequality, the author derives a stricter bound based on sorted vector components.
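In symbols (using ascending sorts; the paper's notation may differ in detail): for \(u, v \in \mathbb{R}^d\), write \(u^{\uparrow}\) for \(u\) with its components sorted in ascending order. Sorting preserves norms, so the rearrangement bound sits inside the familiar ones:

\[
u \cdot v \;\le\; u^{\uparrow} \cdot v^{\uparrow} \;\le\; \lVert u\rVert\,\lVert v\rVert \;\le\; \tfrac{1}{2}\bigl(\lVert u\rVert^2 + \lVert v\rVert^2\bigr).
\]

The first step is the Rearrangement Inequality, the second is Cauchy–Schwarz applied to the sorted vectors, and the third follows from \((\lVert u\rVert - \lVert v\rVert)^2 \ge 0\) (the arithmetic-mean bound in the table below).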
This yields a hierarchy of dot‑product bounds:
| Bound source | Normalization denominator | What “perfect similarity” means |
|---|---|---|
| Arithmetic–Quadratic Mean | \(\tfrac{1}{2}(\lVert u\rVert^2 + \lVert v\rVert^2)\) | Exact identity (\(u = v\)) |
| Cauchy–Schwarz | \(\lVert u\rVert\,\lVert v\rVert\) | Linear dependence (proportionality) |
| Rearrangement Inequality | \(u^{\uparrow} \cdot v^{\uparrow}\) | Ordinal concordance (same ranking of dimensions) |
From these bounds emerge three similarity metrics:
- decos — sensitive to near‑identity only
- cos — sensitive to linear alignment
- recos — sensitive to monotonic, order‑preserving structure
The key conceptual shift is subtle but powerful: perfect similarity no longer requires proportionality. It requires that the two vectors rank their dimensions the same way.
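A minimal NumPy sketch of the three scores, read off the table above; the exact formulas and edge-case handling in the paper (zero vectors, ties, non-positive denominators) may differ:

```python
import numpy as np

def decos(u: np.ndarray, v: np.ndarray) -> float:
    """Dot product over the arithmetic-mean bound; reaches 1.0 only when u == v."""
    return float(u @ v) / (0.5 * (np.dot(u, u) + np.dot(v, v)))

def cos(u: np.ndarray, v: np.ndarray) -> float:
    """Classic cosine similarity: dot product over the Cauchy-Schwarz bound."""
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def recos(u: np.ndarray, v: np.ndarray) -> float:
    """Dot product over the rearrangement bound: both vectors sorted ascending."""
    return float(u @ v) / float(np.sort(u) @ np.sort(v))

# Two vectors that rank their dimensions identically but are not proportional:
u = np.array([0.1, 0.5, 2.0])
v = np.array([0.2, 0.9, 1.1])
print(cos(u, v), recos(u, v))  # recos(u, v) == 1.0, cos(u, v) is roughly 0.90
```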
Findings — Consistency beats magnitude
Across 11 embedding models and 7 standard STS benchmarks, recos outperforms cosine similarity in over 92% of comparisons. The gains are modest in absolute size—fractions of a Spearman point—but remarkably consistent. Statistical tests show near‑perfect win rates and large non‑parametric effect sizes.
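For orientation, the evaluation behind such numbers is the usual STS recipe: score every sentence pair with a similarity function and report the Spearman correlation against human ratings. A sketch under that assumption, with a hypothetical `embed` encoder and a similarity helper such as `recos` from the earlier snippet:

```python
from scipy.stats import spearmanr

def sts_spearman(pairs, gold, embed, sim):
    """Spearman correlation between metric scores and human ratings.

    pairs: list of (sentence_a, sentence_b) tuples
    gold:  human similarity ratings, aligned with `pairs`
    embed: any sentence encoder returning a 1-D vector (hypothetical here)
    sim:   a similarity function, e.g. cos or recos from the sketch above
    """
    preds = [sim(embed(a), embed(b)) for a, b in pairs]
    return spearmanr(preds, gold).correlation
```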
The improvement pattern is revealing:
- Static embeddings gain little. Their structure already aligns with cosine’s assumptions.
- Contextual models gain more, reflecting richer internal geometry.
- Specialized and multimodal models (notably CLIP‑ViT) benefit the most, suggesting cosine systematically underestimates similarity when representations deviate from linear text semantics.
In short: the more complex the embedding space, the more cosine leaves on the table.
Implications — Rethinking similarity as structure, not angle
recos does not overthrow cosine similarity. It reframes it. Angle remains informative—but incomplete. Ordinal agreement across dimensions carries its own semantic signal, one that survives normalization and complements angular alignment.
For practitioners, the implications are pragmatic:
- Semantic evaluation: recos offers a drop‑in replacement when ranking quality matters more than raw geometry.
- Model diagnostics: divergence between cos and recos can expose nonlinear structure in embeddings (see the sketch after this list).
- Future training objectives: similarity need not be purely angular—ordering constraints may be learnable signals.
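On the diagnostics point, the simplest probe is the per-pair gap between the two scores; a sketch, reusing the formulas from the earlier snippet on row-aligned embedding matrices:

```python
import numpy as np

def cos_recos_gap(U: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Per-pair difference recos(u, v) - cos(u, v) for row-aligned matrices U, V.

    A consistently large positive gap hints that the embeddings agree in how they
    order their dimensions beyond what angular alignment alone captures.
    """
    gaps = []
    for u, v in zip(U, V):
        dot = float(u @ v)
        c = dot / (np.linalg.norm(u) * np.linalg.norm(v))
        r = dot / float(np.sort(u) @ np.sort(v))
        gaps.append(r - c)
    return np.asarray(gaps)
```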
The trade‑off is computational. Sorting adds an \(O(d \log d)\) cost. For billion‑scale retrieval, approximations will be necessary. But the conceptual door is now open.
Conclusion — Cosine was never the whole story
Cosine similarity answered the right question for an earlier generation of representations. This work shows that modern embeddings are answering richer questions than cosine knows how to ask.
By grounding similarity in a tighter mathematical bound, recos expands what “similar” is allowed to mean—without hand‑waving, and without abandoning theory. It is a reminder that sometimes progress in AI does not come from bigger models, but from sharper mathematics.
Cognaptus: Automate the Present, Incubate the Future.