## Opening — Why this matters now
Large language models can write essays, generate code, and even explain quantum physics. Yet ask them a deceptively simple question involving numbers—which value is larger, 9000 or 12000?—and things occasionally fall apart.
The problem is structural. Most language models treat numbers as if they were ordinary words. The token “42” is just another symbol, not something that carries magnitude, units, or measurement semantics.
For business applications, this limitation is more than an academic curiosity. Financial models, medical records, engineering datasets, and operational dashboards all rely heavily on numerical data. When AI systems misinterpret numbers—or fail to distinguish 30 years from 30 months—automation pipelines quietly accumulate errors.
A recent research paper proposes an elegant fix: CONE (Composite Numerical Embeddings)—a framework that teaches language models to represent numbers the way humans implicitly understand them: through value, units, and attributes.
In other words, the model finally learns that 5 kg and 5 km are not remotely the same thing.
## Background — Context and prior art
Traditional embedding systems such as BERT, BioBERT, and similar transformer encoders excel at capturing relationships between words. They operate on tokenized text, where each token is mapped into a vector in semantic space.
That works well for language.
Numbers, however, obey very different rules.
| Property | Words | Numbers |
|---|---|---|
| Ordering | Usually none | Strict order |
| Distance meaning | Semantic similarity | Quantitative magnitude |
| Units | Rare | Essential |
| Representations | Discrete tokens | Scalar, range, distribution |
Most existing models ignore these differences. A number like 28,600 may be split into subword tokens such as "28", ",", and "600", destroying its mathematical meaning before the model ever sees it.
The situation becomes even worse in structured data, where numbers are tied to attributes.
Consider the following simplified table:
| Attribute | Value |
|---|---|
| Age | 30 |
| Follow‑up (months) | 30 |
A conventional embedding model might treat these two columns as nearly identical because the distributions overlap.
But semantically, they represent completely different quantities.
This gap—between linguistic representation and numerical semantics—is what CONE attempts to close.
## Analysis — What the paper proposes
CONE introduces a composite embedding architecture designed specifically for structured numerical data.
Instead of encoding a number as a single token embedding, the system decomposes it into three independent components:
- Attribute context – what the number describes
- Numerical value – the magnitude
- Unit semantics – the measurement scale
The final representation concatenates these components, then applies a learned linear projection and layer normalization to produce a unified embedding vector.
Conceptually:
$$ E_{\text{comp}} = \mathrm{LayerNorm}\big(W\,[E_a \oplus E_v \oplus E_u]\big) $$
Where:
- $E_a$ = attribute embedding
- $E_v$ = numeric value embedding
- $E_u$ = unit embedding
This design ensures that identical values become distinct when their context differs.
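A minimal sketch of this composition step, using NumPy with assumed dimensions and random matrices standing in for learned parameters (the paper does not specify sizes, so `d = 32` is purely illustrative):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the vector to zero mean and unit variance.
    return (x - x.mean()) / (x.std() + eps)

rng = np.random.default_rng(0)
d = 32  # per-component embedding size (assumed, not from the paper)

# Hypothetical component embeddings for "follow-up: 30 months"
E_a = rng.normal(size=d)  # attribute context ("follow-up")
E_v = rng.normal(size=d)  # numeric value (30)
E_u = rng.normal(size=d)  # unit ("months")

# A learned projection W (random here) maps the concatenation back to size d.
W = rng.normal(size=(d, 3 * d))
E_comp = layer_norm(W @ np.concatenate([E_a, E_v, E_u]))
print(E_comp.shape)  # (32,)
```

Because all three components feed the final vector, changing any one of them (say, the unit from months to years) moves the whole composite embedding.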
Example:
| Input | Composite representation meaning |
|---|---|
| 5 km | distance measurement |
| 5 kg | weight measurement |
| 5 years | temporal duration |
Under standard embeddings, these may cluster closely. Under CONE, they occupy different regions of vector space.
### Handling ranges and uncertainty
Real-world datasets rarely contain simple scalars.
Medical or engineering data often appear as ranges or distributions:
| Example | Meaning |
|---|---|
| 18–24 kg/m² | BMI range |
| 1302 ± 0.25 nm | particle size distribution |
CONE represents these using additional components.
Ranges are encoded using center and length:
$$ center = \frac{a + b}{2}, \quad length = |b-a| $$
Gaussian measurements are represented using:
- mean − SD
- mean
- mean + SD
This preserves the statistical structure of the measurement inside the embedding space.
In practical terms, similar ranges cluster together while dissimilar ranges remain separated.
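These encodings are simple enough to sketch directly. A hedged illustration of the two formulas above in plain Python (not the paper's code), using the examples from the table:

```python
def encode_range(a: float, b: float) -> tuple:
    # Range -> (center, length), per the formulas above.
    return ((a + b) / 2.0, abs(b - a))

def encode_gaussian(mean: float, sd: float) -> tuple:
    # Gaussian measurement -> three anchor points: mean-SD, mean, mean+SD.
    return (mean - sd, mean, mean + sd)

print(encode_range(18, 24))         # BMI range 18–24  -> (21.0, 6)
print(encode_gaussian(1302, 0.25))  # 1302 ± 0.25 nm -> (1301.75, 1302, 1302.25)
```

The (center, length) form makes overlap between ranges directly comparable: two ranges with close centers and similar lengths land near each other in embedding space.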
### Numerical reasoning training
To teach the model numeric understanding, the authors introduce masked numeral prediction.
This is analogous to masked language modeling but for numbers.
The training objective combines two losses:
- Magnitude regression (predicting numeric value)
- Classification (predicting numeric category)
$$ L_{\text{num}} = (\log y - \log \hat{y})^2 + \mathcal{L}_{\text{CE}} $$
This hybrid loss ensures the model captures both:
- precise magnitude
- contextual usage
A subtle but important detail: the logarithmic term stabilizes learning across different numeric scales.
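A sketch of this hybrid objective for a single prediction (NumPy; the paper's exact term weighting and category scheme are not given here, so the two losses are simply summed):

```python
import numpy as np

def numeral_loss(y_true, y_pred, logits, class_idx):
    # Magnitude regression in log space: errors are scale-relative,
    # so 30 vs 28 and 3000 vs 2800 incur the same penalty.
    reg = (np.log(y_true) - np.log(y_pred)) ** 2
    # Cross-entropy over numeric-category logits (softmax + negative log-likelihood).
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    ce = -np.log(probs[class_idx])
    return float(reg + ce)

# Hypothetical example: true value 30, predicted 28, three numeric categories.
loss = numeral_loss(30.0, 28.0, np.array([2.0, 0.5, -1.0]), class_idx=0)
```

The log transform is what makes the regression term scale-robust: the penalty depends on the ratio of predicted to true value, not their absolute difference.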
## Findings — Results with visualization
Across multiple benchmarks, CONE significantly improves numerical reasoning.
### Numerical probing tasks
| Task | Metric | Baseline trend | CONE result |
|---|---|---|---|
| List maximum | Accuracy | Degrades with scale | ~0.95 stable |
| Number decoding | RMSE | Large error growth | Lower error |
| Addition | RMSE | Increasing drift | Improved accuracy |
These tests confirm that CONE embeddings preserve numeric magnitude more reliably than standard language embeddings.
### Question answering benchmark (DROP)
| Model | EM | F1 |
|---|---|---|
| NumNet | 83.40 | 86.42 |
| AeNER | 83.69 | 86.98 |
| CONE | 83.74 | 87.28 |
The improvement may appear modest, but in numerical reasoning benchmarks even a fraction of a point is meaningful.
### Structured data retrieval
CONE truly shines in table matching tasks.
| Dataset | Recall@10 improvement |
|---|---|
| WebTables | +25% vs NumNet |
| CancerKG | Highest retrieval accuracy |
| CovidKG | Consistent gains |
In other words, when searching for similar columns or records in large datasets, CONE retrieves more relevant matches.
That capability has clear implications for data integration, knowledge graphs, and automated analytics pipelines.
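To make the retrieval intuition concrete, here is a toy cosine-similarity example with hand-built composite vectors (purely illustrative; these are not the paper's learned embeddings, and the five-slot layout is an assumption):

```python
import numpy as np

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy layout: [attr: age, attr: duration, log-value, unit: years, unit: months]
query      = np.array([0.0, 1.0, np.log(30), 0.0, 1.0])  # "follow-up: 30 months"
col_months = np.array([0.0, 1.0, np.log(30), 0.0, 1.0])  # a duration-in-months column
col_age    = np.array([1.0, 0.0, np.log(30), 1.0, 0.0])  # an "age: 30 years" column

# Despite identical numeric values, the attribute and unit components
# rank the months column above the age column.
print(cosine(query, col_months) > cosine(query, col_age))  # True
```

A value-only embedding would score both columns identically; the attribute and unit components are what break the tie.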
## Implications — What this means for business and AI systems
CONE highlights a broader lesson for applied AI:
Language models are not naturally numerical models.
When companies deploy AI on financial, operational, or scientific data, ignoring numeric semantics creates subtle but dangerous errors.
Three practical implications emerge.
### 1. AI data pipelines need numeric‑aware embeddings
Most enterprise AI stacks rely on general-purpose embeddings for retrieval or RAG pipelines.
If those embeddings misrepresent numbers, downstream reasoning becomes unreliable.
### 2. Structured data requires composite representations
Tabular datasets encode meaning through multiple axes:
- attribute
- value
- units
Treating numbers as tokens discards that structure.
### 3. Numerical reasoning is still an open frontier
Despite rapid progress in LLM capabilities, numeric reasoning remains one of the most fragile aspects of AI systems.
Architectures like CONE demonstrate that improvements may come not from scaling models—but from better representation design.
## Conclusion — The quiet revolution in embeddings
CONE does something deceptively simple.
Instead of forcing numbers to behave like words, it treats them as what they actually are: measurements embedded in context.
The result is a model that preserves magnitude, units, and attribute semantics simultaneously—unlocking more reliable reasoning across scientific, financial, and operational datasets.
For organizations building AI systems around structured data, that distinction may prove decisive.
Because when machines finally understand numbers properly, a surprising amount of “AI hallucination” disappears.
And sometimes, the difference between 30 months and 30 years matters quite a lot.
Cognaptus: Automate the Present, Incubate the Future.