Opening — Why this matters now
For the past two years, AI progress in mathematics has followed a familiar script: bigger models, better results, less transparency. Proprietary frontier systems quietly crossed into Olympiad-level reasoning while the rest of the field was left reverse-engineering shadows.
Enter QED-Nano: a 4B-parameter model that politely disrupts that narrative.
Not by brute force, but by process.
The implication is uncomfortable for incumbents: reasoning performance may hinge less on parameter count than on training structure.
Background — Context and prior art
Mathematical reasoning has long been treated as the final boss of language models. Early approaches relied on:
| Approach | Limitation |
|---|---|
| Chain-of-thought prompting | Brittle, inconsistent reasoning |
| Large-scale pretraining | Expensive, opaque |
| Tool-augmented systems | Complex scaffolding |
| Proprietary pipelines | Non-reproducible |
Recent breakthroughs—particularly in Olympiad-level problem solving—have leaned heavily on:
- Massive internal models
- Hidden datasets
- Extensive inference-time compute
In short: performance improved, but accessibility regressed.
QED-Nano flips that tradeoff.
Analysis — What the paper actually does
The authors construct a surprisingly disciplined three-stage pipeline. No magic, just iteration applied with intent.
1. Supervised Fine-Tuning (SFT): Learning to “sound like a mathematician”
The model is first distilled from a stronger system (DeepSeek-Math-V2), focusing not just on correctness but on style.
This is subtle but critical: proofs are not just answers, they are structured arguments. The model learns:
- Logical sequencing
- Formal tone
- Stepwise decomposition
In other words, it learns how to think in public.
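In data terms, this stage amounts to fine-tuning the student on teacher-written proofs rather than bare answers. A minimal sketch of how such a distillation example might be constructed; the prompt template and the `teacher` stub are illustrative assumptions, not details from the paper:

```python
# Sketch of distillation-style SFT data construction: a stronger teacher
# (DeepSeek-Math-V2 in the paper) writes a structured, stepwise proof, and
# the student is fine-tuned on the resulting (prompt, completion) pair.
# The prompt wording here is a hypothetical template.

def build_sft_example(problem: str, teacher) -> dict[str, str]:
    proof = teacher(problem)  # stepwise proof text from the teacher model
    return {
        "prompt": f"Prove the following.\n\n{problem}\n\nProof:",
        # The completion carries style: sequencing, tone, decomposition,
        # not just the final answer.
        "completion": proof,
    }

# Toy stand-in for a teacher model call:
teacher = lambda p: "Step 1: assume the contrary. Step 2: derive a contradiction. QED."
ex = build_sft_example("sqrt(2) is irrational", teacher)
```

Training on completions like this is what pushes the student toward structured arguments rather than answer-only outputs.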
2. Reinforcement Learning (RL): Rewarding good reasoning
Instead of naive reward signals (e.g., final answer correctness), the system uses rubric-based rewards.
| Reward Component | Purpose |
|---|---|
| Logical validity | Ensures sound reasoning |
| Completeness | Avoids skipped steps |
| Clarity | Encourages interpretable proofs |
This shifts optimization from “get the answer” to “justify the answer convincingly.” A small but meaningful philosophical pivot.
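Mechanically, a rubric reward like this can be pictured as a weighted combination of per-component scores. A minimal sketch; the weights and the 0-to-1 grading scale are illustrative assumptions, not values from the paper:

```python
# Sketch of a rubric-based reward: each proof is graded on several rubric
# components (scores in [0, 1], e.g. from a judge model), then combined
# into one scalar reward for RL. Weights below are assumed, not published.

RUBRIC_WEIGHTS = {
    "logical_validity": 0.5,  # sound reasoning weighted most heavily
    "completeness": 0.3,      # penalize skipped steps
    "clarity": 0.2,           # encourage interpretable presentation
}

def rubric_reward(scores: dict[str, float]) -> float:
    """Combine per-component rubric scores into a single scalar reward."""
    return sum(RUBRIC_WEIGHTS[name] * scores.get(name, 0.0)
               for name in RUBRIC_WEIGHTS)

# A proof with valid logic, a skipped step, and middling clarity:
reward = rubric_reward({"logical_validity": 1.0,
                        "completeness": 0.4,
                        "clarity": 0.7})
```

The point of the structure is that a correct final answer with a gappy justification no longer earns full reward, which is exactly the "justify the answer convincingly" pivot.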
3. Reasoning Cache: Iterative refinement at inference time
This is the paper’s most interesting contribution.
Rather than generating long proofs in one pass, the model:
- Generates partial reasoning
- Summarizes it
- Refines the solution iteratively
Think of it as a looped cognition system:
| Step | Function |
|---|---|
| Generate | Draft reasoning |
| Compress | Extract key insights |
| Refine | Improve structure and correctness |
This creates a feedback loop—without increasing model size.
The result: stronger reasoning emerging from process reuse, not parameter growth.
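The generate-compress-refine loop above can be sketched as a small iteration over a cache of extracted insights. The function names `generate`, `compress`, and `refine` are hypothetical stubs standing in for model calls; the control flow is the idea, not the paper's exact implementation:

```python
# Sketch of the reasoning-cache loop: draft partial reasoning, compress it
# into key insights, then refine the draft conditioned on the cache.
# Each of the three callables would be an LLM call in the real system.

def run_reasoning_cache(problem: str, generate, compress, refine,
                        rounds: int = 3) -> str:
    cache: list[str] = []              # compressed insights carried across rounds
    draft = generate(problem, cache)   # initial partial reasoning
    for _ in range(rounds):
        cache.append(compress(draft))          # extract key insights
        draft = refine(problem, draft, cache)  # improve structure and correctness
    return draft

# Toy stubs that just accumulate text, to make the data flow visible:
gen = lambda p, c: f"draft({p})"
comp = lambda d: f"insight[{d}]"
ref = lambda p, d, c: d + f"+r{len(c)}"

final = run_reasoning_cache("P", gen, comp, ref, rounds=2)
# final == "draft(P)+r1+r2": two refinement passes over one draft
```

Note that the loop adds inference-time passes, not parameters: the same fixed-size model is reused, which is the paper's trade of compute structure for scale.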
Findings — Results with visualization
QED-Nano’s performance is, frankly, inconvenient for larger models.
| Model | Size | Relative Proof Performance | Cost Efficiency |
|---|---|---|---|
| QED-Nano | 4B | High (near frontier) | Very High |
| Nomos-1 | Larger | Lower | Moderate |
| GPT-OSS-120B | 120B | Lower | Low |
| Gemini 3 Pro | Proprietary | Slightly higher | Very Low (expensive) |
Key takeaway
Performance is no longer a simple function of scale:
$$ \text{Reasoning Quality} \neq f(\text{Parameters}) $$
Instead, a more accurate framing emerges:
$$ \text{Reasoning Quality} \approx f(\text{Training Pipeline}, \text{Inference Strategy}) $$
Which, for the AI industry, is either exciting or mildly destabilizing.
Implications — What this means in practice
1. The “small model renaissance” is real
If 4B models can approach frontier reasoning:
- Edge deployment becomes viable
- Costs drop dramatically
- Open ecosystems regain relevance
2. Training pipelines are now the competitive moat
The differentiator is no longer just:
- Data scale
- Model size
But increasingly:
- Reward design
- Iterative inference strategies
- Curriculum structuring
Translation: process engineering beats brute force scaling.
3. Reproducibility becomes a strategic advantage
By shipping datasets, code, and pipelines, the QED-Nano release does something rare:
It makes progress inspectable.
For enterprises, this matters more than benchmark scores:
- Easier compliance validation
- Lower vendor lock-in
- Better auditability
4. A shift toward “thinking systems”
The reasoning cache mechanism hints at a broader trend:
Future models won’t just generate outputs—they will:
- Iterate
- Reflect
- Refine
In short, they will behave less like predictors and more like process-driven agents.
Conclusion — The quiet disruption
QED-Nano doesn’t loudly dethrone large models.
It does something more subtle—and more dangerous.
It shows that the path to better reasoning may not be bigger models, but smarter loops.
And once that idea spreads, the economics of AI shift.
Not overnight. But inevitably.
Cognaptus: Automate the Present, Incubate the Future.