Opening — Why this matters now

For the past two years, AI progress in mathematics has followed a familiar script: bigger models, better results, less transparency. Proprietary frontier systems quietly crossed into Olympiad-level reasoning, while the rest of the field was left reverse-engineering shadows.

Then comes QED-Nano, a 4B-parameter model that politely disrupts that narrative.

Not by brute force, but by process.

The implication is uncomfortable for incumbents: reasoning performance may no longer scale linearly with model size—it may hinge more on training structure than parameter count.

Background — Context and prior art

Mathematical reasoning has long been treated as the final boss of language models. Early approaches relied on:

| Approach | Limitation |
|---|---|
| Chain-of-thought prompting | Brittle, inconsistent reasoning |
| Large-scale pretraining | Expensive, opaque |
| Tool-augmented systems | Complex scaffolding |
| Proprietary pipelines | Non-reproducible |

Recent breakthroughs—particularly in Olympiad-level problem solving—have leaned heavily on:

  • Massive internal models
  • Hidden datasets
  • Extensive inference-time compute

In short: performance improved, but accessibility regressed.

QED-Nano flips that tradeoff.

Analysis — What the paper actually does

The authors construct a surprisingly disciplined three-stage pipeline. No magic, just iteration applied with intent.

1. Supervised Fine-Tuning (SFT): Learning to “sound like a mathematician”

The model is first distilled from a stronger system (DeepSeek-Math-V2), with an emphasis not just on correctness but on style.

This is subtle but critical: proofs are not just answers, they are structured arguments. The model learns:

  • Logical sequencing
  • Formal tone
  • Stepwise decomposition

In other words, it learns how to think in public.
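To make the distillation step concrete, here is a minimal sketch of how a (problem, teacher proof) pair might be turned into an SFT training record that preserves stepwise structure. The record format, field names, and prompt template are all hypothetical; the paper does not specify them.

```python
# Hypothetical SFT data preparation for proof distillation.
# The teacher's proof is kept as explicit, numbered steps so the
# student learns logical sequencing, not just final answers.
# All field names and the prompt template are illustrative.

def format_sft_example(problem: str, teacher_proof_steps: list[str]) -> dict:
    """Turn a (problem, teacher proof) pair into one SFT training record."""
    numbered = "\n".join(
        f"Step {i}. {step}"
        for i, step in enumerate(teacher_proof_steps, start=1)
    )
    return {
        "prompt": f"Prove the following statement.\n\n{problem}\n\nProof:",
        "target": f"{numbered}\nQ.E.D.",
    }

example = format_sft_example(
    "The sum of two even integers is even.",
    [
        "Let a = 2m and b = 2n for integers m, n.",
        "Then a + b = 2m + 2n = 2(m + n).",
        "Since m + n is an integer, a + b is even.",
    ],
)
print(example["target"])
```

The point of the explicit `Step i.` scaffolding is that fine-tuning on it teaches the formal tone and decomposition described above, rather than answer-only supervision.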

2. Reinforcement Learning (RL): Rewarding good reasoning

Instead of naive reward signals (e.g., final answer correctness), the system uses rubric-based rewards.

| Reward Component | Purpose |
|---|---|
| Logical validity | Ensures sound reasoning |
| Completeness | Avoids skipped steps |
| Clarity | Encourages interpretable proofs |

This shifts optimization from “get the answer” to “justify the answer convincingly.” A small but meaningful philosophical pivot.
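A rubric-based reward of this kind can be sketched as a weighted combination of the three component scores. The weights, score ranges, and the `RubricScores` structure below are assumptions for illustration, not values from the paper.

```python
# Hypothetical rubric-based reward: combine per-proof rubric scores
# into one scalar for RL. Weights and 0..1 score ranges are assumed.
from dataclasses import dataclass

@dataclass
class RubricScores:
    logical_validity: float  # 0..1, soundness of the inferences
    completeness: float      # 0..1, no skipped steps
    clarity: float           # 0..1, readability of the argument

def rubric_reward(scores: RubricScores,
                  weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of rubric components; weights are illustrative."""
    w_valid, w_complete, w_clear = weights
    return (w_valid * scores.logical_validity
            + w_complete * scores.completeness
            + w_clear * scores.clarity)

# A proof that skips steps is penalized even if its conclusion is right.
full = rubric_reward(RubricScores(1.0, 1.0, 0.9))   # complete argument
gappy = rubric_reward(RubricScores(1.0, 0.4, 0.9))  # correct but gappy
print(full, gappy)
```

The design choice this illustrates: because completeness enters the reward directly, a correct final answer with missing steps earns less than a fully justified one, which is exactly the "justify the answer convincingly" pivot described above.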

3. Reasoning Cache: Iterative refinement at inference time

This is the paper’s most interesting contribution.

Rather than generating long proofs in one pass, the model:

  1. Generates partial reasoning
  2. Summarizes it
  3. Refines the solution iteratively

Think of it as a looped cognition system:

| Step | Function |
|---|---|
| Generate | Draft reasoning |
| Compress | Extract key insights |
| Refine | Improve structure and correctness |

This creates a feedback loop—without increasing model size.

The result: stronger reasoning emerging from process reuse, not parameter growth.
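The generate-compress-refine loop above can be sketched as a small control-flow skeleton. The two model calls are abstracted as callables, and the toy stand-ins below exist only to show the data flow; the real cache format, summarization method, and stopping rule are not specified in the source.

```python
# Hypothetical sketch of a reasoning-cache loop: each round drafts a
# solution conditioned on a compact cache of prior reasoning, then
# replaces the cache with a summary of the new draft.
from typing import Callable

def reasoning_cache_loop(problem: str,
                         generate: Callable[[str, str], str],
                         compress: Callable[[str], str],
                         rounds: int = 3) -> str:
    """Run generate -> compress -> refine for a fixed number of rounds."""
    cache = ""   # compressed summary of reasoning so far
    draft = ""
    for _ in range(rounds):
        draft = generate(problem, cache)  # draft reasoning (model call)
        cache = compress(draft)           # keep only key insights (model call)
    return draft

# Toy stand-ins for the two model calls, just to exercise the loop.
def toy_generate(problem: str, cache: str) -> str:
    prefix = f"[cache: {cache}] " if cache else ""
    return prefix + f"draft solution for {problem}"

def toy_compress(draft: str) -> str:
    return draft.split()[-1]  # crude truncation standing in for summarization

final = reasoning_cache_loop("x^2 = 4", toy_generate, toy_compress, rounds=2)
print(final)
```

Note what the loop does and does not cost: each round is extra inference-time compute, but the model weights never change, which is the "process reuse, not parameter growth" point.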

Findings — Results with visualization

QED-Nano’s performance is, frankly, inconvenient for larger models.

| Model | Size | Relative Proof Performance | Cost Efficiency |
|---|---|---|---|
| QED-Nano | 4B | High (near frontier) | Very High |
| Nomos-1 | Larger | Lower | Moderate |
| GPT-OSS-120B | 120B | Lower | Low |
| Gemini 3 Pro | Proprietary | Slightly higher | Very Low (expensive) |

Key takeaway

Performance is no longer a simple function of scale:

$$ \text{Reasoning Quality} \neq f(\text{Parameters}) $$

Instead, a more accurate framing emerges:

$$ \text{Reasoning Quality} \approx f(\text{Training Pipeline}, \text{Inference Strategy}) $$

Which, for the AI industry, is either exciting—or mildly destabilizing.

Implications — What this means in practice

1. The “small model renaissance” is real

If 4B models can approach frontier reasoning:

  • Edge deployment becomes viable
  • Costs drop dramatically
  • Open ecosystems regain relevance

2. Training pipelines are now the competitive moat

The differentiator is no longer just:

  • Data scale
  • Model size

But increasingly:

  • Reward design
  • Iterative inference strategies
  • Curriculum structuring

Translation: process engineering beats brute-force scaling.

3. Reproducibility becomes a strategic advantage

By releasing datasets, code, and pipelines, QED-Nano does something rare:

It makes progress inspectable.

For enterprises, this matters more than benchmark scores:

  • Easier compliance validation
  • Lower vendor lock-in
  • Better auditability

4. A shift toward “thinking systems”

The reasoning cache mechanism hints at a broader trend:

Future models won’t just generate outputs—they will:

  • Iterate
  • Reflect
  • Refine

In short, they will behave less like predictors and more like process-driven agents.

Conclusion — The quiet disruption

QED-Nano doesn’t loudly dethrone large models.

It does something more subtle—and more dangerous.

It shows that the path to better reasoning may not be bigger models, but smarter loops.

And once that idea spreads, the economics of AI shift.

Not overnight. But inevitably.

Cognaptus: Automate the Present, Incubate the Future.