Opening — Why this matters now
For the past two years, AI progress in mathematics has followed a familiar script: bigger models, better results, less transparency. Proprietary frontier systems quietly crossed into Olympiad-level reasoning while the rest of the field was left reverse-engineering shadows.
Enter QED-Nano: a 4B-parameter model that politely disrupts that narrative.
Not by brute force, but by process.
The implication is uncomfortable for incumbents: reasoning performance may hinge less on parameter count than on training structure.
Background — Context and prior art
Mathematical reasoning has long been treated as the final boss of language models. Early approaches relied on:
| Approach | Limitation |
|---|---|
| Chain-of-thought prompting | Brittle, inconsistent reasoning |
| Large-scale pretraining | Expensive, opaque |
| Tool-augmented systems | Complex scaffolding |
| Proprietary pipelines | Non-reproducible |
Recent breakthroughs—particularly in Olympiad-level problem solving—have leaned heavily on:
- Massive internal models
- Hidden datasets
- Extensive inference-time compute
In short: performance improved, but accessibility regressed.
QED-Nano flips that tradeoff.
Analysis — What the paper actually does
The authors construct a surprisingly disciplined three-stage pipeline. No magic, just iteration applied with intent.
1. Supervised Fine-Tuning (SFT): Learning to “sound like a mathematician”
The model is first distilled from a stronger system (DeepSeek-Math-V2), focusing not just on correctness but on style.
This is subtle but critical: proofs are not just answers, they are structured arguments. The model learns:
- Logical sequencing
- Formal tone
- Stepwise decomposition
In other words, it learns how to think in public.
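In data terms, this stage amounts to fine-tuning the student on teacher-written proofs rather than bare answers. A minimal sketch of how such a distillation example might be constructed; the prompt template and the `teacher` stub are illustrative assumptions, not details from the paper:

```python
# Sketch of distillation-style SFT data construction: a stronger teacher
# (DeepSeek-Math-V2 in the paper) writes a structured, stepwise proof, and
# the student is fine-tuned on the resulting (prompt, completion) pair.
# The prompt wording here is a hypothetical template.

def build_sft_example(problem: str, teacher) -> dict[str, str]:
    proof = teacher(problem)  # stepwise proof text from the teacher model
    return {
        "prompt": f"Prove the following.\n\n{problem}\n\nProof:",
        # The completion carries style: sequencing, tone, decomposition,
        # not just the final answer.
        "completion": proof,
    }

# Toy stand-in for a teacher model call:
teacher = lambda p: "Step 1: assume the contrary. Step 2: derive a contradiction. QED."
ex = build_sft_example("sqrt(2) is irrational", teacher)
```

Training on completions like this is what pushes the student toward structured arguments rather than answer-only outputs.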
2. Reinforcement Learning (RL): Rewarding good reasoning
Instead of naive reward signals (e.g., final answer correctness), the system uses rubric-based rewards.
| Reward Component | Purpose |
|---|---|
| Logical validity | Ensures sound reasoning |
| Completeness | Avoids skipped steps |
| Clarity | Encourages interpretable proofs |
This shifts optimization from “get the answer” to “justify the answer convincingly.” A small but meaningful philosophical pivot.
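Mechanically, a rubric reward like this can be pictured as a weighted combination of per-component scores. A minimal sketch; the weights and the 0-to-1 grading scale are illustrative assumptions, not values from the paper:

```python
# Sketch of a rubric-based reward: each proof is graded on several rubric
# components (scores in [0, 1], e.g. from a judge model), then combined
# into one scalar reward for RL. Weights below are assumed, not published.

RUBRIC_WEIGHTS = {
    "logical_validity": 0.5,  # sound reasoning weighted most heavily
    "completeness": 0.3,      # penalize skipped steps
    "clarity": 0.2,           # encourage interpretable presentation
}

def rubric_reward(scores: dict[str, float]) -> float:
    """Combine per-component rubric scores into a single scalar reward."""
    return sum(RUBRIC_WEIGHTS[name] * scores.get(name, 0.0)
               for name in RUBRIC_WEIGHTS)

# A proof with valid logic, a skipped step, and middling clarity:
reward = rubric_reward({"logical_validity": 1.0,
                        "completeness": 0.4,
                        "clarity": 0.7})
```

The point of the structure is that a correct final answer with a gappy justification no longer earns full reward, which is exactly the "justify the answer convincingly" pivot.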
3. Reasoning Cache: Iterative refinement at inference time
This is the paper’s most interesting contribution.
Rather than generating long proofs in one pass, the model:
- Generates partial reasoning
- Summarizes it
- Refines the solution iteratively
Think of it as a looped cognition system:
| Step | Function |
|---|---|
| Generate | Draft reasoning |
| Compress | Extract key insights |
| Refine | Improve structure and correctness |
This creates a feedback loop—without increasing model size.
The result: stronger reasoning emerging from process reuse, not parameter growth.
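The generate-compress-refine loop above can be sketched as a small iteration over a cache of extracted insights. The function names `generate`, `compress`, and `refine` are hypothetical stubs standing in for model calls; the control flow is the idea, not the paper's exact implementation:

```python
# Sketch of the reasoning-cache loop: draft partial reasoning, compress it
# into key insights, then refine the draft conditioned on the cache.
# Each of the three callables would be an LLM call in the real system.

def run_reasoning_cache(problem: str, generate, compress, refine,
                        rounds: int = 3) -> str:
    cache: list[str] = []              # compressed insights carried across rounds
    draft = generate(problem, cache)   # initial partial reasoning
    for _ in range(rounds):
        cache.append(compress(draft))          # extract key insights
        draft = refine(problem, draft, cache)  # improve structure and correctness
    return draft

# Toy stubs that just accumulate text, to make the data flow visible:
gen = lambda p, c: f"draft({p})"
comp = lambda d: f"insight[{d}]"
ref = lambda p, d, c: d + f"+r{len(c)}"

final = run_reasoning_cache("P", gen, comp, ref, rounds=2)
# final == "draft(P)+r1+r2": two refinement passes over one draft
```

Note that the loop adds inference-time passes, not parameters: the same fixed-size model is reused, which is the paper's trade of compute structure for scale.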
Findings — Results with visualization
QED-Nano’s performance is, frankly, inconvenient for larger models.
| Model | Size | Relative Proof Performance | Cost Efficiency |
|---|---|---|---|
| QED-Nano | 4B | High (near frontier) | Very High |
| Nomos-1 | Larger | Lower | Moderate |
| GPT-OSS-120B | 120B | Lower | Low |
| Gemini 3 Pro | Proprietary | Slightly higher | Very Low (expensive) |
Key takeaway
Performance is no longer a simple function of scale:
$$ \text{Reasoning Quality} \neq f(\text{Parameters}) $$
Instead, a more accurate framing emerges:
$$ \text{Reasoning Quality} \approx f(\text{Training Pipeline}, \text{Inference Strategy}) $$
Which, for the AI industry, is either exciting or mildly destabilizing.
Implications — What this means in practice
1. The “small model renaissance” is real
If 4B models can approach frontier reasoning:
- Edge deployment becomes viable
- Costs drop dramatically
- Open ecosystems regain relevance
2. Training pipelines are now the competitive moat
The differentiator is no longer just:
- Data scale
- Model size
But increasingly:
- Reward design
- Iterative inference strategies
- Curriculum structuring
Translation: process engineering beats brute force scaling.
3. Reproducibility becomes a strategic advantage
By shipping datasets, code, and pipelines, the QED-Nano release does something rare:
It makes progress inspectable.
For enterprises, this matters more than benchmark scores:
- Easier compliance validation
- Lower vendor lock-in
- Better auditability
4. A shift toward “thinking systems”
The reasoning cache mechanism hints at a broader trend:
Future models won’t just generate outputs—they will:
- Iterate
- Reflect
- Refine
In short, they will behave less like predictors and more like process-driven agents.
Conclusion — The quiet disruption
QED-Nano doesn’t loudly dethrone large models.
It does something more subtle—and more dangerous.
It shows that the path to better reasoning may not be bigger models, but smarter loops.
And once that idea spreads, the economics of AI shift.
Not overnight. But inevitably.
Cognaptus: Automate the Present, Incubate the Future.