Opening — Why this matters now

The AI industry has developed an unfortunate habit: celebrating systems that usually work.

From large language models hallucinating citations to reinforcement learning agents missing obvious optimal moves, the pattern is familiar—impressive performance, quietly unreliable guarantees.

This paper, “Completeness of Unbounded Best-First Minimax and Descent Minimax,” addresses a deceptively narrow issue in game-search algorithms. But underneath, it tackles something far more uncomfortable:

What if your AI can never guarantee the correct answer—even with infinite time?

That’s not a performance issue. That’s a structural flaw.

And the fix, as it turns out, is not more compute—but better logic.


Background — When “good enough” quietly fails

Classical game AI revolves around Minimax, a framework that assumes:

  • One player maximizes outcome
  • The other minimizes it
  • Optimal play leads to a well-defined equilibrium

In theory, Minimax is perfect—given infinite depth.

In practice, we cheat:

| Approach | Core Idea | Weakness |
| --- | --- | --- |
| Fixed-depth Minimax | Limit search depth | Misses deep strategies |
| Alpha-Beta pruning | Skip irrelevant branches | Still depth-limited |
| MCTS (AlphaZero) | Statistical simulation | Stochastic, no guarantees |
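To make the depth-limit weakness concrete, here is a minimal sketch of fixed-depth minimax on a hypothetical toy tree (the dict-based node format is an illustration, not the paper's representation). A winning line buried below the cutoff is simply invisible to the shallow search:

```python
# Hypothetical illustration: classic fixed-depth minimax on a toy game tree.
# Nodes are dicts with either a heuristic "value" (leaf) or "children".

def minimax(node, depth, maximizing):
    """Return the minimax value of `node`, cutting off at `depth`."""
    if depth == 0 or "children" not in node:
        return node["value"]  # heuristic estimate at the cutoff or leaf
    values = [minimax(c, depth - 1, not maximizing) for c in node["children"]]
    return max(values) if maximizing else min(values)

# A deep win hidden below the cutoff is invisible to a shallow search.
tree = {"children": [
    {"value": 0.2},                                # shallow, mediocre
    {"value": 0.1, "children": [{"value": 1.0}]},  # deep win, poor estimate
]}
print(minimax(tree, 1, True))  # depth 1 compares 0.2 vs 0.1 → misses the win
print(minimax(tree, 2, True))  # depth 2 finally sees the 1.0 payoff
```

The only cure in the classical framework is more depth, which is exactly the cost spiral that unbounded methods try to escape.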

Then came Unbounded Best-First Minimax (UBFM) and Descent Minimax:

  • They remove fixed depth
  • They explore selectively (non-uniform trees)
  • They integrate well with reinforcement learning

In fact, as the paper notes, these methods can outperform AlphaZero-style approaches in certain settings.

But there was a catch—an uncomfortable one.

They are not complete.

Meaning: even with unlimited time, they may never find the optimal strategy.

Not because they’re slow. Because they’re structurally biased.


Analysis — The flaw hiding in plain sight

The issue is subtle but devastating.

These algorithms rely on heuristic evaluations to guide exploration. That’s fine—until they encounter a resolved state.

The core failure mode

From the paper:

  • Resolved states (known outcomes) are preferred over unresolved ones
  • Even if unresolved states might be better
  • The algorithm keeps revisiting “safe” states
  • Exploration gets blocked

This creates a paradox:

| State Type | True Value | Algorithm Behavior |
| --- | --- | --- |
| Resolved | Known (e.g., +1) | Always preferred |
| Unresolved | Estimated (possibly higher) | Ignored |

The system becomes risk-averse—not in a strategic sense, but in a computational one.

It stops learning where it matters.
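One way this deadlock can arise is sketched below, with an assumed (hypothetical) selector that treats resolution itself as a tiebreaker-free preference. Once a resolved child exists, it is picked on every iteration, and the unresolved child with the higher estimate is never expanded:

```python
# Hypothetical sketch of the exploration deadlock described above.
# A selector that prefers resolved outcomes outright keeps re-selecting
# the "safe" resolved child and never expands the promising one.

children = [
    {"name": "resolved_draw", "value": 0.0, "resolved": True},
    {"name": "unresolved",    "value": 0.6, "resolved": False},  # estimate only
]

def naive_pick(children):
    """Prefer any resolved child, regardless of heuristic estimates."""
    resolved = [c for c in children if c["resolved"]]
    pool = resolved if resolved else children
    return max(pool, key=lambda c: c["value"])

for _ in range(3):  # every iteration selects the same child: exploration stalls
    print(naive_pick(children)["name"])
```

The loop never touches `unresolved`, so its true value is never learned. That is the computational risk aversion the paper diagnoses.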


The Intervention — Completion as a structural fix

The paper introduces a modification called completion.

At first glance, it looks like bookkeeping. In reality, it’s a philosophical shift.

Each state now tracks three values:

| Value | Meaning | Role |
| --- | --- | --- |
| $c(s)$ | Completion value | Exact minimax value (if known) |
| $v(s)$ | Heuristic value | Estimated value |
| $r(s)$ | Resolution flag | Whether the state is solved |

This triad does something critical:

It separates certainty from estimation.

Instead of blindly trusting resolved states, the algorithm now:

  • Distinguishes known vs estimated outcomes
  • Propagates resolution explicitly
  • Forces exploration of unresolved but promising paths

Mechanically, what changes?

Two key operations:

  1. Completed best action — selects actions using $(c(s), v(s))$ lexicographically
  2. Backup resolution — propagates solved states upward

This ensures:

  • A state becomes “resolved” only when logically justified
  • Resolved states remain stable (no oscillation)
  • Exploration cannot get trapped in local certainty
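The two operations can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact pseudocode: completion values are encoded as $c(s) \in \{-1, 0, +1\}$ with $0$ meaning "not yet proven," and the resolution rule shown is the standard one for max nodes (a proven win at the best child, or all children resolved):

```python
# Minimal sketch (assumed encoding, not the paper's pseudocode) of
# completed best action and backup resolution. Each state carries
# c (completion value), v (heuristic estimate), and r (resolution flag).

def completed_best(children, maximizing=True):
    """Select the child by (c, v) lexicographically: certainty first."""
    key = lambda s: (s["c"], s["v"])
    return max(children, key=key) if maximizing else min(children, key=key)

def backup(parent, maximizing=True):
    """Propagate c, v, and r from children up to the parent."""
    best = completed_best(parent["children"], maximizing)
    parent["c"], parent["v"] = best["c"], best["v"]
    # A max node is resolved once its best child is a proven win,
    # or once every child is resolved (symmetrically for min nodes).
    win = 1 if maximizing else -1
    parent["r"] = (best["c"] == win) or all(s["r"] for s in parent["children"])

node = {"children": [
    {"c": 0, "v": 0.4, "r": False},  # promising but unproven
    {"c": 1, "v": 0.0, "r": True},   # proven win
]}
backup(node, maximizing=True)
print(node["c"], node["r"])  # the proven win dominates, and the parent resolves
```

Because certainty ($c$) outranks estimation ($v$) only when a proof actually exists, unproven but promising children still win the comparison against resolved losses and draws, which is what keeps exploration unblocked.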

Generalization — From two algorithms to a class

Rather than patching UBFM and Descent individually, the paper generalizes them into a broader framework:

Unbounded Minimax-based algorithms

This class allows variation along two axes:

| Dimension | Choices | Effect |
| --- | --- | --- |
| Depth exploration | Continue vs. backtrack | Controls DFS vs. BFS bias |
| Child selection | Best vs. alternative | Controls exploration order |

UBFM and Descent become just two points in this design space.

And here’s the punchline:

Every algorithm in this class is complete—with completion.


Findings — Completeness is proven, not assumed

The paper formally proves:

$$ r(s) = 1 \;\Rightarrow\; c(s) = M(s), $$

where $M(s)$ is the true minimax value of state $s$.

And more importantly:

All states will be resolved in finite time.

What this means in practice

| Property | Before Completion | After Completion |
| --- | --- | --- |
| Guarantees optimal strategy | ❌ No | ✅ Yes |
| Risk of exploration deadlock | ✅ Yes | ❌ No |
| Convergence | Uncertain | Guaranteed |
| Logical correctness | Heuristic-dependent | Structurally enforced |

This is not a performance tweak.

It is a correctness guarantee.


Empirical Results — Theory meets reality

The authors tested across 22 competitive games, including Chess, Go (9×9), Hex, and Othello.

Performance impact

| Metric | Without Completion | With Completion |
| --- | --- | --- |
| Average score | -6.34% | +6.34% |

That is a +12.68-point swing in favor of completion.

Some highlights (from Table 1 on page 7):

| Game | Improvement |
| --- | --- |
| Othello 8×8 | +28% |
| Othello 10×10 | +24% |
| Santorini | +14% |
| Xiangqi | +11% |

Even where gains are modest, they are consistently positive.

The message is blunt:

Completion is not optional—it is dominant.


Implications — Beyond games, into real systems

It would be easy to dismiss this as “just game AI.” That would be a mistake.

1. Reliability beats raw performance

Modern AI systems often optimize for:

  • Speed
  • Accuracy (on average)
  • Benchmark wins

But this paper highlights a different axis:

Can the system guarantee correctness when it matters?

In finance, healthcare, or autonomous systems, the answer cannot be “usually.”


2. Exploration bias is a systemic risk

The failure mode here mirrors real-world AI:

  • Models over-trust confident outputs
  • Under-explore uncertain but valuable options
  • Reinforce early biases

Completion is effectively a bias-correction mechanism.


3. Agentic AI needs termination guarantees

As we move toward autonomous agents, a key question emerges:

When does the agent know it’s done?

Completion provides a blueprint:

  • Explicit resolution tracking
  • Guaranteed convergence
  • Stable final states

Without this, agent systems risk becoming… well, very confident wanderers.


4. Architecture matters more than scale

The improvement here did not come from:

  • Larger models
  • More training data
  • More compute

It came from structural redesign.

Which is quietly becoming the real frontier of AI progress.


Conclusion — Finishing the game

For years, AI systems have been excellent at playing the game.

This paper asks a more uncomfortable question:

Can they actually finish it?

With completion, the answer—finally—is yes.

And if you’re building systems that make decisions rather than suggestions, that distinction is not academic.

It’s existential.

Cognaptus: Automate the Present, Incubate the Future.