Completeness Is Not Optional — Why Game-Playing AI Finally Learned to Finish What It Starts

The algorithm did not lose because it was shallow

Endgames are where polite uncertainty goes to die.

Early in a game, a search algorithm can afford approximation. The tree is huge, the clock is rude, and the best it can do is lean on an evaluation function that says, with the usual machine confidence, “this line looks promising.” Fine. Nobody expects omniscience on move three.

But late in the game, the demand changes. The tree is smaller. The outcome may be reachable. If a winning line exists, the algorithm should eventually find it. If a drawing line is the best possible result, it should know that too. At that point, “I searched many interesting branches” is not a strategy. It is a diary.

Quentin Cohen-Solal’s paper Completeness of Unbounded Best-First Minimax and Descent Minimax asks a narrow but important question: when Unbounded Best-First Minimax and Descent Minimax are given the completion technique, do they become complete search algorithms for finite, perfect-information, two-player games?¹

The answer is yes. More precisely, the paper generalizes both algorithms into a broader class called Unbounded Minimax-based algorithms, then proves that every algorithm in that class is complete in finite time when it uses completion. The paper also reports that completion improves practical winning performance for Unbounded Best-First Minimax across 22 benchmark games, with an average score of 6.34% for completion versus -6.34% without it.

That sounds like a technical patch. It is not. It is a reminder that in decision systems, search depth and search correctness are different species. More compute can expand a flawed procedure. It does not automatically repair the procedure.

The failure mode: local certainty can block global search

The tempting reader misconception is simple: if an unbounded search algorithm can keep searching, then enough time should eventually make it find the winning strategy. That feels intuitive. It is also exactly where the trap sits.

Unbounded Best-First Minimax was designed to avoid the rigidity of fixed-depth minimax. Classic minimax searches to a preset depth and evaluates the leaves. Alpha-beta pruning makes that cheaper, but it does not remove the depth boundary. Unbounded Best-First Minimax instead explores the game tree non-uniformly, going deeper into lines that appear promising. Descent Minimax modifies this with a stronger depth-first bias, which is useful when endgame information should be propagated during learning.

So far, this sounds like progress: less mechanical depth control, more selective intelligence. Wonderful. Now for the small print.

In their basic forms, these algorithms can fail to be complete. The failure is not that they run out of time. The paper explicitly discusses cases where they may fail even with infinite search time. The reason is structural: without using resolution information correctly, the algorithm may keep preferring a resolved state over an unresolved state whose heuristic evaluation looks worse, even though that unresolved state may turn out to be better once actually explored.

This creates a strange computational conservatism. The algorithm clings to known outcomes because they are known, while uncertain alternatives remain under-explored because their estimates look unattractive. In a business setting, one might call this “process discipline.” In search, it is just how you miss the winning move while feeling very organized.

The core distinction is this:

State type	What the algorithm knows	What can go wrong
Resolved state	Its game outcome is known	It can become over-favored simply because it is certain
Unresolved state	Its value is estimated	Its estimate may be too pessimistic, so it may never receive enough search
Root decision	Must compare both kinds	Certainty and estimation get mixed as if they were the same kind of evidence

The paper’s central mechanism is designed to prevent exactly this mixing.

Completion separates “known”, “estimated”, and “resolved”

Completion gives each state in the partial search tree three values:

Value	Meaning	Why it matters
$c(s)$	Completion value	The exact minimax value with respect to the basic game outcome, if known; otherwise set to 0
$v(s)$	Heuristic evaluation	The estimated minimax value in the partial tree, using terminal and learned/adaptive evaluation functions
$r(s)$	Resolution flag	Whether the state is weakly resolved

The elegance is not in the notation. The elegance is in the separation.

A value of 0 in $c(s)$ alone is ambiguous: it may mean “draw” or “unknown.” The resolution flag $r(s)$ disambiguates that. The heuristic value $v(s)$ remains available for guiding search, but it is no longer confused with exact game resolution. The algorithm can therefore use estimates without pretending they are proof.

Two operations carry this bookkeeping into search behavior.

First, completed best action chooses among child states using the completion-aware tuple rather than a naked heuristic value. For max-player states and min-player states, the ordering is adapted so that the search respects both the exact completion value and the heuristic evaluation, with selection-count tie handling in the formal algorithm.

Second, backup resolution propagates solved status upward. A state can become resolved if the appropriate logical condition is satisfied: for example, a player has a resolved winning child, or all children are resolved. Once a state is resolved, its completion value should not wobble around like a quarterly KPI adjusted after the board meeting.

That stability is not cosmetic. It is the base of the proof.

The proof is a progress argument, not a magic convergence slogan

The paper’s formal result is easier to understand if we ignore the symbols for a moment and track the proof’s job.

It must show two things:

When a state is marked as resolved, the completion value is actually the minimax value.
Every state will be marked resolved after a finite number of iterations.

The first part prevents false certainty. The second prevents endless wandering.

The paper defines completeness for an Unbounded Minimax-based algorithm as follows: for any finite perfect-information two-player game and any state, there exists enough search time such that applying the algorithm returns a resolved state whose completion value equals the exact minimax value.

In compact form, the key correctness relationship is:

$$ r(s) = 1 \Rightarrow c(s) = M(s) $$

where $M(s)$ is the exact minimax value of state $s$.

The proof then proceeds through a chain of lemmas and propositions. Their roles are worth separating, because this is where the paper’s real contribution sits.

Proof component	Likely purpose	What it supports	What it does not prove alone
Resolved states do not change	Stability condition	Once a state is solved, its value remains safe to use	Does not show all states will be solved
Resolution conditions	Logical validity	A state is resolved only under justified child-state conditions	Does not show search will reach those conditions
Resolved implies exact	Main correctness step	$c(s)$ equals the exact minimax value when $r(s)=1$	Does not show finite-time completion
Each iteration adds progress	Termination support	Search either expands the tree or resolves another state	Depends on game finiteness
Finite resolution proposition	Completeness bridge	Every state eventually becomes resolved	Applies to the defined finite perfect-information setting

This is a mechanism-first proof. It is not saying, “the algorithm usually converges in practice.” It is saying: in the specified game class, the algorithm cannot keep spinning forever without either adding an unresolved state or resolving a state; because the state space is finite, that progress must eventually finish.

The theorem follows: any Unbounded Minimax-based algorithm is complete.

Notice the generality. The result is not only about one hand-tuned version of UBFM or one implementation of Descent. The paper defines a class of algorithms that differ in how they decide whether to continue deeper along the current line of play or return toward the root, and how they prioritize child exploration. UBFM and Descent are two points in this broader design space. Completion is the invariant that makes the class complete.

UBFM and Descent are two exploration personalities inside the same family

The generalization matters because UBFM and Descent explore differently.

UBFM returns to the root after extending the selected line by one state. Descent continues down the best unresolved sequence until it reaches a terminal or resolved state. In simpler terms: UBFM behaves more like repeated best-first refinement, while Descent carries a stronger depth-first instinct.

The new Unbounded Minimax-based class abstracts over that choice. It allows different behaviors for:

Design axis	What varies	Why it matters operationally
Deep exploration	Continue down the current line or return to reconsider from the root	Controls how quickly endgame information is reached versus how often alternatives are reconsidered
Breadth exploration	Which child to explore first	Controls search ordering and sensitivity to heuristic guidance
Completion handling	Track $c(s)$, $v(s)$, and $r(s)$	Preserves correctness across different exploration orders

This is useful because practical search algorithms are rarely pure textbook objects. Engineers modify exploration order, caching, tie-breaking, and rollout behavior. A proof tied to only one exact traversal routine can be fragile. A proof over a class is more architecture-relevant: it says which part of the design is essential for correctness and which parts are movable.

The essential part is not “always explore exactly like UBFM” or “always explore exactly like Descent.” The essential part is that the algorithm must maintain resolution-aware completion values and propagate them correctly.

The experiment tests whether correctness also buys play strength

The theoretical result answers the formal question. The experiment answers a more impatient engineering question: should anyone bother implementing completion?

The paper compares Unbounded Best-First Minimax with completion against Unbounded Best-First Minimax without completion across 22 deterministic, zero-sum, perfect-information games. These include games such as Amazons, Arimaa, Ataxx, Breakthrough, Chess, Go 9x9, Gomoku, Havannah, several Hex sizes, Othello, Santorini, Surakarta, Xiangqi, and others used in Computer Olympiad-style benchmarks.

The experimental setup is not a casual “we played a few games and vibes were positive” exercise. For each game, the paper uses six distinct sets of evaluation functions, each with roughly fifteen functions. The algorithms compete as both first and second player for pairs of evaluation functions, producing roughly 2,700 matches per game. Each algorithm receives 10 seconds of search time per action. Outcomes are scored as 1 for win, 0 for draw, and -1 for loss, with 95% confidence intervals reported.

The main result: completion improves the average score to 6.34%, while the non-completion version scores -6.34%.

That is a 12.68 percentage-point swing in the head-to-head scoring frame. Some games show small or zero measured differences. Others show large gains: Ataxx is reported at +55%, Othello 8x8 at +28%, Othello 10x10 at +24%, Santorini at +14%, and Xiangqi at +11%.

Evidence item	Likely purpose	Interpretation	Boundary
22-game benchmark	Main empirical evidence	Completion improves UBFM performance across a broad perfect-information game set	Still limited to this game family and experimental protocol
Six evaluation-function sets per game	Robustness/sensitivity support	Reduces dependence on one learned evaluator	Does not prove all evaluators benefit equally
Head-to-head with and without completion	Ablation-style comparison	Isolates completion as the changed mechanism	Reported only for UBFM, not all algorithms in the generalized class
10 seconds per action	Implementation realism	Puts the comparison near competition-like search budgets	Does not measure very low-latency production constraints
Stratified bootstrap confidence interval	Statistical support	Gives uncertainty around aggregate performance	Does not replace domain-specific performance testing

The experiment is best read as practical validation of the mechanism, not as a universal ranking of every search algorithm. The formal theorem says completion gives finite-time completeness to the defined class. The benchmark says that, at least for UBFM under this protocol, completion is not merely a proof ornament. It improves play.

The business lesson is state resolution, not board-game nostalgia

The obvious business misuse of this paper would be to say: “Agentic AI needs completion, therefore this board-game method applies to enterprise workflows.” No. That would be a category error wearing a strategy slide.

The paper directly concerns finite, deterministic, zero-sum, two-player, perfect-information games. Enterprise processes often involve stochastic environments, hidden information, shifting objectives, multiple stakeholders, open-ended tasks, and incomplete state representation. Financial markets, especially, are not chessboards with better branding.

Still, the paper gives a valuable architectural lesson for AI systems that plan, search, or execute multi-step decisions: do not mix estimated preference with resolved status.

Many AI workflow systems have a version of the same problem. They rank actions using model confidence, heuristic scoring, retrieved evidence, tool outputs, or historical success. But they often fail to represent whether a subtask is actually resolved, merely estimated, blocked, contradicted, or awaiting verification. Once these states collapse into one generic score, the system may over-select familiar safe paths and under-explore uncertain but valuable alternatives.

Cognaptus inference, not the paper’s direct claim: enterprise agents should carry explicit state-resolution metadata.

That metadata might include:

Agent state marker	Business analogue	Why it matters
Estimated	The model believes this is likely correct	Useful for prioritization, unsafe as final proof
Verified	A tool, rule, human, or database confirms it	Can be propagated into downstream decisions
Contradicted	Evidence conflicts with the current path	Should trigger repair or escalation
Exhausted	All relevant options under a branch have been checked	Supports safe termination
Unresolved	Still open despite low heuristic score	Should not disappear merely because it looks unattractive

The ROI relevance is not “completion will make agents smarter.” The better claim is narrower: explicit resolution tracking can reduce silent failure modes in planning systems. It can make debugging easier, termination conditions clearer, and risk controls more auditable. Less glamorous than “autonomous intelligence,” yes. Also more useful.

What the paper directly shows, and what it does not

A disciplined reading separates three layers.

Directly shown: completion makes Unbounded Minimax-based algorithms complete in finite time for finite perfect-information two-player games, under the paper’s formal definitions. The paper also directly shows that completion improves UBFM’s empirical winning performance in the reported 22-game benchmark.

Reasonable business inference: planning systems can benefit from separating heuristic estimates from exact or verified resolution status. If an AI system needs to know when a branch is solved, blocked, exhausted, or still open, that status should be represented explicitly rather than buried inside a scalar score.

Still uncertain: how to transfer this mechanism to stochastic, partially observable, multi-agent, or open-ended business environments. In those settings, “resolved” may not mean exact minimax value. It may mean verified compliance, sufficient evidence, bounded risk, human approval, or probabilistic confidence above a threshold. The paper does not solve that translation problem. It gives a clean example of why the translation is worth attempting carefully.

The important limitation is not that the paper is “only about games.” Games are often where decision theory gets stripped down enough to see the machinery. The limitation is that the exact guarantee depends on assumptions that many business systems do not satisfy.

Completeness is a design property, not a mood

The paper’s quiet contribution is to demote a common engineering fantasy: that enough search time will eventually compensate for poor state semantics.

It will not.

UBFM and Descent were already powerful search ideas. They could explore non-uniformly, go deep where promising, and fit into knowledge-free reinforcement learning systems. But without completion, they could still fail to determine a winning strategy because they did not properly manage the difference between known outcomes and estimated values.

Completion fixes that by adding a small but decisive layer of structure: completion value, heuristic value, and resolution flag. From there, the paper proves finite-time completeness for a broader class of algorithms and shows that the mechanism improves practical UBFM performance in a substantial benchmark.

For game AI, the lesson is direct: if the endgame can be solved, the search procedure should be architected to solve it.

For business AI, the lesson is more general and more uncomfortable: an agent that cannot represent what is actually resolved will eventually confuse motion with progress. It may search longer, call more tools, produce better-formatted plans, and still miss the branch that mattered.

Finishing is not a side effect of searching. It has to be designed.

Cognaptus: Automate the Present, Incubate the Future.

Quentin Cohen-Solal, “Completeness of Unbounded Best-First Minimax and Descent Minimax,” arXiv:2603.24572v1, 2026. ↩︎

The algorithm did not lose because it was shallow#

The failure mode: local certainty can block global search#

Completion separates “known”, “estimated”, and “resolved”#

The proof is a progress argument, not a magic convergence slogan#

UBFM and Descent are two exploration personalities inside the same family#

The experiment tests whether correctness also buys play strength#

The business lesson is state resolution, not board-game nostalgia#

What the paper directly shows, and what it does not#

Completeness is a design property, not a mood#