The algorithm did not lose because it was shallow
Endgames are where polite uncertainty goes to die.
Early in a game, a search algorithm can afford approximation. The tree is huge, the clock is rude, and the best it can do is lean on an evaluation function that says, with the usual machine confidence, “this line looks promising.” Fine. Nobody expects omniscience on move three.
But late in the game, the demand changes. The tree is smaller. The outcome may be reachable. If a winning line exists, the algorithm should eventually find it. If a drawing line is the best possible result, it should know that too. At that point, “I searched many interesting branches” is not a strategy. It is a diary.
Quentin Cohen-Solal’s paper Completeness of Unbounded Best-First Minimax and Descent Minimax asks a narrow but important question: when Unbounded Best-First Minimax and Descent Minimax are given the completion technique, do they become complete search algorithms for finite, perfect-information, two-player games?1
The answer is yes. More precisely, the paper generalizes both algorithms into a broader class called Unbounded Minimax-based algorithms, then proves that every algorithm in that class is complete in finite time when it uses completion. The paper also reports that completion improves practical winning performance for Unbounded Best-First Minimax across 22 benchmark games, with an average score of 6.34% for completion versus -6.34% without it.
That sounds like a technical patch. It is not. It is a reminder that in decision systems, search depth and search correctness are different species. More compute can expand a flawed procedure. It does not automatically repair the procedure.
The failure mode: local certainty can block global search
The tempting reader misconception is simple: if an unbounded search algorithm can keep searching, then enough time should eventually make it find the winning strategy. That feels intuitive. It is also exactly where the trap sits.
Unbounded Best-First Minimax was designed to avoid the rigidity of fixed-depth minimax. Classic minimax searches to a preset depth and evaluates the leaves. Alpha-beta pruning makes that cheaper, but it does not remove the depth boundary. Unbounded Best-First Minimax instead explores the game tree non-uniformly, going deeper into lines that appear promising. Descent Minimax modifies this with a stronger depth-first bias, which is useful when endgame information should be propagated during learning.
So far, this sounds like progress: less mechanical depth control, more selective intelligence. Wonderful. Now for the small print.
In their basic forms, these algorithms can fail to be complete. The failure is not that they run out of time. The paper explicitly discusses cases where they may fail even with infinite search time. The reason is structural: without using resolution information correctly, the algorithm may keep preferring a resolved state over an unresolved state whose heuristic evaluation looks worse, even though that unresolved state may turn out to be better once actually explored.
This creates a strange computational conservatism. The algorithm clings to known outcomes because they are known, while uncertain alternatives remain under-explored because their estimates look unattractive. In a business setting, one might call this “process discipline.” In search, it is just how you miss the winning move while feeling very organized.
The core distinction is this:
| State type | What the algorithm knows | What can go wrong |
|---|---|---|
| Resolved state | Its game outcome is known | It can become over-favored simply because it is certain |
| Unresolved state | Its value is estimated | Its estimate may be too pessimistic, so it may never receive enough search |
| Root decision | Must compare both kinds | Certainty and estimation get mixed as if they were the same kind of evidence |
The paper’s central mechanism is designed to prevent exactly this mixing.
Completion separates “known”, “estimated”, and “resolved”
Completion gives each state in the partial search tree three values:
| Value | Meaning | Why it matters |
|---|---|---|
| $c(s)$ | Completion value | The exact minimax value with respect to the basic game outcome, if known; otherwise set to 0 |
| $v(s)$ | Heuristic evaluation | The estimated minimax value in the partial tree, using terminal and learned/adaptive evaluation functions |
| $r(s)$ | Resolution flag | Whether the state is weakly resolved |
The elegance is not in the notation. The elegance is in the separation.
A value of 0 in $c(s)$ alone is ambiguous: it may mean “draw” or “unknown.” The resolution flag $r(s)$ disambiguates that. The heuristic value $v(s)$ remains available for guiding search, but it is no longer confused with exact game resolution. The algorithm can therefore use estimates without pretending they are proof.
Two operations carry this bookkeeping into search behavior.
First, completed best action chooses among child states using the completion-aware tuple rather than a naked heuristic value. For max-player states and min-player states, the ordering is adapted so that the search respects both the exact completion value and the heuristic evaluation, with selection-count tie handling in the formal algorithm.
Second, backup resolution propagates solved status upward. A state can become resolved if the appropriate logical condition is satisfied: for example, a player has a resolved winning child, or all children are resolved. Once a state is resolved, its completion value should not wobble around like a quarterly KPI adjusted after the board meeting.
That stability is not cosmetic. It is the base of the proof.
The proof is a progress argument, not a magic convergence slogan
The paper’s formal result is easier to understand if we ignore the symbols for a moment and track the proof’s job.
It must show two things:
- When a state is marked as resolved, the completion value is actually the minimax value.
- Every state will be marked resolved after a finite number of iterations.
The first part prevents false certainty. The second prevents endless wandering.
The paper defines completeness for an Unbounded Minimax-based algorithm as follows: for any finite perfect-information two-player game and any state, there exists enough search time such that applying the algorithm returns a resolved state whose completion value equals the exact minimax value.
In compact form, the key correctness relationship is:
where $M(s)$ is the exact minimax value of state $s$.
The proof then proceeds through a chain of lemmas and propositions. Their roles are worth separating, because this is where the paper’s real contribution sits.
| Proof component | Likely purpose | What it supports | What it does not prove alone |
|---|---|---|---|
| Resolved states do not change | Stability condition | Once a state is solved, its value remains safe to use | Does not show all states will be solved |
| Resolution conditions | Logical validity | A state is resolved only under justified child-state conditions | Does not show search will reach those conditions |
| Resolved implies exact | Main correctness step | $c(s)$ equals the exact minimax value when $r(s)=1$ | Does not show finite-time completion |
| Each iteration adds progress | Termination support | Search either expands the tree or resolves another state | Depends on game finiteness |
| Finite resolution proposition | Completeness bridge | Every state eventually becomes resolved | Applies to the defined finite perfect-information setting |
This is a mechanism-first proof. It is not saying, “the algorithm usually converges in practice.” It is saying: in the specified game class, the algorithm cannot keep spinning forever without either adding an unresolved state or resolving a state; because the state space is finite, that progress must eventually finish.
The theorem follows: any Unbounded Minimax-based algorithm is complete.
Notice the generality. The result is not only about one hand-tuned version of UBFM or one implementation of Descent. The paper defines a class of algorithms that differ in how they decide whether to continue deeper along the current line of play or return toward the root, and how they prioritize child exploration. UBFM and Descent are two points in this broader design space. Completion is the invariant that makes the class complete.
UBFM and Descent are two exploration personalities inside the same family
The generalization matters because UBFM and Descent explore differently.
UBFM returns to the root after extending the selected line by one state. Descent continues down the best unresolved sequence until it reaches a terminal or resolved state. In simpler terms: UBFM behaves more like repeated best-first refinement, while Descent carries a stronger depth-first instinct.
The new Unbounded Minimax-based class abstracts over that choice. It allows different behaviors for:
| Design axis | What varies | Why it matters operationally |
|---|---|---|
| Deep exploration | Continue down the current line or return to reconsider from the root | Controls how quickly endgame information is reached versus how often alternatives are reconsidered |
| Breadth exploration | Which child to explore first | Controls search ordering and sensitivity to heuristic guidance |
| Completion handling | Track $c(s)$, $v(s)$, and $r(s)$ | Preserves correctness across different exploration orders |
This is useful because practical search algorithms are rarely pure textbook objects. Engineers modify exploration order, caching, tie-breaking, and rollout behavior. A proof tied to only one exact traversal routine can be fragile. A proof over a class is more architecture-relevant: it says which part of the design is essential for correctness and which parts are movable.
The essential part is not “always explore exactly like UBFM” or “always explore exactly like Descent.” The essential part is that the algorithm must maintain resolution-aware completion values and propagate them correctly.
The experiment tests whether correctness also buys play strength
The theoretical result answers the formal question. The experiment answers a more impatient engineering question: should anyone bother implementing completion?
The paper compares Unbounded Best-First Minimax with completion against Unbounded Best-First Minimax without completion across 22 deterministic, zero-sum, perfect-information games. These include games such as Amazons, Arimaa, Ataxx, Breakthrough, Chess, Go 9x9, Gomoku, Havannah, several Hex sizes, Othello, Santorini, Surakarta, Xiangqi, and others used in Computer Olympiad-style benchmarks.
The experimental setup is not a casual “we played a few games and vibes were positive” exercise. For each game, the paper uses six distinct sets of evaluation functions, each with roughly fifteen functions. The algorithms compete as both first and second player for pairs of evaluation functions, producing roughly 2,700 matches per game. Each algorithm receives 10 seconds of search time per action. Outcomes are scored as 1 for win, 0 for draw, and -1 for loss, with 95% confidence intervals reported.
The main result: completion improves the average score to 6.34%, while the non-completion version scores -6.34%.
That is a 12.68 percentage-point swing in the head-to-head scoring frame. Some games show small or zero measured differences. Others show large gains: Ataxx is reported at +55%, Othello 8x8 at +28%, Othello 10x10 at +24%, Santorini at +14%, and Xiangqi at +11%.
| Evidence item | Likely purpose | Interpretation | Boundary |
|---|---|---|---|
| 22-game benchmark | Main empirical evidence | Completion improves UBFM performance across a broad perfect-information game set | Still limited to this game family and experimental protocol |
| Six evaluation-function sets per game | Robustness/sensitivity support | Reduces dependence on one learned evaluator | Does not prove all evaluators benefit equally |
| Head-to-head with and without completion | Ablation-style comparison | Isolates completion as the changed mechanism | Reported only for UBFM, not all algorithms in the generalized class |
| 10 seconds per action | Implementation realism | Puts the comparison near competition-like search budgets | Does not measure very low-latency production constraints |
| Stratified bootstrap confidence interval | Statistical support | Gives uncertainty around aggregate performance | Does not replace domain-specific performance testing |
The experiment is best read as practical validation of the mechanism, not as a universal ranking of every search algorithm. The formal theorem says completion gives finite-time completeness to the defined class. The benchmark says that, at least for UBFM under this protocol, completion is not merely a proof ornament. It improves play.
The business lesson is state resolution, not board-game nostalgia
The obvious business misuse of this paper would be to say: “Agentic AI needs completion, therefore this board-game method applies to enterprise workflows.” No. That would be a category error wearing a strategy slide.
The paper directly concerns finite, deterministic, zero-sum, two-player, perfect-information games. Enterprise processes often involve stochastic environments, hidden information, shifting objectives, multiple stakeholders, open-ended tasks, and incomplete state representation. Financial markets, especially, are not chessboards with better branding.
Still, the paper gives a valuable architectural lesson for AI systems that plan, search, or execute multi-step decisions: do not mix estimated preference with resolved status.
Many AI workflow systems have a version of the same problem. They rank actions using model confidence, heuristic scoring, retrieved evidence, tool outputs, or historical success. But they often fail to represent whether a subtask is actually resolved, merely estimated, blocked, contradicted, or awaiting verification. Once these states collapse into one generic score, the system may over-select familiar safe paths and under-explore uncertain but valuable alternatives.
Cognaptus inference, not the paper’s direct claim: enterprise agents should carry explicit state-resolution metadata.
That metadata might include:
| Agent state marker | Business analogue | Why it matters |
|---|---|---|
| Estimated | The model believes this is likely correct | Useful for prioritization, unsafe as final proof |
| Verified | A tool, rule, human, or database confirms it | Can be propagated into downstream decisions |
| Contradicted | Evidence conflicts with the current path | Should trigger repair or escalation |
| Exhausted | All relevant options under a branch have been checked | Supports safe termination |
| Unresolved | Still open despite low heuristic score | Should not disappear merely because it looks unattractive |
The ROI relevance is not “completion will make agents smarter.” The better claim is narrower: explicit resolution tracking can reduce silent failure modes in planning systems. It can make debugging easier, termination conditions clearer, and risk controls more auditable. Less glamorous than “autonomous intelligence,” yes. Also more useful.
What the paper directly shows, and what it does not
A disciplined reading separates three layers.
Directly shown: completion makes Unbounded Minimax-based algorithms complete in finite time for finite perfect-information two-player games, under the paper’s formal definitions. The paper also directly shows that completion improves UBFM’s empirical winning performance in the reported 22-game benchmark.
Reasonable business inference: planning systems can benefit from separating heuristic estimates from exact or verified resolution status. If an AI system needs to know when a branch is solved, blocked, exhausted, or still open, that status should be represented explicitly rather than buried inside a scalar score.
Still uncertain: how to transfer this mechanism to stochastic, partially observable, multi-agent, or open-ended business environments. In those settings, “resolved” may not mean exact minimax value. It may mean verified compliance, sufficient evidence, bounded risk, human approval, or probabilistic confidence above a threshold. The paper does not solve that translation problem. It gives a clean example of why the translation is worth attempting carefully.
The important limitation is not that the paper is “only about games.” Games are often where decision theory gets stripped down enough to see the machinery. The limitation is that the exact guarantee depends on assumptions that many business systems do not satisfy.
Completeness is a design property, not a mood
The paper’s quiet contribution is to demote a common engineering fantasy: that enough search time will eventually compensate for poor state semantics.
It will not.
UBFM and Descent were already powerful search ideas. They could explore non-uniformly, go deep where promising, and fit into knowledge-free reinforcement learning systems. But without completion, they could still fail to determine a winning strategy because they did not properly manage the difference between known outcomes and estimated values.
Completion fixes that by adding a small but decisive layer of structure: completion value, heuristic value, and resolution flag. From there, the paper proves finite-time completeness for a broader class of algorithms and shows that the mechanism improves practical UBFM performance in a substantial benchmark.
For game AI, the lesson is direct: if the endgame can be solved, the search procedure should be architected to solve it.
For business AI, the lesson is more general and more uncomfortable: an agent that cannot represent what is actually resolved will eventually confuse motion with progress. It may search longer, call more tools, produce better-formatted plans, and still miss the branch that mattered.
Finishing is not a side effect of searching. It has to be designed.
Cognaptus: Automate the Present, Incubate the Future.
-
Quentin Cohen-Solal, “Completeness of Unbounded Best-First Minimax and Descent Minimax,” arXiv:2603.24572v1, 2026. ↩︎