Alphazero

TL;DR for operators Search-contempt is not “AI plays worse so it learns more”. That would be the lazy interpretation, and business strategy already has enough lazy interpretations wearing expensive shoes. The paper introduces a hybrid MCTS method for AlphaZero-like self-play systems. It behaves like standard PUCT search for the player to move, but at opponent nodes it eventually freezes the opponent’s visit distribution after a threshold, $N_{scl}$, and samples from that frozen distribution rather than constantly updating it toward stronger play.1 The effect is subtle but important: the system stops assuming the opponent will always improve its response with more search. ...