Uncertainty-Weighted MCTS Playouts #38

shindavid · 2023-01-30T17:04:28Z

Used by KataGo, described here.

Basically, the MCTS player takes the raw value/policy priors (P) from the neural network and refines them through search to get P'. The neural network is then tasked with predicting |P' - P|. In other words, it's tasked with predicting how far off its own output is likely to be, which serves as a measure of its confidence. During MCTS, the backpropagated playout values are weighted by this confidence value.
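A rough sketch of what the weighted backpropagation could look like, to make the idea concrete. This is not KataGo's actual formula: the `Node` class, the `playout_weight` function, the inverse-error weighting, and the `eps` constant are all assumptions made up for illustration. Values are kept from a single fixed perspective, so the usual per-ply sign flip is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical search node that tracks a weighted value average."""
    parent: Optional["Node"] = None
    visits: int = 0
    weight_sum: float = 0.0
    weighted_value_sum: float = 0.0

    @property
    def q(self) -> float:
        # Weighted mean of all playout values backed up through this node.
        if self.weight_sum == 0.0:
            return 0.0
        return self.weighted_value_sum / self.weight_sum


def playout_weight(predicted_error: float, eps: float = 0.05) -> float:
    """Map the network's predicted error to a playout weight.

    Assumed form: a smaller predicted error gives a larger weight.
    eps keeps the weight finite when the predicted error is zero.
    """
    return 1.0 / (eps + predicted_error)


def backpropagate(leaf: Node, value: float, predicted_error: float) -> None:
    """Back up a leaf evaluation, weighted by the network's confidence."""
    w = playout_weight(predicted_error)
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        node.weight_sum += w
        node.weighted_value_sum += w * value
        node = node.parent


root = Node()
confident_leaf = Node(parent=root)
uncertain_leaf = Node(parent=root)

# A confident evaluation (low predicted error) dominates the root average.
backpropagate(confident_leaf, value=1.0, predicted_error=0.0)
backpropagate(uncertain_leaf, value=0.0, predicted_error=0.95)
print(root.visits, round(root.q, 3))
```

With plain (unweighted) averaging the root value here would be 0.5; under the assumed weighting it is pulled strongly toward the confident evaluation.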

David Wu claims that this, combined with #37, results in a 50 Elo improvement in KataGo.

Implement this, and validate its value through experiments.

shindavid mentioned this issue Jan 30, 2023

Dynamic Variance-Scaled cPUCT #37

Open

shindavid added KataGo replication task learning improvement labels Jan 30, 2023