Used by KataGo, described here.
Basically, the MCTS player takes the raw value/policy priors (P) from the network and refines them through search to get P'. An extra output of the neural network is then trained to predict |P' - P|. In other words, it is tasked with predicting how confident it should be in its own output. During MCTS, the backpropagated playout values are scaled by this confidence value.
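A minimal Python sketch of one way to wire this up, for the value output only. Everything here is an assumption for illustration: the names `training_target`, `playout_weight`, `Node` and `backup_path` and the constant `EPSILON` are hypothetical, the inverse-error weighting is a simple placeholder and not KataGo's exact formula, and "scaled by this confidence value" is read as weighting each playout in the node's running average rather than multiplying the value itself.

```python
from dataclasses import dataclass

# Hypothetical floor so a predicted error of 0 does not produce an infinite weight.
EPSILON = 1e-3


def training_target(raw_value: float, searched_value: float) -> float:
    """Target for the hypothetical auxiliary head: |P' - P| for the value output."""
    return abs(searched_value - raw_value)


def playout_weight(predicted_error: float) -> float:
    """Assumed mapping: lower predicted error -> higher confidence -> larger weight."""
    return 1.0 / (EPSILON + predicted_error)


@dataclass
class Node:
    visits: int = 0
    weight_sum: float = 0.0
    weighted_value_sum: float = 0.0

    def backup(self, value: float, weight: float) -> None:
        """Accumulate one playout's value, scaled by its confidence weight."""
        self.visits += 1
        self.weight_sum += weight
        self.weighted_value_sum += weight * value

    @property
    def q(self) -> float:
        """Confidence-weighted mean value (0.0 for an unvisited node)."""
        if self.weight_sum == 0.0:
            return 0.0
        return self.weighted_value_sum / self.weight_sum


def backup_path(path: list[Node], leaf_value: float, predicted_error: float) -> None:
    """Backpropagate one playout from the leaf (last node) to the root (first node).

    leaf_value is from the perspective of the leaf node; the sign is flipped at
    every ply going up, as in a standard two-player negamax backup.
    """
    weight = playout_weight(predicted_error)
    value = leaf_value
    for node in reversed(path):
        node.backup(value, weight)
        value = -value
```

For example, if a node receives one playout of +1.0 with predicted error 0.0 and one of -1.0 with predicted error 1.0, its `q` ends up close to +1 instead of the plain average of 0, because the confident playout dominates the weighted mean.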
David Wu claims that this, combined with #37, results in a 50 Elo improvement in KataGo.
Implement this, and validate its value through experiments.
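One way to validate the change is a head-to-head match between builds with and without it, converted to an Elo difference under the standard logistic Elo model. The sketch below assumes that model; the function names are hypothetical. Under it, a 50 Elo gain corresponds to an expected score of roughly 57% against the baseline.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Logistic Elo model: expected score of a player rated elo_diff points higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_results(wins: int, draws: int, losses: int) -> float:
    """Estimate the Elo difference implied by a match result (draws count as half)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # The logistic model gives an unbounded difference for a 0% or 100% score.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

For instance, a 60-40 result with no draws maps to about +70 Elo. Small matches are noisy, so a claimed 50 Elo gain should be checked over enough games that a 57% score is clearly separated from 50%.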