For example, when training tic-tac-toe, is the optimum reached when the win rate == 0.50? My win rate is so far always above 0.50. I haven't used the evaluate function yet because I feel like the win_rate printed after every epoch is already an evaluation?
By default, the opponent in the evaluation phase is a random player.
I think the win rate of a perfect player against a random player is about 98% in Tic-Tac-Toe, because a random player sometimes happens to choose the correct actions.
Generally speaking, an "optimal" policy cannot be defined in multi-player games; instead, the maximum-entropy Nash equilibrium is recognized as the representative policy.
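As a rough sanity check on that figure, here is a minimal, self-contained sketch (not HandyRL code) that plays a minimax player against a uniform-random opponent and reports the empirical win rate. Note that the exact percentage depends on which side moves first and on how ties between equally valued moves are broken (a minimax player guarantees never losing, but doesn't necessarily maximize wins against a random opponent), so treat ~98% as approximate:

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

CACHE = {}  # memoize positions so repeated searches are cheap

def minimax(board, player):
    """Negamax: best (value, move) for `player`; value is +1 win / 0 draw / -1 loss."""
    key = (board, player)
    if key in CACHE:
        return CACHE[key]
    w = winner(board)
    if w != 0:
        result = ((1 if w == player else -1), None)
    else:
        moves = [i for i in range(9) if board[i] == 0]
        if not moves:
            result = (0, None)  # board full: draw
        else:
            best_val, best_move = -2, None
            for m in moves:
                child = board[:m] + (player,) + board[m + 1:]
                val = -minimax(child, -player)[0]
                if val > best_val:
                    best_val, best_move = val, m
            result = (best_val, best_move)
    CACHE[key] = result
    return result

def play(perfect_player, rng):
    """Play one game; player +1 moves first. Returns +1, -1, or 0 (draw)."""
    board, player = (0,) * 9, 1
    while True:
        if player == perfect_player:
            move = minimax(board, player)[1]
        else:
            move = rng.choice([i for i in range(9) if board[i] == 0])
        board = board[:move] + (player,) + board[move + 1:]
        if winner(board) != 0:
            return player
        if 0 not in board:
            return 0
        player = -player

rng = random.Random(0)
n = 2000
# Alternate which side the perfect player takes, as evaluation usually does.
results = [play(perfect_player=(1 if i % 2 == 0 else -1), rng=rng) for i in range(n)]
wins = sum(r == (1 if i % 2 == 0 else -1) for i, r in enumerate(results))
print(f"perfect player's win rate vs random: {wins / n:.3f}")
```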
YuriCat, can you add an argument to the train function so that evaluation runs against a different agent? For example, I want to evaluate against my model from 20 epochs ago to see whether it is improving or not. How can I do this? I only see that this is supported in evaluate.py, not in train.py.
Thanks for your suggestion.
Selecting opponents is exactly what we are considering right now.
Do you have any good ideas on how to specify an old model in the configuration?
By the way, comparing against the model from just before the current one may give an interesting result, since policies trained by RL sometimes end up in a loop, like Rock-Paper-Scissors.
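For anyone wanting to experiment before such an option lands, here is a hedged sketch of the checkpoint bookkeeping it could use. This is not HandyRL's actual train.py; `PolicyNet`, the `models/<epoch>.pth` layout, and `play_matches` are illustrative assumptions, not the library's API:

```python
import os
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Stand-in for the real policy network: 9 board cells -> 9 move logits."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(9, 9)

    def forward(self, x):
        return self.fc(x)

def save_snapshot(net, epoch, model_dir="models"):
    os.makedirs(model_dir, exist_ok=True)
    torch.save(net.state_dict(), os.path.join(model_dir, f"{epoch}.pth"))

def load_snapshot(epoch, model_dir="models"):
    net = PolicyNet()
    net.load_state_dict(torch.load(os.path.join(model_dir, f"{epoch}.pth")))
    net.eval()  # the snapshot is a frozen opponent, not being trained
    return net

# Usage inside a training loop: after each epoch, pit the current model
# against the snapshot from `lag` epochs earlier.
net, lag = PolicyNet(), 20
for epoch in range(1, 31):
    # ... one epoch of training would update `net` here ...
    save_snapshot(net, epoch)
    if epoch > lag:
        old_net = load_snapshot(epoch - lag)
        # win_rate = play_matches(net, old_net, n_games=100)  # hypothetical match runner
```

Keeping every epoch's snapshot also makes the Rock-Paper-Scissors-style loop above easy to test: evaluate epoch E against E-1, E-2, ... and look for a non-transitive pattern in the win rates.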