Confusion between "battle_mode" and "mcts_mode" #152

Closed
marintoro opened this issue Nov 27, 2023 · 2 comments
Labels
config (New or improved configuration), discussion (Discussion of a typical issue or concept)

Comments

@marintoro

Hello,

I think there is a "bug" in the current version of the alphago code when using the "play_with_bot_mode" mode.
In both tictactoe_env.py and gomoku_env.py, this line is hardcoded:

self.mcts_mode = 'self_play_mode'

So mcts_mode is always set to self_play_mode, no matter what is given in the config.
Moreover, in both the Python tree and the C++ tree of alphago, we can find these lines:

self.simulate_env.battle_mode = self.simulate_env.mcts_mode # In ptree_az.py
simulate_env.attr("battle_mode") = simulate_env.attr("mcts_mode"); // In mcts_alphazero.cpp

This means that whatever we set for battle_mode in the config is overridden by mcts_mode, which is always "self_play_mode".

In conclusion, after quickly reviewing the code, I think that mcts_mode should simply be removed and replaced by battle_mode everywhere, because both attributes seem to do exactly the same thing (but I may be wrong).

To reproduce, run the standard tictactoe example in 'play_with_bot_mode' (by running tictactoe_alphazero_bot_mode_config.py) and check that the MCTS always uses "self_play_mode".
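The override reported here can be illustrated with a minimal, self-contained sketch. `ToyBoardEnv` and `prepare_for_search` are hypothetical names invented for this illustration and are not the project's classes or functions; only the two assignments are taken from the lines quoted in this report.

```python
class ToyBoardEnv:
    """Hypothetical stand-in for the env classes in tictactoe_env.py / gomoku_env.py."""

    def __init__(self, cfg: dict):
        # Value requested by the user in the config.
        self.battle_mode = cfg["battle_mode"]
        # Hardcoded, as in the line quoted from the env files.
        self.mcts_mode = "self_play_mode"


def prepare_for_search(simulate_env: ToyBoardEnv) -> ToyBoardEnv:
    # Mirrors the assignment quoted from ptree_az.py.
    simulate_env.battle_mode = simulate_env.mcts_mode
    return simulate_env


# Whatever the config requests, the env used by the search ends up in self-play mode.
results = {}
for requested in ("self_play_mode", "play_with_bot_mode"):
    env = prepare_for_search(ToyBoardEnv({"battle_mode": requested}))
    results[requested] = env.battle_mode
    print(f"requested={requested!r} -> used in search={env.battle_mode!r}")
```

In this sketch the configured value is lost only on the object handed to the search; whether the real environment used for data collection is a separate object is not shown here.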

@puyuan1996
Collaborator

puyuan1996 commented Nov 29, 2023

Thank you very much for your thoughtful feedback.

  • We acknowledge your point, but it's indeed necessary to consistently set simulate_env.battle_mode to self_play_mode. This is because regardless of how we interact with the true environment during the data collection phase (i.e., whatever the battle_mode setting in the real environment), we should not give the agent access to the opponent's policy when executing the MCTS search. Therefore, during the MCTS search process, simulate_env.battle_mode is always set to self_play_mode.

  • However, this could potentially lead to some confusion. Our self.mcts_mode might need to be renamed to self.battle_mode_in_simulation_env to more accurately reflect its role in the simulation environment. It is worth noting that we have not hardcoded a fixed value, but instead left this parameter reserved for debugging purposes.

  • For relevant information, you may refer to this issue.
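The reasoning in the points above can be sketched with a toy two-player environment. Everything here is an assumption made for illustration: `ToyTurnEnv`, `_bot_action`, and `make_simulation_env` are hypothetical names, and the sketch assumes that in play_with_bot_mode a single `step` applies the agent's move and then the bot's reply, while in self_play_mode it applies one move only. It is not the project's implementation.

```python
import copy


class ToyTurnEnv:
    """Hypothetical two-player env: each move appends (player, action) to a history."""

    def __init__(self, battle_mode: str):
        self.battle_mode = battle_mode
        self.mcts_mode = "self_play_mode"
        self.current_player = 1
        self.history = []

    def _bot_action(self) -> int:
        # Stand-in for a scripted opponent policy (assumption for this sketch).
        return 0

    def _apply(self, action: int) -> None:
        self.history.append((self.current_player, action))
        self.current_player = 2 if self.current_player == 1 else 1

    def step(self, action: int) -> list:
        self._apply(action)
        if self.battle_mode == "play_with_bot_mode":
            # The env answers with the bot's move inside the same step.
            self._apply(self._bot_action())
        return list(self.history)


def make_simulation_env(real_env: ToyTurnEnv) -> ToyTurnEnv:
    # Assumption: the search works on its own copy of the environment.
    sim_env = copy.deepcopy(real_env)
    # Same assignment as the one quoted from ptree_az.py.
    sim_env.battle_mode = sim_env.mcts_mode
    return sim_env


real_env = ToyTurnEnv("play_with_bot_mode")
sim_env = make_simulation_env(real_env)

real_history = real_env.step(3)  # agent move plus the bot's reply
sim_history = sim_env.step(3)    # agent move only; the search picks the reply itself

print("real env:", real_history)
print("simulation env:", sim_history)
```

Under these assumptions, the simulation copy never calls the bot, so the tree search has to model the opponent's move with its own policy, while the real environment still plays against the bot during data collection.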

If you have any suggestions for improvement, please feel free to provide them. Best wishes!

@puyuan1996 added the config and discussion labels on Nov 29, 2023
@marintoro
Author

Hello,
OK, sorry for my misunderstanding.
Thank you for your explanation. I now understand why you always set battle_mode to self_play_mode in the MCTS.
