Confusion between "battle_mode" and "mcts_mode" #152

marintoro · 2023-11-27T13:30:10Z

Hello,

I think there is a "bug" in the current version of the alphago code when using mode "play_with_bot_mode".
Indeed, in both tictactoe_env.py and gomoku_env.py this line is hardcoded:

self.mcts_mode = 'self_play_mode'

So mcts_mode is always set to self_play_mode, no matter what is given in the config.
Moreover, in both the Python tree and the C++ tree of alphago we can find these lines:

self.simulate_env.battle_mode = self.simulate_env.mcts_mode # In ptree_az.py
simulate_env.attr("battle_mode") = simulate_env.attr("mcts_mode"); # In mcts_alphazero.cpp

So that means that no matter what we give in the config for battle_mode, it is overridden with mcts_mode, which is always "self_play_mode"...
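The override described above can be sketched in isolation. This is only a toy illustration: `ToyEnv` and `make_simulate_env` are hypothetical names, not classes or functions from the repository, and the only lines taken from the real code are the two assignments quoted above.

```python
import copy


class ToyEnv:
    """Hypothetical stand-in for an env such as the one in tictactoe_env.py."""

    def __init__(self, battle_mode):
        # Value that would come from the user config.
        self.battle_mode = battle_mode
        # Hardcoded, as in the line quoted above.
        self.mcts_mode = 'self_play_mode'


def make_simulate_env(real_env):
    """Hypothetical helper: copy the env the way a tree search might before simulating."""
    simulate_env = copy.deepcopy(real_env)
    # Same assignment as the one quoted from ptree_az.py.
    simulate_env.battle_mode = simulate_env.mcts_mode
    return simulate_env


real_env = ToyEnv(battle_mode='play_with_bot_mode')
simulate_env = make_simulate_env(real_env)

print(real_env.battle_mode)      # configured value is kept on the real env
print(simulate_env.battle_mode)  # always 'self_play_mode' on the simulated copy
```

In this sketch the configured value survives on the real env, and only the copy used for simulation is overwritten.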

In conclusion, after quickly reviewing the code, I think that mcts_mode should just be removed and replaced by battle_mode everywhere, because both attributes seem to do exactly the same thing (but I may be wrong).

To reproduce, you can just run the standard tictactoe in 'play_with_bot_mode' (by running tictactoe_alphazero_bot_mode_config.py) and check that the MCTS is always using "self_play_mode".


puyuan1996 · 2023-11-29T08:02:46Z

Thank you very much for your thoughtful feedback.

We acknowledge your point, but it's indeed necessary to consistently set simulate_env.battle_mode to self_play_mode. This is because regardless of how we interact with the true environment during the data collection phase (i.e., whatever the battle_mode setting in the real environment), we should not give the agent access to the opponent's policy when executing the MCTS search. Therefore, during the MCTS search process, simulate_env.battle_mode is always set to self_play_mode.
However, this could potentially lead to some confusion. Our self.mcts_mode might need to be renamed to self.battle_mode_in_simulation_env to more accurately reflect its role in the simulation environment. It is worth noting that we have not hardcoded a fixed value, but instead left this parameter reserved for debugging purposes.
For relevant information, you may refer to this issue.
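A minimal sketch of the reasoning above, under stated assumptions: `ToyBoardEnv` is a hypothetical toy game, not the repository's environment, and it assumes that in 'play_with_bot_mode' a single `step` also plays the built-in opponent's reply, while in 'self_play_mode' a `step` only applies the given move and switches the player to move.

```python
import random


class ToyBoardEnv:
    """Hypothetical 1-D board game used only to contrast the two modes."""

    def __init__(self, battle_mode, n_cells=5, seed=0):
        self.battle_mode = battle_mode
        self.board = [0] * n_cells  # 0 = empty, 1 / 2 = player marks
        self.current_player = 1
        self._rng = random.Random(seed)

    def legal_actions(self):
        return [i for i, v in enumerate(self.board) if v == 0]

    def _place(self, action):
        self.board[action] = self.current_player
        self.current_player = 3 - self.current_player  # switch 1 <-> 2

    def step(self, action):
        self._place(action)
        if self.battle_mode == 'play_with_bot_mode' and self.legal_actions():
            # Assumed behaviour: the env itself plays the opponent's reply,
            # so the opponent's policy is baked into the transition.
            self._place(self._rng.choice(self.legal_actions()))
        return list(self.board)


real_env = ToyBoardEnv('play_with_bot_mode')  # how data is collected
sim_env = ToyBoardEnv('self_play_mode')       # what the search steps through

real_env.step(0)
sim_env.step(0)

# Real env: two marks placed, same player to move again.
# Simulated env: one mark placed, the other player to move.
print(real_env.board, real_env.current_player)
print(sim_env.board, sim_env.current_player)
```

With the simulated env in 'self_play_mode', the search has to choose moves for both sides itself and never queries the built-in opponent, which matches the motivation given above.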

If you have any suggestions for improvement, please feel free to provide them. Best wishes!

marintoro · 2023-11-29T14:14:40Z

Hello,
OK, sorry for my misunderstanding.
Thank you for your explanation. I understand now why you always set battle_mode to self_play_mode in the MCTS.

puyuan1996 added the config (New or improved configuration) and discussion (Discussion of a typical issue or concept) labels Nov 29, 2023

PaParaZz1 closed this as completed Nov 29, 2023

puyuan1996 mentioned this issue Jan 4, 2024

polish(pu): rename mcts_mode to battle_mode_in_simulation_env, add sampled alphazero config for tictactoe #179

Merged
