Hi Adam (@adamlerer), after reading the great paper and the code, I have some questions concerning Single-Agent Search (SAS).
1. The last paragraph describing SAS says that 1-ply search is used, which means the search depth is 1. But in the code, it seems that when a bot is conducting search at a certain time step, it also searches iteratively into the future in `MCSearch`.
More specifically, in `SearchBot.cc`, `Move SearchBot::doSearch_` runs the search for a certain round; this eventually starts a copied `search_server` and calls `int score = search_server.runToCompletion();` to get the result of the MC rollout.
It seems that the two players inside that `search_server` are still `SearchBot` and `SmartBot` (the BP bot), which does not seem consistent with the claim in the paper that "all agents play according to the joint blueprint policy for the remainder of the game". I wonder how you deal with this, or whether I have missed some important facts.
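To make sure I am asking about the right thing, here is a minimal sketch of what I understand 1-ply single-agent search to be. Everything in it is made up for illustration: `State`, the toy game in `main`, and names like `onePlySearch`, `sampleDeterminization`, and `rolloutBlueprint` are not from this repo. The only point is that the searcher deviates from the blueprint at the root, and every future move inside a rollout comes from the blueprint.

```cpp
#include <functional>
#include <iostream>
#include <limits>
#include <random>
#include <vector>

using Move = int;

// 1-ply search as I understand it: evaluate each legal move at the root by
// the average score of full-game rollouts in which every player follows the
// blueprint, then play the argmax. Depth stays 1 because the searcher never
// deviates from the blueprint below the root.
template <class State>
Move onePlySearch(
    const State& publicState,
    const std::vector<Move>& legalMoves,
    // Sample a full hidden state consistent with the searcher's beliefs.
    const std::function<State(const State&, std::mt19937&)>& sampleDeterminization,
    // Apply the move, then let ALL players follow the blueprint to the end
    // of the game and return the final score (the MC rollout).
    const std::function<double(State, Move)>& rolloutBlueprint,
    int numRollouts,
    std::mt19937& rng) {
  Move best = legalMoves.front();
  double bestValue = -std::numeric_limits<double>::infinity();
  for (Move m : legalMoves) {
    double total = 0.0;
    for (int i = 0; i < numRollouts; ++i) {
      State s = sampleDeterminization(publicState, rng);
      total += rolloutBlueprint(s, m);
    }
    double value = total / numRollouts;
    if (value > bestValue) { bestValue = value; best = m; }
  }
  return best;
}

int main() {
  // Toy stand-in "game": the payoff of move m is m plus noise from the
  // sampled determinization, so move 2 should win on average.
  struct State { double noise = 0.0; };
  std::mt19937 rng(0);
  auto sample = [](const State&, std::mt19937& g) {
    std::normal_distribution<double> d(0.0, 1.0);
    return State{d(g)};
  };
  auto rollout = [](State s, Move m) { return s.noise + m; };
  Move best = onePlySearch<State>(State{}, {0, 1, 2}, sample, rollout, 1000, rng);
  std::cout << "best root move: " << best << "\n";  // expected: 2
}
```

My confusion above is exactly about the `rolloutBlueprint` part: if the copied `search_server` still contains a `SearchBot`, does that bot search again inside the rollout, or does it fall back to the blueprint there?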
2. I found that the MCSearch method mentioned in the paper is slightly different from conventional MCTS. I suppose the biggest distinction is that in MCTS (as used in AlphaGo), the action chosen in the selection step is based on a Q value (which depends on a state-value estimator) together with the reward (which is already present in SPARTA). May I ask why the conventional idea is not adopted here, or is this the idea presented in the Learned Belief Search paper?
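For concreteness, this is the selection rule I had in mind, assuming I understand UCT/AlphaGo-style MCTS correctly; the code and the numbers in `main` are only an illustration, not anything from SPARTA or this repo.

```cpp
#include <cmath>
#include <iostream>
#include <limits>
#include <vector>

// Per-action statistics kept by a tree-search node.
struct EdgeStats {
  double totalValue = 0.0;  // sum of backed-up rollout / value-net results
  int visits = 0;
};

// UCB1 selection: argmax over Q(s,a) + c * sqrt(ln N(s) / N(s,a)).
// In AlphaGo-style PUCT the exploration bonus is additionally weighted by a
// policy prior, and Q comes partly from a learned state-value estimator.
int selectUcb1(const std::vector<EdgeStats>& edges, double c) {
  int parentVisits = 0;
  for (const auto& e : edges) parentVisits += e.visits;

  int best = 0;
  double bestScore = -std::numeric_limits<double>::infinity();
  for (int a = 0; a < static_cast<int>(edges.size()); ++a) {
    const auto& e = edges[a];
    if (e.visits == 0) return a;  // try unvisited actions first
    double q = e.totalValue / e.visits;
    double bonus = c * std::sqrt(std::log(double(parentVisits)) / e.visits);
    if (q + bonus > bestScore) { bestScore = q + bonus; best = a; }
  }
  return best;
}

int main() {
  // Three actions after a few backups: action 1 has the best Q and also the
  // largest bonus because it has fewer visits, so UCB1 selects it.
  std::vector<EdgeStats> edges = {{2.0, 4}, {1.5, 2}, {0.2, 4}};
  std::cout << "selected action: " << selectUcb1(edges, 1.4) << "\n";  // 1
}
```

So my question is basically whether the rollouts in MCSearch are deliberately kept flat (the same rollout budget for every root action, no Q-plus-exploration selection, no tree below the root), and whether the value-estimator idea is what only appears later in Learned Belief Search.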
Thanks for sharing the codebase and reading the above questions!