the sampled efficient zero portion of the code #218

LiuDongyang39 · 2024-04-17T06:10:08Z

Hello, I was wondering: does the Sampled EfficientZero portion of the code not use the empirical distribution from the original Sampled MuZero paper to generate the prior probabilities of the child nodes?

puyuan1996 · 2024-04-18T04:03:11Z

Hello! We have implemented the functionality related to the empirical distribution as described in the original SampledMuZero paper. You can refer to our ptree and ctree code for the specific implementation details. Please note that since the K actions we sample are non-repetitive, the empirical distribution is essentially a re-normalization of the original probabilities (for discrete action spaces) or log probabilities (for continuous action spaces). As the original authors' code is not open source, this implementation is based solely on our understanding of the paper. Additionally, following recent discussions on this issue, we plan to optimize the performance of sampled_efficientzero soon. Thank you for your patience and support.
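To make the re-normalization point concrete, here is a minimal sketch in plain Python. It is not taken from the ptree or ctree code: the function names (`sample_unique_actions`, `renormalized_priors`, `priors_from_log_probs`) and the toy policy are hypothetical, and it assumes the K actions are drawn without replacement from the policy output. Under that assumption every sampled action appears exactly once, so the child priors reduce to the policy probabilities restricted to the sampled set and rescaled to sum to 1.

```python
import math
import random


def sample_unique_actions(policy, k, rng):
    """Draw up to k distinct action indices, weighted by `policy`, without replacement."""
    remaining = list(range(len(policy)))
    weights = list(policy)
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Pick a position among the actions that are still available.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return chosen


def renormalized_priors(policy, actions):
    """Discrete case: restrict the policy to the sampled actions and rescale to sum to 1."""
    total = sum(policy[a] for a in actions)
    return {a: policy[a] / total for a in actions}


def priors_from_log_probs(log_probs):
    """Continuous case (assumed): softmax over the log probabilities of the sampled actions."""
    m = max(log_probs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(lp - m) for a, lp in log_probs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Toy policy over 4 discrete actions (hypothetical values).
policy = [0.5, 0.2, 0.2, 0.1]
rng = random.Random(0)
actions = sample_unique_actions(policy, k=3, rng=rng)
priors = renormalized_priors(policy, actions)
```

Each entry of `priors` is the prior probability assigned to the corresponding child node in this sketch. For a continuous action space, `priors_from_log_probs` would be called with the log probabilities of the K sampled actions instead.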

LiuDongyang39 · 2024-04-18T04:23:59Z

Thank you very much for your detailed response and for sharing the specifics of the implementation. I appreciate the efforts you and your team are putting into the development of the SampledMuZero functionalities. Understanding that this implementation was crafted from the ground up, given the absence of open-source code from the original authors, I can certainly appreciate the complexity and innovation involved in your approach. I am also looking forward to the upcoming optimizations to the sampled_efficientzero algorithm. Please keep me updated on any new developments or insights that might arise as you continue to refine the implementation.

LiuDongyang39 changed the title from ~~sampled efficient zero部分代码问题~~ ("question about the sampled efficient zero part of the code") to "the sampled efficient zero portion of the code" on Apr 17, 2024

puyuan1996 added the labels "discussion" (Discussion of a typical issue or concept) and "enhancement" (New feature or request) on Apr 18, 2024

PaParaZz1 closed this as completed May 7, 2024
