PBT issue #263
Hi! Can you please share your command line if possible? A simple PBT setup is part of the automatic tests on GitHub Actions, so it's tested regularly and should theoretically work!
Okay, I managed to reproduce this. I'll fix it in the next release! Thank you for reporting! For a quick patch, add the following code to your parse_cfg function:
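(The exact snippet is not preserved in this thread; a minimal sketch of the kind of patch described, assuming AttrDict is importable from sample_factory.utils.attr_dict, might look like this:)

```python
# Hypothetical sketch only: the exact snippet from the original comment is not preserved,
# and the AttrDict import path is an assumption about the Sample Factory codebase.
from sample_factory.utils.attr_dict import AttrDict

def ensure_attr_dict(cfg):
    """Convert an argparse.Namespace (or plain dict) config into an AttrDict so that
    downstream PBT code relying on dict-style access (cfg["key"]) keeps working."""
    if isinstance(cfg, AttrDict):
        return cfg
    if isinstance(cfg, dict):
        return AttrDict(cfg)
    return AttrDict(vars(cfg))

# Inside parse_cfg(), right before returning the config:
#     cfg = ensure_attr_dict(cfg)
#     return cfg
```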
This will make sure that you're using an AttrDict cfg.
Thanks! Will check :))
It seems I can now run SF in PBT mode, but I still can't see any changes in learning. Just to make sure:
So the question is: can I run internal PBT here with my single batched test env, which will run the experiment for 5 million steps and then periodically mutate params to learn another policy (e.g. if I set
@alex-petrenko could you please help with my problem above?
I believe you actually have a configuration that I never tried before, which is batched sampling + PBT. A little bit of background will help to explain this.

Originally, the current version of PBT was written for SF 1.0, which didn't have a batched sampling regime and didn't support vectorized simulators. The idea with PBT was that we can change the mapping between a specific agent in an environment and the policy that controls it, thus mixing many different agents/policies in multi-agent environments. The original (non-batched) sampling regime is perfect for this. See for example: https://github.com/alex-petrenko/sample-factory/blob/master/sample_factory/algo/utils/agent_policy_mapping.py

With batched sampling mode, the whole idea is that we process experience in big batches: collect, say, 2K or 4K timesteps and pass them directly to the inference worker to calculate actions, and so on. This makes it very difficult to arbitrarily map these 2K or 4K agents to different policies - this kind of re-batching would be super slow and would defeat the whole purpose.

Therefore, batched mode currently only supports PBT at the level of workers, i.e. if you have num_workers = N, then experience from these N rollout workers will be sent to N different policies participating in PBT. I haven't tested this regime very much, but it should work. I'm not sure if you can fit many workers and policies on one 2060, though, so I'm not sure if this is even helpful. In my experience, PBT starts to be helpful from ~4 policies and starts to shine at 8+. Let me know if this does it for you. Otherwise, maybe taking a look at external tools makes sense in the meantime.

I've also started to work on an "external" PBT implementation that would work with Sample Factory. Instead of being a part of the algorithm and tightly integrated with it, it would basically start N independent experiments and treat them as a population. This would not work with multi-agent envs where you want multiple policies competing in the same environment, but it would work for single-agent envs, hyperparameter tuning, and so on.
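(A toy sketch of the two mapping granularities described above, not Sample Factory's actual code; the function names and the round-robin rule are illustrative assumptions:)

```python
# Illustrative only: these functions are not part of Sample Factory; they just contrast
# the two mapping granularities discussed above.
import random

def agent_level_policy(worker_idx: int, env_idx: int, agent_idx: int, num_policies: int) -> int:
    """Non-batched sampling: any individual agent in any env can be (re)assigned to any
    policy, so different policies can be mixed within a single multi-agent environment."""
    return random.randrange(num_policies)

def worker_level_policy(worker_idx: int, num_policies: int) -> int:
    """Batched sampling: each rollout worker produces one big experience batch, so the
    whole worker is mapped to a single policy (here, a simple round-robin rule)."""
    return worker_idx % num_policies

if __name__ == "__main__":
    # With 8 rollout workers and 4 policies, each policy gets the data of 2 whole workers.
    print([worker_level_policy(w, num_policies=4) for w in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```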
I've received the following error when running PBT on a single batched env. Without PBT, SF works fine. I've tried to fix this, but then I receive the next portion of errors, so it looks like PBT hasn't been tested for some time. Could someone please check?