[Feature request] Adding multiprocessing support for off policy algorithms #179
On a quick glimpse, I think you need to loop over the experiences from each environment here and store each experience one by one. I am not sure how the current code works and manages to store experiences for all environments at once. Also, taking "all" over dones is not right, because environment episodes can end at different times. This could be added to the contrib repository, which we will make available soon.
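To illustrate the point above, here is a rough sketch (with a hypothetical `replay_buffer.add` that accepts one transition at a time, not the actual SB3 code) of storing transitions from a vectorized env one by one and handling `dones` per environment:

```python
import numpy as np

def store_vec_transitions(replay_buffer, obs, next_obs, actions, rewards, dones, infos):
    # Store every env's transition individually instead of assuming a single env.
    n_envs = len(dones)
    for env_idx in range(n_envs):
        replay_buffer.add(
            obs[env_idx],
            next_obs[env_idx],
            actions[env_idx],
            rewards[env_idx],
            dones[env_idx],  # per-env flag: episodes end at different times
            infos[env_idx],
        )
    # Episode counting must also be per-env, not `if all(dones)`.
    return int(np.sum(dones))
```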
In fact, multiprocessing for all algorithms is a feature that I would like to have for v1.1+, not only for the contrib repo.
I would be glad to also work on this feature. Is there any work going on in this direction? Whom can I talk to?
There was apparently some work already done, but it is far from what would be needed. If you have any questions beforehand, you can ask them here ;)
@nick-harder My own research has pivoted away from RL / SB3, so I haven't made much progress since the previous commit. Please feel free to continue working on this feature. I agree with what Antonin said: start with one algorithm and try to verify that it converges properly.
I'll be working on that in the coming weeks (I need to implement it for a personal project).
Hey, I was actually implementing a project and this feature would help me immensely.
You can find a minimal working (yet unpolished) branch here: #439. I activated the feature for SAC only for now, but it should work with the other algorithms.
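For anyone wanting to try it out, usage would look something like the snippet below (a minimal sketch, assuming the branch from #439, or a later release with multi-env support for SAC, is installed):

```python
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # 4 Pendulum instances stepped in parallel, each in its own process
    vec_env = make_vec_env("Pendulum-v0", n_envs=4, vec_env_cls=SubprocVecEnv)
    # gradient_steps=-1: do as many gradient steps as transitions collected per rollout
    model = SAC("MlpPolicy", vec_env, train_freq=1, gradient_steps=-1, verbose=1)
    model.learn(total_timesteps=20_000)
```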
Thanks a lot
I updated my current implementation, removed some `for` loops, and added support for dict obs.
I added experimental support for multi-env with the HER replay buffer in #654.
Hello, does DDPG support multiprocessing in the latest version of SB3 (v1.4.0)? I'm just curious because I tried multiprocessing on my custom env and got this assertion: "AssertionError: You must use only one env when doing episodic training." Also, I found that new features such as EvalCallback with StopTrainingOnNoModelImprovement don't exist in the library installed in my environment, even though I could find the code in the repo. I tried pip uninstall and pip install again, but the code still was not updated. Was that because I used pip to install? Is there a solution for this problem? Thank you very much in advance.
Please read the documentation, you are using
You need to install master version (cf. doc) as it is not yet released. |
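A quick way to confirm which version is actually installed (and whether installing from master took effect) is to check the package version:

```python
import stable_baselines3
# release builds look like "1.4.0"; master/dev builds typically carry an "a" suffix
print(stable_baselines3.__version__)
```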
I am in the process of adding multiprocessing (vectorized envs) support for off-policy algorithms (TD3, SAC, DDPG, etc.). I've added support for sampling multiple actions and for updating timesteps according to the number of vectorized environments. The modified code can run without throwing an error, but the algorithms don't really converge anymore.
I tried it on OpenAI Gym's Pendulum-v0, where a single-instance env made from `make_vec_env('Pendulum-v0', n_envs=1, vec_env_cls=DummyVecEnv)` trains fine. If I specify multiple instances, such as `make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=DummyVecEnv)` or `make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=SubprocVecEnv)`, then the algorithms don't converge at all. Here's a warning message that I get, which I suspect is closely related to the non-convergence.
It appears to me that the replay buffer wasn't retrieving `n_envs` samples, so the loss target had to rely on broadcasting. Some pointers on modifying the replay buffer so it would support multiprocessing would be much appreciated! If the authors would like, I can create a PR:
yonkshi@1579713
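For what it's worth, one way to avoid the broadcasting issue is to keep the env dimension in the buffer at insertion time and flatten it only when sampling. A simplified, hypothetical sketch of that layout (not the SB3 implementation, which handles this differently):

```python
import numpy as np

class VecReplayBuffer:
    """Sketch: arrays shaped (buffer_size, n_envs, ...) so every env's
    transition is kept; the env dimension is flattened when sampling."""

    def __init__(self, buffer_size, n_envs, obs_dim, action_dim):
        self.buffer_size, self.n_envs = buffer_size, n_envs
        self.pos, self.full = 0, False
        self.obs = np.zeros((buffer_size, n_envs, obs_dim), dtype=np.float32)
        self.next_obs = np.zeros((buffer_size, n_envs, obs_dim), dtype=np.float32)
        self.actions = np.zeros((buffer_size, n_envs, action_dim), dtype=np.float32)
        self.rewards = np.zeros((buffer_size, n_envs), dtype=np.float32)
        self.dones = np.zeros((buffer_size, n_envs), dtype=np.float32)

    def add(self, obs, next_obs, action, reward, done):
        # each argument carries a leading n_envs dimension
        self.obs[self.pos] = obs
        self.next_obs[self.pos] = next_obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.buffer_size
        self.full = self.full or self.pos == 0

    def sample(self, batch_size):
        upper = self.buffer_size if self.full else self.pos
        step_idx = np.random.randint(0, upper, size=batch_size)
        env_idx = np.random.randint(0, self.n_envs, size=batch_size)
        # indexing with (step, env) pairs removes the env dimension, so rewards
        # and dones come out as (batch_size, 1) and match the Q-value shape
        return (
            self.obs[step_idx, env_idx],
            self.actions[step_idx, env_idx],
            self.next_obs[step_idx, env_idx],
            self.rewards[step_idx, env_idx].reshape(-1, 1),
            self.dones[step_idx, env_idx].reshape(-1, 1),
        )
```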