[Feature request] Adding multiprocessing support for off policy algorithms #179
On a quick glimpse, I think you need to loop over the experiences from each environment here and store each experience one by one. I am not sure how the current code works and manages to store experiences for all environments at once. Also, taking "all" over dones is not right, because environment episodes can end at different times. This could be added to the contrib repository, which we will make available soon.
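To illustrate the point above, here is a rough sketch (with a hypothetical `replay_buffer.add` that accepts one transition at a time, not the actual SB3 code) of storing transitions from a vectorized env one by one and handling `dones` per environment:

```python
import numpy as np

def store_vec_transitions(replay_buffer, obs, next_obs, actions, rewards, dones, infos):
    # Store every env's transition individually instead of assuming a single env.
    n_envs = len(dones)
    for env_idx in range(n_envs):
        replay_buffer.add(
            obs[env_idx],
            next_obs[env_idx],
            actions[env_idx],
            rewards[env_idx],
            dones[env_idx],  # per-env flag: episodes end at different times
            infos[env_idx],
        )
    # Episode counting must also be per-env, not `if all(dones)`.
    return int(np.sum(dones))
```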
In fact, multiprocessing for all algorithms is a feature that I would like to have for v1.1+, not only for the contrib repo.
I would be glad to also work on this feature. Is there any work going on in this direction? Whom can I talk to?
There was apparently some work already done, but it is far from what would be needed. If you have any questions beforehand, you can ask them here ;)
@nick-harder My own research has pivoted away from RL / SB3, so I haven't made much progress since the previous commit. Please feel free to continue working on this feature. I agree with what Antonin said: start with one algorithm and try to verify that it converges properly.
I'll be working on that in the coming weeks (I need to implement it for a personal project).
Hey, I was actually implementing a project and this feature would help me immensely.
You can find a minimal working (yet unpolished) branch here: #439. I activated the feature for SAC only for now, but it should work with the other algorithms.
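For anyone wanting to try it out, usage would look something like the snippet below (a minimal sketch, assuming the branch from #439, or a later release with multi-env support for SAC, is installed):

```python
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # 4 Pendulum instances stepped in parallel, each in its own process
    vec_env = make_vec_env("Pendulum-v0", n_envs=4, vec_env_cls=SubprocVecEnv)
    # gradient_steps=-1: do as many gradient steps as transitions collected per rollout
    model = SAC("MlpPolicy", vec_env, train_freq=1, gradient_steps=-1, verbose=1)
    model.learn(total_timesteps=20_000)
```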
Thanks a lot
I updated my current implementation, removed some `for` loops, and added support for dict obs.
I added experimental support for multi-env with the HER replay buffer in #654.
Hello, does DDPG support multiprocessing in the latest version of SB3 (v1.4.0)? I'm just curious because I tried multiprocessing on my custom env and got this assertion: "AssertionError: You must use only one env when doing episodic training." Also, I found that new features such as EvalCallback with StopTrainingOnNoModelImprovement don't exist in the library installed in my environment, even though I could find the code in the repo. I tried pip uninstall and pip install again, but the code still was not updated. Was that because I used pip to install? Is there a solution for this problem? Thank you very much in advance.
Please read the documentation, you are using
You need to install master version (cf. doc) as it is not yet released. |
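A quick way to confirm which version is actually installed (and whether installing from master took effect) is to check the package version:

```python
import stable_baselines3
# release builds look like "1.4.0"; master/dev builds typically carry an "a" suffix
print(stable_baselines3.__version__)
```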
I am in the process of adding multiprocessing (vectorized envs) support for off-policy algorithms (TD3, SAC, DDPG, etc.). I've added support for sampling multiple actions and for updating timesteps according to the number of vectorized environments. The modified code can run without throwing an error, but the algorithms don't really converge anymore.
I tried it on OpenAI Gym's Pendulum-v0, where a single-instance env made from `make_vec_env('Pendulum-v0', n_envs=1, vec_env_cls=DummyVecEnv)` trains fine. If I specify multiple instances, such as `make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=DummyVecEnv)` or `make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=SubprocVecEnv)`, then the algorithms don't converge at all. Here's a warning message that I get, which I suspect is closely related to the non-convergence.
It appears to me that the replay buffer wasn't retrieving `n_envs` samples, so the loss target had to rely on broadcasting. Some pointers on modifying the replay buffer so it would support multiprocessing would be much appreciated! If the authors would like, I can create a PR:
yonkshi@1579713
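For what it's worth, one way to avoid the broadcasting issue is to keep the env dimension in the buffer at insertion time and flatten it only when sampling. A simplified, hypothetical sketch of that layout (not the SB3 implementation, which handles this differently):

```python
import numpy as np

class VecReplayBuffer:
    """Sketch: arrays shaped (buffer_size, n_envs, ...) so every env's
    transition is kept; the env dimension is flattened when sampling."""

    def __init__(self, buffer_size, n_envs, obs_dim, action_dim):
        self.buffer_size, self.n_envs = buffer_size, n_envs
        self.pos, self.full = 0, False
        self.obs = np.zeros((buffer_size, n_envs, obs_dim), dtype=np.float32)
        self.next_obs = np.zeros((buffer_size, n_envs, obs_dim), dtype=np.float32)
        self.actions = np.zeros((buffer_size, n_envs, action_dim), dtype=np.float32)
        self.rewards = np.zeros((buffer_size, n_envs), dtype=np.float32)
        self.dones = np.zeros((buffer_size, n_envs), dtype=np.float32)

    def add(self, obs, next_obs, action, reward, done):
        # each argument carries a leading n_envs dimension
        self.obs[self.pos] = obs
        self.next_obs[self.pos] = next_obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.buffer_size
        self.full = self.full or self.pos == 0

    def sample(self, batch_size):
        upper = self.buffer_size if self.full else self.pos
        step_idx = np.random.randint(0, upper, size=batch_size)
        env_idx = np.random.randint(0, self.n_envs, size=batch_size)
        # indexing with (step, env) pairs removes the env dimension, so rewards
        # and dones come out as (batch_size, 1) and match the Q-value shape
        return (
            self.obs[step_idx, env_idx],
            self.actions[step_idx, env_idx],
            self.next_obs[step_idx, env_idx],
            self.rewards[step_idx, env_idx].reshape(-1, 1),
            self.dones[step_idx, env_idx].reshape(-1, 1),
        )
```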