
[feature request] Add Multi-Processing / VecEnvironment support for SAC and DDPG #170

Closed
Sohojoe opened this issue Jan 21, 2019 · 3 comments
Labels: enhancement (New feature or request), v3 (Discussion about V3)

Comments

Sohojoe commented Jan 21, 2019

Thanks for adding SAC! I've been able to get it running in
https://github.com/Sohojoe/MarathonEnvsBaselines

The sample efficiency looks promising; however, the wall clock training time is poor compared to PPO2 due to the lack of support for Multi-Processing / VecEnvironment.

Unity / ML-Agents really shines when using multiple instances of an agent; this is a much better approach than MPI or asking the researcher to manage threads. Training with 16 agents gives a 10x speed increase over a single agent, and I've pushed some environments to 64–100 agents.

It would also be great to have VecEnvironment support for DDPG for techniques that require offline learning.

Would it be relatively easy to add proper VecEnvironment support to SAC and DDPG? I would be happy to spend some cycles on this.
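
For context, here is a rough sketch of the gap (the env id and hyperparameters are just placeholders): PPO2 accepts a SubprocVecEnv, so experience is collected from many workers at once, while SAC currently takes a single environment.

```python
import gym

from stable_baselines import PPO2, SAC
from stable_baselines.common.vec_env import SubprocVecEnv

# PPO2 collects experience from many environments in parallel.
make_env = lambda: gym.make('Pendulum-v0')  # placeholder env id
vec_env = SubprocVecEnv([make_env for _ in range(16)])
ppo_model = PPO2('MlpPolicy', vec_env, verbose=1)
ppo_model.learn(total_timesteps=100000)

# SAC only accepts a single (non-vectorized) environment, so data
# collection is limited to one simulator instance.
sac_model = SAC('MlpPolicy', make_env(), verbose=1)
sac_model.learn(total_timesteps=100000)
```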

araffin added the enhancement label Jan 21, 2019
araffin (Collaborator) commented Jan 21, 2019

Hello,

> The sample efficiency looks promising; however, the wall clock training time is poor compared to PPO2 due to the lack of support for Multi-Processing / VecEnvironment.

Yes, SAC was designed to be applied on real robots, where multiprocessing is not really possible.

> It would also be great to have VecEnvironment support for DDPG for techniques that require offline learning.
> Would it be relatively easy to add proper VecEnvironment support to SAC and DDPG? I would be happy to spend some cycles on this.

I think the current DDPG version supports MPI (but I have not tried it yet). Also, multiprocessing training would change the original algorithms, so I would be careful doing so.
In fact, I think this is called D4PG, and it was a research paper in itself (you should also look at Distributed Prioritized Experience Replay). As SAC is quite new, I'm not aware of any distributed version yet, but maybe the previous techniques can be adapted.

I did not have the time to take a deeper look at those papers, so I don't know how easy it would be to implement...
Personally, I would be interested in having at least one off-policy learning method that supports multiprocessing.
Pinging @hill-a and @erniejunior ;)
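
To make the idea concrete, here is a minimal, hypothetical sketch (not existing stable-baselines code; all names are made up) of the distributed-replay pattern those papers use: several actors only push transitions into a shared replay buffer, while a single learner samples mini-batches from it for the usual off-policy updates.

```python
import random
from collections import deque

class ReplayBuffer:
    """Shared buffer: actors push transitions, the learner samples mini-batches."""
    def __init__(self, capacity=1000000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)

def actor_rollout(env, policy, buffer, n_steps):
    """One actor: step its own copy of the env with the current policy and store data."""
    obs = env.reset()
    for _ in range(n_steps):
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.add((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

# Hypothetical learner loop: alternate between collecting from every actor
# (run in parallel processes in D4PG / Ape-X) and doing the standard
# SAC/DDPG gradient updates on mini-batches sampled from the shared buffer.
# for iteration in range(n_iterations):
#     for env in envs:
#         actor_rollout(env, current_policy, buffer, n_steps=100)
#     for _ in range(gradient_steps):
#         batch = buffer.sample(batch_size=256)
#         update_actor_and_critic(batch)
```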

araffin (Collaborator) commented Jan 22, 2019

Also related: rail-berkeley/softlearning#8

araffin (Collaborator) commented Oct 24, 2020

Closing this in favor of DLR-RM/stable-baselines3#179.

araffin closed this as completed Oct 24, 2020