Thanks for adding SAC! I've been able to get it running in https://github.com/Sohojoe/MarathonEnvsBaselines

The sample efficiency looks promising; however, the wall-clock training time is poor compared to PPO2 due to the lack of support for multiprocessing / VecEnvironment.

Unity / ML-Agents really shines when using multiple instances of an agent - this is a much better approach than MPI / asking the researcher to manage threads. Training with 16 agents gives a 10x speed increase over a single agent, and I've pushed some environments to 64-100 agents.

It would also be great to have VecEnvironment support for DDPG for techniques that require offline learning.

Would it be relatively easy to add proper VecEnvironment support to SAC and DDPG? I would be happy to spend some cycles on this.
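For reference, this is roughly the gap today - a minimal sketch assuming the current stable-baselines API, where the environment name, agent count, and timestep budget are purely illustrative. PPO2 takes a SubprocVecEnv with many parallel copies, while SAC is handed a single environment:

```python
import gym
from stable_baselines import PPO2, SAC
from stable_baselines.common.vec_env import SubprocVecEnv, DummyVecEnv


def make_env():
    # Illustrative environment; any continuous-control Gym env would do.
    return lambda: gym.make("Pendulum-v0")


if __name__ == "__main__":
    # PPO2 consumes a vectorized env: 16 copies stepped in parallel processes.
    vec_env = SubprocVecEnv([make_env() for _ in range(16)])
    ppo = PPO2("MlpPolicy", vec_env, verbose=1)
    ppo.learn(total_timesteps=100000)

    # SAC currently expects a single environment, so the same
    # parallel speed-up is not available out of the box.
    single_env = DummyVecEnv([make_env()])
    sac = SAC("MlpPolicy", single_env, verbose=1)
    sac.learn(total_timesteps=100000)
```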
The sample efficiency looks promising; however, the wall-clock training time is poor compared to PPO2 due to the lack of support for multiprocessing / VecEnvironment.
Yes, SAC was designed to be applied on real robots, where multiprocessing is not really possible.
It would also be great to have VecEnvironment support for DDPG for techniques that require offline learning.
Would it be relatively easy to add proper VecEnvironment support to SAC and DDPG? I would be happy to spend some cycles on this.
I think the current DDPG version supports MPI (but I have not tried it yet). Also, multiprocessing training would change the original algorithms, so I would be careful about doing so.
In fact, I think this is called D4PG and was a research paper in itself (you should also look at Distributed Prioritized Experience Replay). As SAC is quite new, I'm not aware of any distributed version yet, but maybe the previous techniques can be adapted.
I did not have the time to take a deeper look at those papers, so I don't know how easy it would be to implement...
Personally, I would be interested in having at least one off-policy learning method that supports multiprocessing.
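To make the point concrete, here is a rough sketch of what collecting off-policy data over a VecEnv might look like. All names here (`policy`, `replay_buffer`, `train_step`) are hypothetical placeholders, not an existing stable-baselines API; the key algorithmic choice it exposes is the ratio of gradient updates to environment transitions, which the original single-env recipes fix implicitly.

```python
def collect_and_train(vec_env, policy, replay_buffer, train_step,
                      n_envs=16, total_steps=1000000, updates_per_env_step=1):
    """Hypothetical vectorized collection loop for an off-policy learner.

    Each vec_env.step() yields n_envs transitions at once, so the number of
    gradient updates per environment step has to be chosen explicitly --
    this is where the method starts to deviate from the single-env algorithm.
    """
    obs = vec_env.reset()
    for _ in range(0, total_steps, n_envs):
        actions = policy(obs)                          # batch of n_envs actions
        next_obs, rewards, dones, infos = vec_env.step(actions)
        for i in range(n_envs):
            # Note: a real implementation must also handle the auto-reset that
            # VecEnvs perform on done (e.g. via infos[i]["terminal_observation"]).
            replay_buffer.add(obs[i], actions[i], rewards[i], next_obs[i], dones[i])
        obs = next_obs
        # One option: n_envs * k updates per vectorized step, to preserve the
        # update-to-data ratio of the single-env algorithm.
        for _ in range(n_envs * updates_per_env_step):
            train_step(replay_buffer.sample(batch_size=256))
```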
Pinging @hill-a and @erniejunior ;)