Soft Actor-Critic #398

Merged
merged 37 commits into from
Aug 4, 2019

kengz (Owner) commented Aug 2, 2019

Feature / Fix

Roboschool (continuous control) Benchmark

Note that the Roboschool reward scales are different from MuJoCo's.

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|
| RoboschoolAnt | | | | 1153.87 (graph) |
| RoboschoolHalfCheetah | | | | 1204.68 (graph) |
| RoboschoolHopper | | | | 1161.24 (graph) |
| RoboschoolWalker2d | | | | 695.36 (graph) |

LunarLander (discrete control) Benchmark

[Figure: sac_lunar_t0_trial_graph_mean_returns_vs_frames] Trial graph
[Figure: sac_lunar_t0_trial_graph_mean_returns_ma_vs_frames] Moving average

CarloLucibello commented

Hi, just wanted to point out the follow-up paper by the authors of SAC: https://arxiv.org/abs/1812.05905
One of the main differences from the original paper is that they don't use a separate V network.

kengz (Owner, Author) commented Aug 4, 2019

> Hi, just wanted to point out the follow-up paper by the authors of SAC: https://arxiv.org/abs/1812.05905
> One of the main differences from the original paper is that they don't use a separate V network.

@CarloLucibello Thanks for pointing that out. This PR implements the first version of SAC and adds a discrete control version. I'll implement the improved version in the next PR.
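
For readers following the thread, here is a minimal sketch of the difference being discussed, assuming the usual formulation of the two papers. It is not the code in `slm_lab/agent/algorithm/sac.py`; the function names, the scalar inputs, and the fixed `alpha` and `gamma` defaults are all hypothetical and chosen only for illustration. The first version bootstraps the Q target from a separate target V network, while the follow-up computes the soft value directly from the target Q estimates and the policy's log-probability.

```python
def q_target_with_v_network(reward, done, next_v_target, gamma=0.99):
    """First SAC version: bootstrap from a separate (target) V network.

    next_v_target is the target V network's estimate for the next state.
    """
    return reward + gamma * (1.0 - done) * next_v_target


def q_target_without_v_network(reward, done, next_q1_target, next_q2_target,
                               next_log_prob, alpha=0.2, gamma=0.99):
    """Follow-up SAC version: no V network.

    The soft value of the next state is formed from the smaller of two
    target Q estimates for an action sampled from the current policy,
    minus the entropy term alpha * log pi(a'|s').
    alpha is held fixed here purely as a simplifying assumption.
    """
    soft_v = min(next_q1_target, next_q2_target) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_v


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any experiment.
    print(q_target_with_v_network(1.0, 0.0, 10.0, gamma=0.9))
    print(q_target_without_v_network(1.0, 0.0, 5.0, 4.0, -2.0,
                                     alpha=0.5, gamma=0.9))
```

In both variants the `done` flag zeroes out the bootstrap term at episode end, so the target reduces to the immediate reward.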

kengz merged commit 85a2f39 into master on Aug 4, 2019
kengz deleted the sac branch on August 4, 2019 21:35
kengz added the "result" (experiment result upload) label on Aug 4, 2019