Bug in PPO - Performance does not match gSDE paper #52

Closed
araffin opened this issue Jun 10, 2020 · 0 comments · Fixed by #53

araffin commented Jun 10, 2020

The issue may come from this commit (distribution refactoring): fdecd51

The difference between working and not working code: ba18258...fdecd51

Currently inspecting the commit, but help is welcome ;)

Related to #49 #48

v0.3.0 is working, v0.4.0 has the bug.

PPO performance on HalfCheetah using the RL zoo:
python train.py --algo ppo --env HalfCheetahBulletEnv-v0 --eval-freq 50000 --seed 2682960776

Pybullet: 2.6.5 (should work with 2.7.1 too)
Gym: 0.17.1
PyTorch: 1.5.0

Seed: 2682960776 - cpu

24 February - gSDE paper version (working)

SB3: f1a4fa2
RL zoo: abf8fcd

Eval num_timesteps=49985, episode_reward=-1162.74 +/- 39.43
Eval num_timesteps=99985, episode_reward=-1206.28 +/- 48.66
Eval num_timesteps=149985, episode_reward=-1167.46 +/- 29.00
Eval num_timesteps=199985, episode_reward=871.04 +/- 8.67
Eval num_timesteps=249985, episode_reward=552.29 +/- 918.83
Eval num_timesteps=299985, episode_reward=1302.70 +/- 29.44
Eval num_timesteps=349985, episode_reward=1459.13 +/- 96.28
Eval num_timesteps=399985, episode_reward=1225.08 +/- 593.00
Eval num_timesteps=449985, episode_reward=1966.47 +/- 55.37
| time_elapsed | 745 |

v0.3.0 (working)

Eval num_timesteps=49985, episode_reward=-1251.36 +/- 73.55
Eval num_timesteps=99985, episode_reward=-1335.98 +/- 8.61
Eval num_timesteps=149985, episode_reward=722.82 +/- 32.92
Eval num_timesteps=199985, episode_reward=789.23 +/- 41.66
Eval num_timesteps=249985, episode_reward=884.63 +/- 12.12
Eval num_timesteps=299985, episode_reward=1128.64 +/- 27.48
Eval num_timesteps=349985, episode_reward=1326.70 +/- 80.14
Eval num_timesteps=399985, episode_reward=1528.11 +/- 52.68
| time_elapsed | 662 |
Eval num_timesteps=449985, episode_reward=1626.98 +/- 75.79

23 March - Remove CEMRL (working)

SB3: 4b2092f
RL zoo: 4c685669169b212a

Eval num_timesteps=49985, episode_reward=-1114.12 +/- 278.50
Eval num_timesteps=99985, episode_reward=-1159.72 +/- 28.42
Eval num_timesteps=149985, episode_reward=-1076.83 +/- 194.04
Eval num_timesteps=199985, episode_reward=-395.34 +/- 711.05
Eval num_timesteps=249985, episode_reward=-46.66 +/- 384.98
Eval num_timesteps=299985, episode_reward=993.80 +/- 403.25
Eval num_timesteps=349985, episode_reward=1534.85 +/- 18.85

23 March - Change pre-processing (working)

SB3: ba18258
RL zoo: 8b71eddc7561b26

Eval num_timesteps=49985, episode_reward=-1294.17 +/- 71.03
Eval num_timesteps=99985, episode_reward=-1047.79 +/- 107.26
Eval num_timesteps=149985, episode_reward=-509.42 +/- 736.42
Eval num_timesteps=199985, episode_reward=491.34 +/- 37.18
Eval num_timesteps=249985, episode_reward=929.41 +/- 55.70
Eval num_timesteps=299985, episode_reward=922.89 +/- 52.07
Eval num_timesteps=349985, episode_reward=1161.23 +/- 71.37

31 March - Refactor Action Distribution v0.4.0a3 (not working)

SB3: fdecd51
RL zoo: 8b71eddc7561b26

Eval num_timesteps=49985, episode_reward=-1254.91 +/- 96.88
Eval num_timesteps=99985, episode_reward=-1139.13 +/- 175.29
Eval num_timesteps=149985, episode_reward=-608.69 +/- 658.00
Eval num_timesteps=199985, episode_reward=334.35 +/- 363.71
Eval num_timesteps=249985, episode_reward=-283.72 +/- 485.11
Eval num_timesteps=299985, episode_reward=-44.18 +/- 84.45
Eval num_timesteps=349985, episode_reward=192.63 +/- 19.71
Eval num_timesteps=399985, episode_reward=292.71 +/- 177.96
| time_elapsed | 683 |

v0.4.0 (not working)

Eval num_timesteps=49985, episode_reward=-1335.59 +/- 38.26
Eval num_timesteps=99985, episode_reward=-717.95 +/- 415.52
Eval num_timesteps=149985, episode_reward=-555.61 +/- 99.95
Eval num_timesteps=199985, episode_reward=-1091.25 +/- 37.42
Eval num_timesteps=249985, episode_reward=-741.49 +/- 92.98
Eval num_timesteps=299985, episode_reward=-139.83 +/- 60.00
Eval num_timesteps=349985, episode_reward=10.24 +/- 306.05
Eval num_timesteps=399985, episode_reward=554.69 +/- 13.50
Eval num_timesteps=449985, episode_reward=634.15 +/- 12.41
Eval num_timesteps=499985, episode_reward=721.88 +/- 13.24
| time_elapsed | 696 |

v0.5.0 (not working)

Eval num_timesteps=49985, episode_reward=-1234.57 +/- 76.73
Eval num_timesteps=99985, episode_reward=-1102.34 +/- 103.49
Eval num_timesteps=149985, episode_reward=-948.12 +/- 138.89
Eval num_timesteps=199985, episode_reward=483.17 +/- 90.65
Eval num_timesteps=249985, episode_reward=609.21 +/- 14.94
Eval num_timesteps=299985, episode_reward=651.08 +/- 24.13
Eval num_timesteps=349985, episode_reward=497.35 +/- 335.62
| time_elapsed | 591 |
Eval num_timesteps=399985, episode_reward=524.58 +/- 313.33
| time_elapsed | 677 |
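
To compare runs without reading the logs by eye, the evaluation lines above can be parsed and compared at a common timestep. This is a minimal sketch that assumes the log format shown in this issue; `parse_eval_log` and `reward_at` are hypothetical helper names, not part of SB3 or the RL zoo, and the two excerpts are copied from the ba18258 (working) and fdecd51 (not working) runs.

```python
import re

# Matches lines such as:
# Eval num_timesteps=49985, episode_reward=-1162.74 +/- 39.43
EVAL_LINE = re.compile(
    r"Eval num_timesteps=(\d+), episode_reward=(-?\d+(?:\.\d+)?) \+/- (\d+(?:\.\d+)?)"
)


def parse_eval_log(text):
    """Return a list of (timesteps, mean_reward, std_reward) tuples."""
    return [
        (int(steps), float(mean), float(std))
        for steps, mean, std in EVAL_LINE.findall(text)
    ]


def reward_at(results, timesteps):
    """Mean reward at a given evaluation step, or None if it was not logged."""
    for steps, mean, _std in results:
        if steps == timesteps:
            return mean
    return None


# Excerpts copied from the logs above; non-matching lines are ignored.
working = """
Eval num_timesteps=299985, episode_reward=922.89 +/- 52.07
Eval num_timesteps=349985, episode_reward=1161.23 +/- 71.37
"""
broken = """
Eval num_timesteps=299985, episode_reward=-44.18 +/- 84.45
Eval num_timesteps=349985, episode_reward=192.63 +/- 19.71
| time_elapsed | 683 |
"""

gap = reward_at(parse_eval_log(working), 349985) - reward_at(parse_eval_log(broken), 349985)
print(f"Reward gap at 349985 steps: {gap:.2f}")
```

With a single seed, a gap like this is only suggestive; the high standard deviations in some evaluations mean several seeds would be needed to confirm a regression.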

Hyperparameters:

HalfCheetahBulletEnv-v0:
  env_wrapper: utils.wrappers.TimeFeatureWrapper
  normalize: true
  n_envs: 16
  n_timesteps: !!float 2e6
  policy: 'MlpPolicy'
  batch_size: 128
  n_steps: 512
  gamma: 0.99
  gae_lambda: 0.9
  n_epochs: 20
  ent_coef: 0.0
  sde_sample_freq: 4
  max_grad_norm: 0.5
  vf_coef: 0.5
  learning_rate: !!float 3e-5
  use_sde: True
  clip_range: 0.4
  policy_kwargs: "dict(log_std_init=-2,
                       ortho_init=False,
                       activation_fn=nn.ReLU,
                       net_arch=[dict(pi=[256, 256], vf=[256, 256])]
                       )"
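
For context on how much data each policy update sees under these settings, the arithmetic can be sketched as follows. It assumes the usual PPO semantics, where each update collects `n_steps` transitions from each of `n_envs` environments and then runs `n_epochs` passes over that rollout in minibatches of `batch_size`; the variable names on the left are illustrative only.

```python
# Values copied from the hyperparameters above.
n_envs = 16
n_steps = 512
batch_size = 128
n_epochs = 20
n_timesteps = int(2e6)

# Transitions collected per policy update (one rollout across all envs).
rollout_size = n_envs * n_steps

# Minibatches per pass over the rollout, and total gradient steps per update.
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * n_epochs

# Number of full rollouts that fit within the training budget.
full_rollouts = n_timesteps // rollout_size

print(rollout_size, minibatches_per_epoch, gradient_steps_per_update, full_rollouts)
```

Under these assumptions, the roughly 450k timesteps shown in the logs above cover only a fraction of the 2e6-step budget.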