Hyperparameters (RL Zoo config used for all runs below):

```yaml
HalfCheetahBulletEnv-v0:
  env_wrapper: utils.wrappers.TimeFeatureWrapper
  normalize: true
  n_envs: 16
  n_timesteps: !!float 2e6
  policy: 'MlpPolicy'
  batch_size: 128
  n_steps: 512
  gamma: 0.99
  gae_lambda: 0.9
  n_epochs: 20
  ent_coef: 0.0
  sde_sample_freq: 4
  max_grad_norm: 0.5
  vf_coef: 0.5
  learning_rate: !!float 3e-5
  use_sde: True
  clip_range: 0.4
  policy_kwargs: "dict(log_std_init=-2, ortho_init=False, activation_fn=nn.ReLU, net_arch=[dict(pi=[256, 256], vf=[256, 256])])"
```
Resolved by merging DLR-RM#52 (Antonin-Raffin/refactor/predict, "Refactor predict method", commit f8e3995) by araffin.
The issue may come from this commit (distribution refactoring): fdecd51
The difference between working and non-working code: ba18258...fdecd51
I am currently inspecting the commit, but help is welcome ;)
Related to #49 and #48.
v0.3.0 works; v0.4.0 has the bug.
PPO performance on HalfCheetah using the RL Zoo:

```
python train.py --algo ppo --env HalfCheetahBulletEnv-v0 --eval-freq 50000 --seed 2682960776
```
PyBullet: 2.6.5 (should work with 2.7.1 too)
Gym: 0.17.1
PyTorch: 1.5.0
Seed: 2682960776 - cpu
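For runs like these to be comparable, the seed above has to be applied to every RNG in play (Python, NumPy, PyTorch); SB3 provides a `set_random_seed` utility for that. A minimal stdlib-only sketch of the idea (the `set_seed` helper here is hypothetical, not SB3's actual function):

```python
import random

def set_seed(seed: int) -> None:
    """Seed Python's RNG; SB3's set_random_seed additionally seeds NumPy and PyTorch."""
    random.seed(seed)

set_seed(2682960776)
first = random.random()
set_seed(2682960776)
print(random.random() == first)  # reseeding reproduces the same draw: True
```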
### 24 February - gSDE paper version (working)
SB3: f1a4fa2
RL zoo: abf8fcd
```
Eval num_timesteps=49985, episode_reward=-1162.74 +/- 39.43
Eval num_timesteps=99985, episode_reward=-1206.28 +/- 48.66
Eval num_timesteps=149985, episode_reward=-1167.46 +/- 29.00
Eval num_timesteps=199985, episode_reward=871.04 +/- 8.67
Eval num_timesteps=249985, episode_reward=552.29 +/- 918.83
Eval num_timesteps=299985, episode_reward=1302.70 +/- 29.44
Eval num_timesteps=349985, episode_reward=1459.13 +/- 96.28
Eval num_timesteps=399985, episode_reward=1225.08 +/- 593.00
Eval num_timesteps=449985, episode_reward=1966.47 +/- 55.37
| time_elapsed | 745 |
```
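Each `Eval` line reports the mean and standard deviation of the returns over the evaluation episodes. A minimal sketch of that aggregation (the function name is hypothetical, not SB3's actual helper; population std is assumed, as in NumPy's default):

```python
import math

def summarize_eval(episode_rewards):
    """Mean and population standard deviation of episode returns,
    matching the `mean +/- std` shape of the Eval lines above."""
    n = len(episode_rewards)
    mean = sum(episode_rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in episode_rewards) / n)
    return mean, std

mean, std = summarize_eval([1.0, 3.0])
print(f"episode_reward={mean:.2f} +/- {std:.2f}")  # episode_reward=2.00 +/- 1.00
```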
### v0.3.0 (working)
```
Eval num_timesteps=49985, episode_reward=-1251.36 +/- 73.55
Eval num_timesteps=99985, episode_reward=-1335.98 +/- 8.61
Eval num_timesteps=149985, episode_reward=722.82 +/- 32.92
Eval num_timesteps=199985, episode_reward=789.23 +/- 41.66
Eval num_timesteps=249985, episode_reward=884.63 +/- 12.12
Eval num_timesteps=299985, episode_reward=1128.64 +/- 27.48
Eval num_timesteps=349985, episode_reward=1326.70 +/- 80.14
Eval num_timesteps=399985, episode_reward=1528.11 +/- 52.68
| time_elapsed | 662 |
Eval num_timesteps=449985, episode_reward=1626.98 +/- 75.79
```
### 23 March - Remove CEMRL (working)
SB3: 4b2092f
RL zoo: 4c685669169b212a
```
Eval num_timesteps=49985, episode_reward=-1114.12 +/- 278.50
Eval num_timesteps=99985, episode_reward=-1159.72 +/- 28.42
Eval num_timesteps=149985, episode_reward=-1076.83 +/- 194.04
Eval num_timesteps=199985, episode_reward=-395.34 +/- 711.05
Eval num_timesteps=249985, episode_reward=-46.66 +/- 384.98
Eval num_timesteps=299985, episode_reward=993.80 +/- 403.25
Eval num_timesteps=349985, episode_reward=1534.85 +/- 18.85
```
### 23 March - Change pre-processing (working)
SB3: ba18258
RL zoo: 8b71eddc7561b26
```
Eval num_timesteps=49985, episode_reward=-1294.17 +/- 71.03
Eval num_timesteps=99985, episode_reward=-1047.79 +/- 107.26
Eval num_timesteps=149985, episode_reward=-509.42 +/- 736.42
Eval num_timesteps=199985, episode_reward=491.34 +/- 37.18
Eval num_timesteps=249985, episode_reward=929.41 +/- 55.70
Eval num_timesteps=299985, episode_reward=922.89 +/- 52.07
Eval num_timesteps=349985, episode_reward=1161.23 +/- 71.37
```
### 31 March - Refactor Action Distribution v0.4.0a3 (not working)
SB3: fdecd51
RL zoo: 8b71eddc7561b26
```
Eval num_timesteps=49985, episode_reward=-1254.91 +/- 96.88
Eval num_timesteps=99985, episode_reward=-1139.13 +/- 175.29
Eval num_timesteps=149985, episode_reward=-608.69 +/- 658.00
Eval num_timesteps=199985, episode_reward=334.35 +/- 363.71
Eval num_timesteps=249985, episode_reward=-283.72 +/- 485.11
Eval num_timesteps=299985, episode_reward=-44.18 +/- 84.45
Eval num_timesteps=349985, episode_reward=192.63 +/- 19.71
Eval num_timesteps=399985, episode_reward=292.71 +/- 177.96
| time_elapsed | 683 |
```
### v0.4.0 (not working)
```
Eval num_timesteps=49985, episode_reward=-1335.59 +/- 38.26
Eval num_timesteps=99985, episode_reward=-717.95 +/- 415.52
Eval num_timesteps=149985, episode_reward=-555.61 +/- 99.95
Eval num_timesteps=199985, episode_reward=-1091.25 +/- 37.42
Eval num_timesteps=249985, episode_reward=-741.49 +/- 92.98
Eval num_timesteps=299985, episode_reward=-139.83 +/- 60.00
Eval num_timesteps=349985, episode_reward=10.24 +/- 306.05
Eval num_timesteps=399985, episode_reward=554.69 +/- 13.50
Eval num_timesteps=449985, episode_reward=634.15 +/- 12.41
Eval num_timesteps=499985, episode_reward=721.88 +/- 13.24
| time_elapsed | 696 |
```
### v0.5.0 (not working)
```
Eval num_timesteps=49985, episode_reward=-1234.57 +/- 76.73
Eval num_timesteps=99985, episode_reward=-1102.34 +/- 103.49
Eval num_timesteps=149985, episode_reward=-948.12 +/- 138.89
Eval num_timesteps=199985, episode_reward=483.17 +/- 90.65
Eval num_timesteps=249985, episode_reward=609.21 +/- 14.94
Eval num_timesteps=299985, episode_reward=651.08 +/- 24.13
Eval num_timesteps=349985, episode_reward=497.35 +/- 335.62
| time_elapsed | 591 |
Eval num_timesteps=399985, episode_reward=524.58 +/- 313.33
| time_elapsed | 677 |
```
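To compare runs like those above without eyeballing the logs, the `Eval` lines can be parsed into tuples. A small sketch (the regex is written for the exact log format shown here; the function name is hypothetical):

```python
import re

# Matches lines such as: Eval num_timesteps=49985, episode_reward=-1254.91 +/- 96.88
EVAL_RE = re.compile(
    r"Eval num_timesteps=(\d+), episode_reward=(-?\d+\.\d+) \+/- (\d+\.\d+)"
)

def parse_eval_log(text):
    """Extract (num_timesteps, mean_reward, std_reward) tuples from an eval log."""
    return [(int(t), float(m), float(s)) for t, m, s in EVAL_RE.findall(text)]

log = "Eval num_timesteps=49985, episode_reward=-1254.91 +/- 96.88"
print(parse_eval_log(log))  # [(49985, -1254.91, 96.88)]
```

Two parsed runs can then be aligned on `num_timesteps` to plot working vs. non-working learning curves.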