Dm control humanoid ppo learnable stand, walk, run #484
…hard envs), base for easier versions called 'standard'
Don't worry if PPO can't solve the harder versions; no one has solved them. Typically only SAC or other off-policy methods have worked.
Also, I can easily add a white wall in the background, change the camera to a more eagle-eyed view so only the ground is in view, etc. I want to double-check what you would like before I make a nice render for the docs.
Okay, makes sense. The only differences between the standard and hard versions are that standard gets more observations (link velocities) and far less randomization (see the hypothetical sketch below).
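To make that distinction concrete, here is a minimal sketch of how the two variants could be parameterized. This is not the actual control/humanoid.py code, and the numeric noise scales are made up for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HumanoidVariant:
    include_link_velocities: bool  # extra observations in "standard"
    init_noise_scale: float        # randomization applied at episode reset

# "standard": richer observations, mild reset randomization
STANDARD = HumanoidVariant(include_link_velocities=True, init_noise_scale=0.02)
# "hard": fewer observations, much heavier reset randomization
HARD = HumanoidVariant(include_link_velocities=False, init_noise_scale=0.5)
```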
You might also want to merge main and check that it still works. There might have been a breaking change with some config namings (we are trying to rename everything from cfg to config for consistency).
Oh, also: for the example PPO commands, I have reorganized things a bit, so they should now go into the baselines/ppo/examples.sh file. There is also a baselines.sh file for the official baseline results we upload to wandb, but I am not sure whether these locomotion tasks should be part of the RL benchmark, since they have no notion of success (it is hard to take, e.g., an averaged success-rate graph across all tasks to compare RL algorithms). At best we could use some normalized score function, but I have never liked how people use that, since it is basically not interpretable.
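For context, the "normalized score" idea mentioned above is usually computed against fixed reference policies. A minimal sketch; the reference returns here are placeholders, not measured baselines:

```python
def normalized_score(episode_return: float,
                     random_return: float,
                     expert_return: float) -> float:
    """Rescale a raw return so 0.0 ~ random policy and 1.0 ~ expert policy.

    The reference returns are per-task constants that have to be measured
    beforehand, which is part of why the resulting number is hard to
    interpret across tasks.
    """
    return (episode_return - random_return) / (expert_return - random_return)

# e.g. normalized_score(450.0, random_return=50.0, expert_return=850.0) == 0.5
```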
One tiny change, then I can merge.
Stand
(video attachment: 8.mp4)
(No graph for stand due to a change in ppo.py that I pulled in: the logger now defaults to None and no longer logs episodic return; see the sketch after the command below. The other two tasks and the RGB-only run were run before I pulled this change. Stand converged every time for me, though.)
Command for running (the video above is the seed 1 result):
for i in {1..3}; do python ppo.py --exp_name="__final_standseed${i}" --env_id="MS-HumanoidStand-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=40_000_000 --eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.95 --seed=${i}; done
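On the logging note above, here is one way episodic-return logging might be guarded when the logger defaults to None. This helper and its names are assumptions for illustration, not taken from the actual ppo.py:

```python
def log_episodic_return(logger, global_step: int, episodic_return: float) -> None:
    """Record episodic return while tolerating logger=None (hypothetical helper)."""
    if logger is None:
        # no logger configured: fall back to stdout so the run still reports progress
        print(f"step={global_step} episodic_return={episodic_return:.1f}")
    else:
        # assumes a TensorBoard-style writer with an add_scalar method
        logger.add_scalar("charts/episodic_return", episodic_return, global_step)
```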
Walk
Video too large to attach: https://drive.google.com/file/d/1ssTGoGHvvOf6e2RTgvIi6Je3xWV_VpwY/view?usp=sharing
Command (the video above is seed 1):
for i in {1..3}; do python ppo.py --exp_name="__final_walkseed${i}" --env_id="MS-HumanoidWalk-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=80_000_000 --eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.97 --seed=${i} --ent_coef=1e-3; done
Run
Video too large to attach: https://drive.google.com/file/d/1hkNfUcv04hPVnuyWNSvp9swrNjhnwDBf/view?usp=sharing
Command (the video above is seed 1):
for i in {1..3}; do python ppo.py --exp_name="__final_runseed${i}" --env_id="MS-HumanoidRun-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=80_000_000 --eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.97 --seed=${i} --ent_coef=1e-3; done
Run RGB Only
Video too large to attach: https://drive.google.com/file/d/1QZ1R7OrJLc8YlOY28tpHhgRj-FCejKzr/view?usp=sharing
Command:
python ppo_rgb.py --exp_name="__human_rgb_run2" --env_id="MS-HumanoidRun-v1" --num_envs=256 --update_epochs=8 --num_minibatches=32 --total_timesteps=80_000_000 --eval_freq=15 --num_eval_steps=1000 --num_steps=200 --gamma=0.98 --seed=1 --no-include-state --render_mode="rgb_array" --ent_coef=1e-3
In addition, I have written and lightly tested the hard versions of these environments (the code exists, commented out, in control/humanoid.py), but they seem learnable only via SAC, and they would take a very long time to train, confirm, and potentially debug while adding little additional value.