
Dm control humanoid ppo learnable stand, walk, run #484

Merged: 19 commits into haosulab:main on Aug 20, 2024

Conversation

@Xander-Hinrichsen (Collaborator) commented Aug 9, 2024

Stand

[video: 8.mp4]

(No graph for stand due to a change in ppo.py that I pulled in: the logger defaults to None and no longer logs episodic return. The other two tasks and the RGB-only run were trained before I pulled this change; stand converged every time for me, though.)
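For reference, here is a minimal sketch of what re-enabling that episodic-return logging could look like in a CleanRL-style vectorized loop. The `writer`, `infos["final_info"]`, and `global_step` names are illustrative assumptions, not the actual ppo.py internals:

```python
# Hypothetical sketch only: log episodic return/length when a writer is configured.
# Assumes a Gymnasium-style vector env wrapped with RecordEpisodeStatistics, so
# finished episodes report their stats under infos["final_info"].
from typing import Optional

from torch.utils.tensorboard import SummaryWriter


def log_episodic_stats(writer: Optional[SummaryWriter], infos: dict, global_step: int) -> None:
    if writer is None or "final_info" not in infos:
        return
    for info in infos["final_info"]:
        if info is not None and "episode" in info:
            writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step)
            writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)
```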

Command for running (the video above is the seed 1 result):

for i in {1..3}; do python ppo.py --exp_name="__final_standseed${i}" --env_id="MS-HumanoidStand-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=40_000_000 --eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.95 --seed=${i}; done

Walk

Video too large to attach; hosted here: https://drive.google.com/file/d/1ssTGoGHvvOf6e2RTgvIi6Je3xWV_VpwY/view?usp=sharing

[image: training curve]

Command (the video above is seed 1):

for i in {1..3}; do python ppo.py --exp_name="__final_walkseed${i}" --env_id="MS-HumanoidWalk-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=80_000_000 --eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.97 --seed=${i} --ent_coef=1e-3; done

Run

Video too large to attach; hosted here: https://drive.google.com/file/d/1hkNfUcv04hPVnuyWNSvp9swrNjhnwDBf/view?usp=sharing

[image: training curve]

Command (the video above is seed 1):

for i in {1..3}; do python ppo.py --exp_name="__final_runseed${i}" --env_id="MS-HumanoidRun-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=80_000_000 --eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.97 --seed=${i} --ent_coef=1e-3; done

Run RGB Only

Video too large to attach; hosted here: https://drive.google.com/file/d/1QZ1R7OrJLc8YlOY28tpHhgRj-FCejKzr/view?usp=sharing

[image: training curve]

Command:
python ppo_rgb.py --exp_name="__human_rgb_run2" --env_id="MS-HumanoidRun-v1" --num_envs=256 --update_epochs=8 --num_minibatches=32 --total_timesteps=80_000_000 --eval_freq=15 --num_eval_steps=1000 --num_steps=200 --gamma=0.98 --seed=1 --no-include-state --render_mode="rgb_array" --ent_coef=1e-3

In addition, I have written and lightly tested the hard versions of these environments (the code exists and is commented out in control/humanoid.py), but they seem learnable only via SAC, and they would take a very long time to train, verify, and potentially debug while adding little additional value.

@StoneT2000 (Member) commented:

Don't worry if PPO can't solve the harder versions. No one has solved them; typically only SAC or other off-policy methods have worked.

@Xander-Hinrichsen (Collaborator, Author) commented:

Also, I can easily add a white wall in the background, change the camera to a more bird's-eye view so only the ground is in view, etc. I want to double-check what you would like before making a nice render for the docs.
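For concreteness, a rough sketch of the kind of camera override meant here; the pose values and the `_default_human_render_camera_configs` property shown are assumptions for illustration, not the final values, and the actual ManiSkill camera API may differ slightly:

```python
# Illustrative sketch only: a higher, more top-down human-render camera so that
# mostly ground is visible behind the humanoid. Pose values are placeholders.
from mani_skill.sensors.camera import CameraConfig
from mani_skill.utils import sapien_utils


class HumanoidRenderCameraSketch:
    """Stand-in for the override that would live on the humanoid env class."""

    @property
    def _default_human_render_camera_configs(self):
        # Look down at the humanoid from above and to the side.
        pose = sapien_utils.look_at(eye=[0.0, -2.5, 3.0], target=[0.0, 0.0, 0.5])
        return CameraConfig("render_camera", pose=pose, width=512, height=512, fov=1.2, near=0.01, far=100)
```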

@Xander-Hinrichsen (Collaborator, Author) replied, quoting the comment above:

Okay, that makes sense. The only differences between the standard and hard versions are that the standard version gets more observations (link velocities) and far less randomization.
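To make that distinction concrete, here is a hypothetical sketch of those two knobs; the names `include_link_velocities` and `randomization_scale` are made up for illustration, and the actual commented-out code in control/humanoid.py may look different:

```python
from dataclasses import dataclass


@dataclass
class HumanoidTaskConfig:
    """Hypothetical knobs distinguishing the standard and hard humanoid variants."""
    include_link_velocities: bool  # extra per-link velocity observations
    randomization_scale: float     # scale of initial-state randomization

# Standard variant: richer observations, mild randomization (learnable with PPO).
STANDARD = HumanoidTaskConfig(include_link_velocities=True, randomization_scale=0.1)

# Hard variant: fewer observations, heavy randomization (so far only SAC-like
# off-policy methods have been reported to make progress).
HARD = HumanoidTaskConfig(include_link_velocities=False, randomization_scale=1.0)
```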

@StoneT2000 (Member) left a review comment:

You might also want to merge main and check that everything still works. There might have been a breaking change in some config namings (we are trying to rename everything from cfg to config for consistency).

Review thread on mani_skill/utils/building/_mjcf_loader.py (outdated, resolved)
@StoneT2000 (Member) commented:

Oh, also: for the example PPO commands, I have reorganized things a bit, so they should now go in the baselines/ppo/examples.sh file. There is also a baselines.sh file, which will hold the official baseline results we upload to wandb, but I am not sure whether these locomotion tasks should be part of the RL benchmark, since they have no notion of success (it is hard to take, e.g., an averaged success-rate graph across all tasks to compare RL algorithms). At best we could use some normalized score function, but I have never liked how people use those, since they are basically not interpretable.

Review thread on examples/baselines/ppo/ppo_rgb.py (outdated, resolved)
@StoneT2000 (Member) commented:

One tiny change, then I can merge.

@StoneT2000 merged commit 20caecc into haosulab:main on Aug 20, 2024