Dm control humanoid ppo learnable stand, walk, run #484

Merged 19 commits on Aug 20, 2024
Commits
5f8eef1
simple stand, camera + qpos_rand todo
Xander-Hinrichsen Jul 30, 2024
c7b6ca9
Merge branch 'haosulab:main' into dm-control-humanoid
Xander-Hinrichsen Jul 30, 2024
c28d26f
sac works, hard humanoid stand
Xander-Hinrichsen Jul 31, 2024
434f82c
sac workd, stand hard version
Xander-Hinrichsen Jul 31, 2024
756154d
Merge branch 'main' of https://github.com/haosulab/ManiSkill into dm-…
Xander-Hinrichsen Aug 1, 2024
ad715fe
Merge remote-tracking branch 'origin/dm-control-humanoid' into dm-con…
Xander-Hinrichsen Aug 1, 2024
81b476c
refactored humanoid env, added correct foot friction (req. re-run of …
Xander-Hinrichsen Aug 3, 2024
c322173
refactored humanoid env, added correct foot friction (req. re-run of …
Xander-Hinrichsen Aug 3, 2024
1579a32
it can run, controller tuned
Xander-Hinrichsen Aug 5, 2024
48e5c61
optimized humanoid controller config
Xander-Hinrichsen Aug 6, 2024
8b3ea68
Merge branch 'haosulab:main' into dm-control-humanoid
Xander-Hinrichsen Aug 8, 2024
bee876b
reformatting
Xander-Hinrichsen Aug 8, 2024
18c25d0
Merge branch 'dm-control-humanoid' of https://github.com/Xander-Hinri…
Xander-Hinrichsen Aug 8, 2024
cb263e5
cleaned up comments
Xander-Hinrichsen Aug 8, 2024
23188af
standing now works
Xander-Hinrichsen Aug 9, 2024
62fb89c
upstream merge
Xander-Hinrichsen Aug 20, 2024
279b0f3
merge compatibility and typo fix
Xander-Hinrichsen Aug 20, 2024
0556ee3
ppo args added to examples.sh
Xander-Hinrichsen Aug 20, 2024
0e6e5f6
merge issue fix
Xander-Hinrichsen Aug 20, 2024
63 changes: 62 additions & 1 deletion docs/source/tasks/control/index.md
@@ -87,4 +87,65 @@ Hopper robot stands upright

**Success Conditions:**
- No specific success conditions. We can threshold the episode accumulated reward to determine success.
:::
:::

## MS-HumanoidStand-v1
![dense-reward][reward-badge]

:::{dropdown} Task Card
:icon: note
:color: primary

**Task Description:**
Humanoid robot stands upright


**Supported Robots: humanoid**

**Randomizations:**
- The humanoid robot is rotated by a random angle in [-pi, pi] radians about the z axis.
- Noise drawn from a uniform distribution over [-1e-2, 1e-2] is added to the humanoid's qpos and qvel.

**Fail Conditions:**
- The humanoid robot's torso link leaves the z range [0.7, 1.0].
:::

## MS-HumanoidWalk-v1
![dense-reward][reward-badge]

:::{dropdown} Task Card
:icon: note
:color: primary

**Task Description:**
Humanoid robot moves in the x direction at a walking pace


**Supported Robots: humanoid**

**Randomizations:**
- Noise drawn from a uniform distribution over [-1e-2, 1e-2] is added to the humanoid's qpos and qvel.

**Fail Conditions:**
- The humanoid robot's torso link leaves the z range [0.7, 1.0].
:::

## MS-HumanoidRun-v1
![dense-reward][reward-badge]

:::{dropdown} Task Card
:icon: note
:color: primary

**Task Description:**
Humanoid robot moves in the x direction at a running pace


**Supported Robots: humanoid**

**Randomizations:**
- Noise drawn from a uniform distribution over [-1e-2, 1e-2] is added to the humanoid's qpos and qvel.

**Fail Conditions:**
- The humanoid robot's torso link leaves the z range [0.7, 1.0].
:::
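
The randomizations and fail condition in the three task cards above can be sketched in plain Python. This is a minimal illustration, not the environment's implementation: the function names, the placeholder state length, and the choice to treat the z bounds as inclusive are all assumptions.

```python
import math
import random

NOISE = 1e-2          # half-width of the uniform noise on qpos and qvel
Z_RANGE = (0.7, 1.0)  # allowed torso height range from the task cards


def sample_yaw(rng: random.Random) -> float:
    """Random rotation about the z axis in radians (HumanoidStand card only)."""
    return rng.uniform(-math.pi, math.pi)


def yaw_to_quat(yaw: float) -> tuple[float, float, float, float]:
    """Unit quaternion (w, x, y, z) for a rotation of `yaw` about the z axis."""
    return (math.cos(yaw / 2), 0.0, 0.0, math.sin(yaw / 2))


def add_noise(values: list[float], rng: random.Random) -> list[float]:
    """Add independent uniform noise in [-NOISE, NOISE] to each entry."""
    return [v + rng.uniform(-NOISE, NOISE) for v in values]


def has_failed(torso_z: float) -> bool:
    """True once the torso height leaves Z_RANGE (inclusive bounds assumed)."""
    low, high = Z_RANGE
    return not (low <= torso_z <= high)


rng = random.Random(0)
quat = yaw_to_quat(sample_yaw(rng))
qpos = add_noise([0.0] * 21, rng)  # 21 is a placeholder state length
```

With these definitions, `has_failed(0.5)` is true (the torso has dropped below the range) and `has_failed(0.9)` is false.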
16 changes: 16 additions & 0 deletions examples/baselines/ppo/examples.sh
@@ -49,6 +49,17 @@ python ppo.py --env_id="MS-CartpoleSwingUp-v1" \
--total_timesteps=10_000_000 --num-steps=250 --num-eval-steps=1000 \
--gamma=0.99 --gae_lambda=0.95 \
--eval_freq=5
python ppo.py --env_id="MS-HumanoidStand-v1" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=40_000_000 \
--eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.95
python ppo.py --env_id="MS-HumanoidWalk-v1" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=80_000_000 \
--eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.97 \
--ent_coef=1e-3
python ppo.py --env_id="MS-HumanoidRun-v1" --num_envs=2048 \
--update_epochs=8 --num_minibatches=32 --total_timesteps=60_000_000 \
--eval_freq=10 --num_eval_steps=1000 --num_steps=200 --gamma=0.97 \
--ent_coef=1e-3
python ppo.py --env_id="UnitreeG1PlaceAppleInBowl-v1" \
--num_envs=512 --update_epochs=8 --num_minibatches=32 \
--total_timesteps=50_000_000 --num-steps=100 --num-eval-steps=100
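
The three humanoid commands above differ mainly in `--total_timesteps`, `--gamma`, and `--ent_coef`. As a rough sketch of what the shared settings imply, the arithmetic below assumes `ppo.py` follows the common CleanRL-style convention (batch = `num_envs * num_steps`, iterations = `total_timesteps // batch`); that convention is not shown in this diff, and the `PPORun` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PPORun:
    """Hypothetical container for the flags used in the commands above."""

    env_id: str
    num_envs: int
    num_steps: int
    num_minibatches: int
    total_timesteps: int

    @property
    def batch_size(self) -> int:
        # One rollout collects num_steps transitions from every environment.
        return self.num_envs * self.num_steps

    @property
    def minibatch_size(self) -> int:
        # Each update epoch splits the rollout into num_minibatches chunks.
        return self.batch_size // self.num_minibatches

    @property
    def num_iterations(self) -> int:
        # Full rollout/update cycles that fit in the timestep budget.
        return self.total_timesteps // self.batch_size


runs = [
    PPORun("MS-HumanoidStand-v1", 2048, 200, 32, 40_000_000),
    PPORun("MS-HumanoidWalk-v1", 2048, 200, 32, 80_000_000),
    PPORun("MS-HumanoidRun-v1", 2048, 200, 32, 60_000_000),
]

for run in runs:
    print(run.env_id, run.batch_size, run.minibatch_size, run.num_iterations)
```

Under that assumption each rollout holds 409,600 transitions split into minibatches of 12,800, and the budgets correspond to 97, 195, and 146 iterations for stand, walk, and run respectively.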
@@ -98,3 +109,8 @@ python ppo_rgb.py --env_id="PickSingleYCB-v1" \
python ppo_rgb.py --env_id="PushT-v1" \
--num_envs=256 --update_epochs=8 --num_minibatches=8 \
--total_timesteps=25_000_000 --num-steps=100 --num_eval_steps=100 --gamma=0.99
python ppo_rgb.py --env_id="MS-HumanoidRun-v1" \
--num_envs=256 --update_epochs=8 --num_minibatches=32 \
--total_timesteps=80_000_000 --eval_freq=15 --num_eval_steps=1000 \
--num_steps=200 --gamma=0.98 --no-include-state --render_mode="rgb_array" \
--ent_coef=1e-3
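
The humanoid commands use discount factors between 0.95 and 0.98. One common rule of thumb, used here only as an illustration and not taken from the PR, is that rewards matter over roughly `1 / (1 - gamma)` steps. The helper name and the run labels below are made up for this sketch.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb reward horizon in steps: 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


# Discount factors copied from the commands in examples.sh.
gammas = {
    "MS-HumanoidStand-v1 (state)": 0.95,
    "MS-HumanoidWalk-v1 (state)": 0.97,
    "MS-HumanoidRun-v1 (state)": 0.97,
    "MS-HumanoidRun-v1 (rgb)": 0.98,
}

horizons = {name: round(effective_horizon(g), 1) for name, g in gammas.items()}

for name, steps in horizons.items():
    print(f"{name}: about {steps} steps")
```

By this measure the horizons are about 20, 33.3, 33.3, and 50 steps, all well below the 200-step rollouts set by `--num_steps=200`.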
4 changes: 3 additions & 1 deletion examples/baselines/ppo/ppo_rgb.py
@@ -45,6 +45,8 @@ class Args:
"""if toggled, only runs evaluation with the given model checkpoint and saves the evaluation trajectories"""
checkpoint: str = None
"""path to a pretrained checkpoint file to start evaluation/training from"""
render_mode: str = "all"
"""the environment rendering mode"""

# Algorithm specific arguments
env_id: str = "PickCube-v1"
@@ -288,7 +290,7 @@ def close(self):
device = torch.device("cuda" if torch.cuda.is_available() and args.cuda else "cpu")

# env setup
- env_kwargs = dict(obs_mode="rgb", control_mode="pd_joint_delta_pos", render_mode="all", sim_backend="gpu")
+ env_kwargs = dict(obs_mode="rgbd", control_mode="pd_joint_delta_pos", render_mode=args.render_mode, sim_backend="gpu")
eval_envs = gym.make(args.env_id, num_envs=args.num_eval_envs, **env_kwargs)
envs = gym.make(args.env_id, num_envs=args.num_envs if not args.evaluate else 1, **env_kwargs)
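
The change above turns the hard-coded `render_mode="all"` into a command-line option that flows into `env_kwargs`. The sketch below shows that flow in isolation. It uses `argparse` as a stand-in because the script's actual argument parser is not visible in this diff, and `parse_args` and `make_env_kwargs` are hypothetical helpers, not functions from `ppo_rgb.py`.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class Args:
    """Trimmed-down stand-in for the script's Args class."""

    env_id: str = "PickCube-v1"
    render_mode: str = "all"  # default keeps the previous behaviour


def parse_args(argv: list[str]) -> Args:
    """Expose every dataclass field as a --flag with its default."""
    parser = argparse.ArgumentParser()
    for f in fields(Args):
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    return Args(**vars(parser.parse_args(argv)))


def make_env_kwargs(args: Args) -> dict:
    """Build the keyword arguments that would be passed to gym.make."""
    return dict(
        obs_mode="rgbd",
        control_mode="pd_joint_delta_pos",
        render_mode=args.render_mode,  # no longer hard-coded to "all"
        sim_backend="gpu",
    )


args = parse_args(["--env_id=MS-HumanoidRun-v1", "--render_mode=rgb_array"])
env_kwargs = make_env_kwargs(args)
print(env_kwargs["render_mode"])
```

Keeping `"all"` as the default means existing commands that do not pass `--render_mode` behave as before, while the humanoid RGB run can request `rgb_array` explicitly.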

44 changes: 38 additions & 6 deletions mani_skill/agents/robots/humanoid/humanoid.py
@@ -59,15 +59,47 @@ def _controller_configs(self):
damping=10,
normalize_action=False,
)

# for pd_joint_delta_pos control
joints_dict = {
"abdomen_y": {"damping": 5, "stiffness": 40},
"abdomen_z": {"damping": 5, "stiffness": 40},
"abdomen_x": {"damping": 5, "stiffness": 40},
"right_hip_x": {"damping": 5, "stiffness": 40},
"right_hip_z": {"damping": 5, "stiffness": 40},
"right_hip_y": {"damping": 5, "stiffness": 120},
"right_knee": {"damping": 1, "stiffness": 80},
"right_ankle_x": {"damping": 3, "stiffness": 20},
"right_ankle_y": {"damping": 3, "stiffness": 40},
"left_hip_x": {"damping": 5, "stiffness": 40},
"left_hip_z": {"damping": 5, "stiffness": 40},
"left_hip_y": {"damping": 5, "stiffness": 120},
"left_knee": {"damping": 1, "stiffness": 80},
"left_ankle_x": {"damping": 3, "stiffness": 20},
"left_ankle_y": {"damping": 3, "stiffness": 40},
"right_shoulder1": {"damping": 1, "stiffness": 20},
"right_shoulder2": {"damping": 1, "stiffness": 20},
"right_elbow": {"damping": 0, "stiffness": 40},
"left_shoulder1": {"damping": 1, "stiffness": 20},
"left_shoulder2": {"damping": 1, "stiffness": 20},
"left_elbow": {"damping": 0, "stiffness": 40},
}

joint_names = list(joints_dict.keys())
assert sorted(joint_names) == sorted([x.name for x in self.robot.active_joints])

damping = np.array([joint["damping"] for joint in joints_dict.values()])
stiffness = np.array([joint["stiffness"] for joint in joints_dict.values()])

pd_joint_delta_pos = PDJointPosControllerConfig(
- [j.name for j in self.robot.active_joints],
- -1,
- 1,
- damping=5,
- stiffness=20,
- force_limit=100,
+ joint_names,
+ -2,
+ 2,
+ damping=damping,
+ stiffness=stiffness,
use_delta=True,
)

return deepcopy_dict(
dict(
pd_joint_pos=dict(body=pd_joint_pos, balance_passive_force=False),
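
The controller change above replaces one shared gain pair with per-joint gains and widens the delta range from [-1, 1] to [-2, 2]. The sketch below walks through the same steps on three of the joints, without NumPy. It rests on two assumptions that the diff does not confirm: that a normalized action in [-1, 1] is mapped linearly onto the configured range, and that the drive behaves like an ideal PD law. `action_to_delta`, `pd_torque`, and `active_joint_names` are illustrative names only.

```python
# Subset of the per-joint gains from the diff above.
joints = {
    "right_hip_y": {"damping": 5, "stiffness": 120},
    "right_knee": {"damping": 1, "stiffness": 80},
    "right_elbow": {"damping": 0, "stiffness": 40},
}

# Hypothetical order in which the robot reports its active joints.
active_joint_names = ["right_knee", "right_elbow", "right_hip_y"]

# Same check as in the diff: the name sets must match, in any order.
joint_names = list(joints)
assert sorted(joint_names) == sorted(active_joint_names)

# Gains are extracted in dict order, so they line up with joint_names.
damping = [joints[name]["damping"] for name in joint_names]
stiffness = [joints[name]["stiffness"] for name in joint_names]

LOWER, UPPER = -2.0, 2.0  # delta range passed to the controller config


def action_to_delta(action: float) -> float:
    """Map a normalized action in [-1, 1] to a delta (assumed linear)."""
    action = max(-1.0, min(1.0, action))
    return LOWER + (action + 1.0) * 0.5 * (UPPER - LOWER)


def pd_torque(q: float, qvel: float, target: float, kp: float, kd: float) -> float:
    """Idealized PD law: stiffness pulls toward the target, damping resists velocity."""
    return kp * (target - q) - kd * qvel


# A half-strength action on the hip, starting at rest at q = 0.
hip = joint_names.index("right_hip_y")
target = 0.0 + action_to_delta(0.25)
print(pd_torque(0.0, 0.0, target, stiffness[hip], damping[hip]))
```

Because `joint_names` and both gain lists come from the same dict, they stay aligned even when the robot reports its joints in a different order, which is why the assertion only needs to compare sorted names.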
3 changes: 2 additions & 1 deletion mani_skill/assets/robots/humanoid/humanoid.xml
@@ -30,7 +30,8 @@

<worldbody>
<geom name="floor" type="plane" conaffinity="1" size="100 100 .2" material="grid"/>
- <body name="torso" pos="0 0 1.5" childclass="body">
+ <!-- body pos changed from pos="0 0 1.5" for compatibility with ManiSkill articulation set_root_pose -->
+ <body name="torso" pos="0 0 0" childclass="body">
<light name="top" pos="0 0 2" mode="trackcom"/>
<camera name="back" pos="-3 0 1" xyaxes="0 -1 0 1 0 2" mode="trackcom"/>
<camera name="side" pos="0 -3 1" xyaxes="1 0 0 0 1 2" mode="trackcom"/>
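
The XML comment above does not spell out the incompatibility. One plausible reading, which is an assumption here and not something the PR states, is that an offset baked into the torso body would add to whatever root pose is set later. The sketch below shows that arithmetic with a hypothetical root height of 1.3; `world_position` is not a ManiSkill function.

```python
Vec3 = tuple[float, float, float]


def world_position(root: Vec3, body_offset: Vec3) -> Vec3:
    """Translation-only composition: the body offset is added to the root position."""
    x, y, z = (r + o for r, o in zip(root, body_offset))
    return (x, y, z)


root = (0.0, 0.0, 1.3)  # hypothetical height passed to set_root_pose

old_torso = world_position(root, (0.0, 0.0, 1.5))  # offset from the old XML
new_torso = world_position(root, (0.0, 0.0, 0.0))  # offset after this change

print(old_torso[2], new_torso[2])
```

Under this reading, zeroing the offset makes the torso height equal to the root height that the environment sets, so the starting height is controlled in one place.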
1 change: 1 addition & 0 deletions mani_skill/envs/tasks/control/__init__.py
@@ -1,2 +1,3 @@
from .cartpole import CartpoleBalanceEnv, CartpoleSwingUpEnv
from .hopper import HopperHopEnv, HopperStandEnv
from .humanoid import HumanoidRun, HumanoidStand, HumanoidWalk