social nav readme #1699

Merged 5 commits on Nov 29, 2023
144 changes: 140 additions & 4 deletions habitat-baselines/README.md
@@ -80,11 +80,147 @@ First download the necessary data with `python -m habitat_sim.utils.datasets_dow

## Social Navigation

In the social navigation task, a robot is tasked with finding and following a human. The goal is to train a neural network policy that takes as input (1) Spot's arm depth image, (2) the humanoid detector sensor, and (3) Spot's stereo depth cameras, and outputs linear and angular velocities.

To quickly launch multi-agent training, with the Spot robot's policy being a trainable low-level navigation policy and the humanoid's policy being a fixed (non-trainable) policy that navigates a sequence of navigation targets, run:
- `python habitat_baselines/run.py --config-name=social_nav/social_nav.yaml`

### Observation
The observations of the social nav policy are defined under `habitat.gym.obs_keys` with the prefix `agent_0` in `habitat-lab/habitat/config/benchmark/multi_agent/hssd_spot_human_social_nav.yaml`. In this yaml, `agent_0_articulated_agent_arm_depth` is the robot's arm depth camera, and `agent_0_humanoid_detector_sensor` is a humanoid detector that returns either a segmentation mask or a bounding box of the human, computed from an arm RGB camera. For `humanoid_detector_sensor`, see `HumanoidDetectorSensorConfig` in `habitat-lab/habitat/config/default_structured_configs.py` to learn how to configure the sensor (e.g., whether it returns a bounding box or a segmentation mask). Finally, `agent_0_spot_head_stereo_depth_sensor` is Spot's body stereo depth image.
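For example, the snippet below is a minimal sketch of switching the detector output from the command line, using the same overrides that appear in the full commands later in this section (assuming, as the field name suggests, that `is_return_image_bbox=True` selects the bounding-box output):

```bash
# Hedged sketch: ask the humanoid detector for an image-space bounding box
# instead of a segmentation mask. Both keys also appear in the full
# training/evaluation commands below.
python -u -m habitat_baselines.run \
    --config-name=social_nav/social_nav.yaml \
    benchmark/multi_agent=hssd_spot_human_social_nav \
    habitat.task.lab_sensors.humanoid_detector_sensor.return_image=True \
    habitat.task.lab_sensors.humanoid_detector_sensor.is_return_image_bbox=True
```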

Note that if you want to add more observation sensors or use different ones, you can do so by adding sensors to `habitat.gym.obs_keys`. For example, you can provide a humanoid GPS to the policy's input by adding `agent_0_nav_goal_sensor` to `habitat.gym.obs_keys` in `hssd_spot_human_social_nav.yaml`. Note that every observation key in `habitat.gym.obs_keys` must correspond to a sensor defined under `/habitat/task/lab_sensors`. As another example, you can add an arm RGB sensor by adding `agent_0_articulated_agent_arm_rgb` to `habitat.gym.obs_keys` in `hssd_spot_human_social_nav.yaml`.
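If you prefer a command-line override to editing the yaml, the sketch below shows the equivalent Hydra list override. This is only an illustration: the actual `obs_keys` list in the yaml may also contain `agent_1_*` entries for the humanoid, so copy the existing list and append to it rather than using this one verbatim.

```bash
# Hedged sketch: append the humanoid GPS (agent_0_nav_goal_sensor) to the
# robot's observations via a Hydra list override instead of editing
# hssd_spot_human_social_nav.yaml. Quotes keep the shell from touching the brackets.
python -u -m habitat_baselines.run \
    --config-name=social_nav/social_nav.yaml \
    benchmark/multi_agent=hssd_spot_human_social_nav \
    "habitat.gym.obs_keys=[agent_0_articulated_agent_arm_depth,agent_0_humanoid_detector_sensor,agent_0_spot_head_stereo_depth_sensor,agent_0_nav_goal_sensor]"
```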

### Action
The action space of the social nav policy is defined under `/habitat/task/actions@habitat.task.actions.agent_0_base_velocity: base_velocity_non_cylinder` in `habitat-lab/habitat/config/benchmark/multi_agent/hssd_spot_human_social_nav.yaml`. The action consists of linear and angular velocities. You can learn more about the hyperparameters for this action under `BaseVelocityNonCylinderActionConfig` in `habitat-lab/habitat/config/default_structured_configs.py`.
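As an illustration, these hyperparameters can also be overridden from the command line; the sketch below uses the same overrides that appear in the training command later in this section:

```bash
# Hedged sketch: loosen the Spot base-velocity action limits and enable
# dynamic sliding (same values as the training command below).
python -u -m habitat_baselines.run \
    --config-name=social_nav/social_nav.yaml \
    benchmark/multi_agent=hssd_spot_human_social_nav \
    habitat.task.actions.agent_0_base_velocity.longitudinal_lin_speed=10.0 \
    habitat.task.actions.agent_0_base_velocity.ang_speed=10.0 \
    habitat.task.actions.agent_0_base_velocity.allow_dyn_slide=True \
    habitat.task.actions.agent_0_base_velocity.enable_rotation_check_for_dyn_slide=False
```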

### Reward
The reward function of the social nav policy is defined in `social_nav_reward`. It encourages the robot to find the human as soon as possible and then maintain a safe distance from the human. You can learn more about the hyperparameters of this reward function under `SocialNavReward` in `habitat-lab/habitat/config/default_structured_configs.py`.
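For example, the reward hyperparameters can be tuned from the command line; the sketch below uses the same values as the training command later in this section:

```bash
# Hedged sketch: tune the social-nav reward terms
# (identical to the overrides used in the training command below).
python -u -m habitat_baselines.run \
    --config-name=social_nav/social_nav.yaml \
    benchmark/multi_agent=hssd_spot_human_social_nav \
    habitat.task.measurements.social_nav_reward.facing_human_reward=3.0 \
    habitat.task.measurements.social_nav_reward.facing_human_dis=3.0 \
    habitat.task.measurements.social_nav_reward.count_coll_pen=0.01 \
    habitat.task.measurements.social_nav_reward.max_count_colls=-1 \
    habitat.task.measurements.social_nav_reward.count_coll_end_pen=5 \
    habitat.task.measurements.social_nav_reward.use_geo_distance=True
```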

### Command
We provide a trained checkpoint (see the Checkpoint section below). To reproduce it, run multi-agent training with the Spot robot's policy being a trainable low-level navigation policy and the humanoid's policy being a fixed (non-trainable) policy that navigates a sequence of navigation targets (make sure `tensorboard_dir`, `video_dir`, `checkpoint_folder`, and `eval_ckpt_path_dir` point to the paths you want):

```bash
python -u -m habitat_baselines.run \
--config-name=social_nav/social_nav.yaml \
benchmark/multi_agent=hssd_spot_human_social_nav \
habitat_baselines.evaluate=False \
habitat_baselines.num_checkpoints=5000 \
habitat_baselines.total_num_steps=1.0e9 \
habitat_baselines.num_environments=24 \
habitat_baselines.tensorboard_dir=tb_social_nav \
habitat_baselines.video_dir=video_social_nav \
habitat_baselines.checkpoint_folder=checkpoints_social_nav \
habitat_baselines.eval_ckpt_path_dir=checkpoints_social_nav \
habitat.task.actions.agent_0_base_velocity.longitudinal_lin_speed=10.0 \
habitat.task.actions.agent_0_base_velocity.ang_speed=10.0 \
habitat.task.actions.agent_0_base_velocity.allow_dyn_slide=True \
habitat.task.actions.agent_0_base_velocity.enable_rotation_check_for_dyn_slide=False \
habitat.task.actions.agent_1_oracle_nav_randcoord_action.lin_speed=10.0 \
habitat.task.actions.agent_1_oracle_nav_randcoord_action.ang_speed=10.0 \
habitat.task.actions.agent_1_oracle_nav_action.lin_speed=10.0 \
habitat.task.actions.agent_1_oracle_nav_action.ang_speed=10.0 \
habitat.task.measurements.social_nav_reward.facing_human_reward=3.0 \
habitat.task.measurements.social_nav_reward.count_coll_pen=0.01 \
habitat.task.measurements.social_nav_reward.max_count_colls=-1 \
habitat.task.measurements.social_nav_reward.count_coll_end_pen=5 \
habitat.task.measurements.social_nav_reward.use_geo_distance=True \
habitat.task.measurements.social_nav_reward.facing_human_dis=3.0 \
habitat.task.measurements.social_nav_seek_success.following_step_succ_threshold=400 \
habitat.task.measurements.social_nav_seek_success.need_to_face_human=True \
habitat.task.measurements.social_nav_seek_success.use_geo_distance=True \
habitat.task.measurements.social_nav_seek_success.facing_threshold=0.5 \
habitat.task.lab_sensors.humanoid_detector_sensor.return_image=True \
habitat.task.lab_sensors.humanoid_detector_sensor.is_return_image_bbox=True \
habitat.task.success_reward=10.0 \
habitat.task.end_on_success=True \
habitat.task.slack_reward=-0.1 \
habitat.environment.max_episode_steps=1500 \
habitat.simulator.kinematic_mode=True \
habitat.simulator.ac_freq_ratio=4 \
habitat.simulator.ctrl_freq=120 \
habitat.simulator.agents.agent_0.joint_start_noise=0.0
```

You should observe a reward training (learning) curve similar to the following:

![Social Nav Reward Training Curve](/res/img/habitat3_social_nav_training_reward.png)

In addition, under the following SLURM batch script setting:
```bash
#SBATCH --gres gpu:4
#SBATCH --cpus-per-task 10
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 4
#SBATCH --mem-per-cpu=6GB
```
we have the following training wall clock time versus reward:
![Social Nav Reward Training Curve versus Time](/res/img/habitat3_social_nav_training_reward_time.png)
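For completeness, the sketch below shows one hypothetical way to combine that SLURM header with the training command above in a single batch script; the `srun` launch line is an assumption and the override list is abbreviated, so adapt both to your cluster:

```bash
#!/bin/bash
#SBATCH --gres gpu:4
#SBATCH --cpus-per-task 10
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 4
#SBATCH --mem-per-cpu=6GB

# Hypothetical sketch: launch one training process per task; pass the same
# overrides as the full training command shown above (abbreviated here).
srun python -u -m habitat_baselines.run \
    --config-name=social_nav/social_nav.yaml \
    benchmark/multi_agent=hssd_spot_human_social_nav \
    habitat_baselines.evaluate=False \
    habitat_baselines.num_environments=24 \
    habitat_baselines.tensorboard_dir=tb_social_nav \
    habitat_baselines.checkpoint_folder=checkpoints_social_nav
```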

We have the following training FPS:
![Social Nav Training FPS](/res/img/habitat3_social_nav_training_fps.png)

To evaluate the trained Spot robot's policy over 500 episodes, run (make sure `video_dir` and `eval_ckpt_path_dir` point to the paths you want and that the checkpoint is there):

```bash
python -u -m habitat_baselines.run \
--config-name=social_nav/social_nav.yaml \
benchmark/multi_agent=hssd_spot_human_social_nav \
habitat_baselines.evaluate=True \
habitat_baselines.num_checkpoints=5000 \
habitat_baselines.total_num_steps=1.0e9 \
habitat_baselines.num_environments=12 \
habitat_baselines.video_dir=video_social_nav \
habitat_baselines.checkpoint_folder=checkpoints_social_nav \
habitat_baselines.eval_ckpt_path_dir=checkpoints_social_nav/latest.pth \
habitat.task.actions.agent_0_base_velocity.longitudinal_lin_speed=10.0 \
habitat.task.actions.agent_0_base_velocity.ang_speed=10.0 \
habitat.task.actions.agent_0_base_velocity.allow_dyn_slide=True \
habitat.task.actions.agent_0_base_velocity.enable_rotation_check_for_dyn_slide=False \
habitat.task.actions.agent_1_oracle_nav_randcoord_action.human_stop_and_walk_to_robot_distance_threshold=-1.0 \
habitat.task.actions.agent_1_oracle_nav_randcoord_action.lin_speed=10.0 \
habitat.task.actions.agent_1_oracle_nav_randcoord_action.ang_speed=10.0 \
habitat.task.actions.agent_1_oracle_nav_action.lin_speed=10.0 \
habitat.task.actions.agent_1_oracle_nav_action.ang_speed=10.0 \
habitat.task.measurements.social_nav_reward.facing_human_reward=3.0 \
habitat.task.measurements.social_nav_reward.count_coll_pen=0.01 \
habitat.task.measurements.social_nav_reward.max_count_colls=-1 \
habitat.task.measurements.social_nav_reward.count_coll_end_pen=5 \
habitat.task.measurements.social_nav_reward.use_geo_distance=True \
habitat.task.measurements.social_nav_reward.facing_human_dis=3.0 \
habitat.task.measurements.social_nav_seek_success.following_step_succ_threshold=400 \
habitat.task.measurements.social_nav_seek_success.need_to_face_human=True \
habitat.task.measurements.social_nav_seek_success.use_geo_distance=True \
habitat.task.measurements.social_nav_seek_success.facing_threshold=0.5 \
habitat.task.lab_sensors.humanoid_detector_sensor.return_image=True \
habitat.task.lab_sensors.humanoid_detector_sensor.is_return_image_bbox=True \
habitat.task.success_reward=10.0 \
habitat.task.end_on_success=False \
habitat.task.slack_reward=-0.1 \
habitat.environment.max_episode_steps=1500 \
habitat.simulator.kinematic_mode=True \
habitat.simulator.ac_freq_ratio=4 \
habitat.simulator.ctrl_freq=120 \
habitat.simulator.agents.agent_0.joint_start_noise=0.0 \
habitat_baselines.load_resume_state_config=False \
habitat_baselines.test_episode_count=500 \
habitat_baselines.eval.extra_sim_sensors.third_rgb_sensor.height=1080 \
habitat_baselines.eval.extra_sim_sensors.third_rgb_sensor.width=1920
```
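If you only want a quick sanity check of a particular intermediate checkpoint, a shorter command is sketched below. The checkpoint name `ckpt.100.pth` and the reduced `test_episode_count` are assumptions (habitat-baselines typically saves checkpoints as `ckpt.<idx>.pth`); the full command above is the one that produces the numbers reported below.

```bash
# Hedged sketch: quick evaluation of one intermediate checkpoint with default
# task settings. This will not exactly reproduce the numbers below, which come
# from the full override set in the evaluation command above.
python -u -m habitat_baselines.run \
    --config-name=social_nav/social_nav.yaml \
    benchmark/multi_agent=hssd_spot_human_social_nav \
    habitat_baselines.evaluate=True \
    habitat_baselines.load_resume_state_config=False \
    habitat_baselines.test_episode_count=50 \
    habitat_baselines.video_dir=video_social_nav \
    habitat_baselines.eval_ckpt_path_dir=checkpoints_social_nav/ckpt.100.pth
```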

After running the evaluation, you should get numbers similar to the following:

```bash
Average episode social_nav_reward: 1.8821
Average episode social_nav_stats.has_found_human: 0.9020
Average episode social_nav_stats.found_human_rate_after_encounter_over_epi: 0.6423
Average episode social_nav_stats.found_human_rate_over_epi: 0.4275
Average episode social_nav_stats.frist_ecnounter_steps: 376.0420
Average episode social_nav_stats.follow_human_steps_after_frist_encounter: 398.6340
Average episode social_nav_stats.avg_robot_to_human_after_encounter_dis_over_epi: 1.4969
Average episode social_nav_stats.avg_robot_to_human_dis_over_epi: 3.6885
Average episode social_nav_stats.backup_ratio: 0.1889
Average episode social_nav_stats.yield_ratio: 0.0192
Average episode num_agents_collide: 0.7020
```

### Checkpoint

We release a [checkpoint](https://arxiv.org/abs/2106.14405) based on the above command.

To evaluate the trained Spot robot's policy with this checkpoint, run:
- `python habitat_baselines/run.py --config-name=social_nav/social_nav.yaml habitat_baselines.evaluate=True habitat_baselines.eval_ckpt_path_dir=/checkpoints/latest.pth habitat_baselines.eval.should_load_ckpt=True`
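A hypothetical sketch of placing the downloaded checkpoint where the command above expects it (`social_nav_checkpoint.pth` is a placeholder for the actual released file name):

```bash
# Hypothetical sketch: put the released checkpoint at the path used above.
mkdir -p /checkpoints
cp social_nav_checkpoint.pth /checkpoints/latest.pth
```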

## Social Rearrangement

Binary file added res/img/habitat3_social_nav_training_fps.png
Binary file added res/img/habitat3_social_nav_training_reward.png