Port HIL SERL #644
Merged: aliberts merged 312 commits into `main` from `user/adil-zouitine/2025-1-7-port-hil-serl-new` on Jun 13, 2025.
Commits (312)
a90f487
Add maniskill support.
AdilZouitine c85f88f
Improve wandb logging and custom step tracking in logger
AdilZouitine 62e237b
Re-enable parameter push thread in learner server
AdilZouitine 0d88a5e
- Fixed big issue in the loading of the policy parameters sent by the…
michel-aractingi 85242ca
Refactor SAC policy with performance optimizations and multi-camera s…
AdilZouitine e1d55c7
[Port HIL-SERL] Adjust Actor-Learner architecture & clean up dependen…
helper2424 d3b84ec
Added caching function in the learner_server and modeling sac in orde…
michel-aractingi 4c73891
Update ManiSkill configuration and replay buffer to support truncatio…
AdilZouitine 1d4ec50
Refactor ReplayBuffer with tensor-based storage and improved sampling…
AdilZouitine 9ea79f8
Add storage device parameter to replay buffer initialization
AdilZouitine ae51c19
Add memory optimization option to ReplayBuffer
AdilZouitine bb69cb3
Add storage device configuration for SAC policy and replay buffer
AdilZouitine 85fe8a3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] b6a2200
[HIL-SERL] Migrate threading to multiprocessing (#759)
helper2424 3dfb37e
[Port HIL-SERL] Balanced sampler function speed up and refactor to al…
s1lent4gnt e002c5e
Remove torch.no_grad decorator and optimize next action prediction in…
AdilZouitine 2f04d0d
Add custom save and load methods for SAC policy
AdilZouitine 5993265
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 66816fd
Enhance SAC configuration and policy with gradient clipping and tempe…
AdilZouitine 7b01e16
Add end effector action space to hil-serl (#861)
michel-aractingi 0959694
Refactor SACPolicy and learner server for improved replay buffer mana…
AdilZouitine fd74c19
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 03fe0f0
Update configuration files for improved performance and flexibility
AdilZouitine ffbed4a
Enhance training information logging in learner server
AdilZouitine 0341a38
[PORT HIL-SERL] Optimize training loop, extract config usage (#855)
helper2424 787aee0
- Updated the logging condition to use `log_freq` directly instead of…
AdilZouitine 36f9ccd
Add intervention rate tracking in act_with_policy function
AdilZouitine e4a5971
Remove unused functions and imports from modeling_sac.py
AdilZouitine 50d8db4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 618ed00
Initialize log_alpha with the logarithm of temperature_init in SACPolicy
AdilZouitine 42f95e8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] cdcf346
Update tensor device assignment in ReplayBuffer class
AdilZouitine 1c8daf1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 2abbd60
Removed depleted files and scripts
michel-aractingi 0ea2770
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] bb5a958
Handle multi optimizers
AdilZouitine 80d566e
Handle new config with sac
AdilZouitine 38e8864
Add task field to frame_dict in ReplayBuffer and simplify save_episod…
AdilZouitine 26ee8b6
Add .devcontainer to .gitignore for improved development environment …
AdilZouitine 114ec64
Change config logic in:
michel-aractingi 056f79d
[WIP] Non functional yet
AdilZouitine 0b5b62c
Add wandb run id in config
AdilZouitine db897a1
[WIP] Update SAC configuration and environment settings
AdilZouitine b69132c
Change HILSerlRobotEnvConfig to inherit from EnvConfig
michel-aractingi 88cc2b8
Add WrapperConfig for environment wrappers and update SACConfig prope…
AdilZouitine 05a237c
Added gripper control mechanism to gym_manipulator
michel-aractingi 5a0ee06
Enhance logging for actor and learner servers
AdilZouitine 8fb373a
fix
AdilZouitine c0ba4b4
Refactor SACConfig properties for improved readability
AdilZouitine 3beab33
Refactor imports in modeling_sac.py for improved organization
AdilZouitine 176557d
Refactor learner_server.py for improved structure and clarity
AdilZouitine eb71064
Refactor actor_server.py for improved structure and logging
AdilZouitine 6e687e2
Refactor SACPolicy and learner_server for improved clarity and functi…
AdilZouitine 4d5ecb0
Refactor SACPolicy for improved type annotations and readability
AdilZouitine 8eb3c15
Added support for controlling the gripper with the pygame interface o…
michel-aractingi eb44a06
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 70d4189
Fix: Prevent Invalid next_state References When optimize_memory=True …
s1lent4gnt 0185a0b
Fix cuda graph break
AdilZouitine 5b49601
Fix convergence of sac, multiple torch compile on the same model caus…
AdilZouitine 334cf81
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 6669396
Add grasp critic
s1lent4gnt 4277204
Add complementary info in the replay buffer
s1lent4gnt ff18be1
Add gripper penalty wrapper
s1lent4gnt fdd04ef
Add get_gripper_action method to GamepadController
s1lent4gnt 3a2308d
Add grasp critic to the training loop
s1lent4gnt 88d26ae
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 0cce2fe
Added Gripper quantization wrapper and grasp penalty
michel-aractingi 7361a11
Refactor SAC configuration and policy to support discrete actions
AdilZouitine f83d215
Refactor SAC policy and training loop to enhance discrete action support
AdilZouitine d86d29f
Add mock gripper support and enhance SAC policy action handling
AdilZouitine f9fb9d4
Refactor SACPolicy for improved readability and action dimension hand…
AdilZouitine 6167886
Enhance SACPolicy and learner server for improved grasp critic integr…
AdilZouitine 70130b9
Enhance SACPolicy to support shared encoder and optimize action selec…
AdilZouitine 7c2c67f
Enhance SAC configuration and replay buffer with asynchronous prefetc…
AdilZouitine cf58890
fix indentation issue
AdilZouitine 1efaf02
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8bcf417
fix caching
AdilZouitine d5a87f6
Handle gripper penalty
AdilZouitine 78c640b
Refactor complementary_info handling in ReplayBuffer
AdilZouitine 203315d
fix sign issue
AdilZouitine a3ada81
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 68c271a
Add rounding for safety
AdilZouitine e18274b
fix caching and dataset stats is optional
AdilZouitine 02e1ed0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 9fd4c21
General fixes in code, removed delta action, fixed grasp penalty, add…
michel-aractingi 28b595c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 267a837
fix encoder training
AdilZouitine 9386892
Refactor modeling_sac and parameter handling for clarity and reusabil…
AdilZouitine 5c352ae
stick to hil serl nn architecture
AdilZouitine 8122721
match target entropy hil serl
AdilZouitine 9e5f254
change the tanh distribution to match hil serl
AdilZouitine 2f7339b
Handle caching
AdilZouitine c5382a4
fix caching
AdilZouitine c37936f
Update log_std_min type to float in PolicyConfig for consistency
AdilZouitine 3424644
Fix init temp
AdilZouitine fb075a7
Refactor input and output normalization handling in SACPolicy for imp…
AdilZouitine 1ce3685
Refactor SACPolicy initialization by breaking down the constructor in…
AdilZouitine dcd850f
Refactor SACObservationEncoder to improve modularity and readability.…
AdilZouitine fb92935
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 54c3c6d
Enhance MLP class in modeling_sac.py with detailed docstring and refa…
AdilZouitine 3b24ad3
Fixes for the reward classifier
michel-aractingi 9886520
Added option to add current readings to the state of the policy
michel-aractingi c1ee25d
nits in configuration classifier and control_robot
michel-aractingi 0d70f0b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] a7a51cf
Refactor SACPolicy and configuration to replace 'grasp_critic' termin…
AdilZouitine dc726cb
Refactor crop_dataset_roi
michel-aractingi 0030ff3
[HIL-SERl PORT] Unit tests for Replay Buffer (#966)
helper2424 c5845ee
Fix linter issue
AdilZouitine 6230840
Fix linter issue part 2
AdilZouitine 4ce3362
Fixup linter (#1017)
helper2424 5231752
Fix test comparing uninitialized array segment
AdilZouitine b77cee7
Ignore spellcheck for ik variable
AdilZouitine ecc960b
fix install ci
AdilZouitine cf03ca9
allow to install prerelease for maniskill
AdilZouitine a001824
fix ci
AdilZouitine 299effe
[HIL-SERL] Update CI to allow installation of prerelease versions for…
AdilZouitine 671ac34
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine c58b504
[HIL-SERL]Remove overstrict pre-commit modifications (#1028)
AdilZouitine b8c2b0b
Clean the code and remove todo
AdilZouitine a8da4a3
Clean the code
AdilZouitine bd4db8d
[Port HIl-Serl] Refactor gym-manipulator (#1034)
michel-aractingi 1d4f660
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine 50e9a8e
cleaning
AdilZouitine ea89b29
checkout normalize.py to prev commit
michel-aractingi 4257fe5
rename reward classifier
AdilZouitine fb7c288
Update torch.load calls in network_utils.py to include weights_only=F…
AdilZouitine 6fa7df3
[PORT HIL-SERL] Add unit tests for SAC modeling (#999)
helper2424 5998203
[Port HIL-SERL] Final fixes for reward classifier (#1067)
michel-aractingi d7471a3
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine 4445581
[HIL SERL] Env management and add gym-hil (#1077)
AdilZouitine 3970895
Added missing lisences
michel-aractingi 9a72918
style nit
michel-aractingi 5c0cbb5
Cleaning configs
AdilZouitine 175d21a
Format file
AdilZouitine 410f435
Delete outdated example
AdilZouitine 910805f
added names in `record_dataset` function of gym_manipulator
michel-aractingi 0776f81
robot_type nit
michel-aractingi 98e4394
Add grpcio as optional dependency
AdilZouitine 010dabd
removed fixed port values in `find_joint_limits.py`
michel-aractingi 8fcd32e
Fixes in record_dataset and import gym_hil
michel-aractingi 5f88a6d
Added number of steps after success as parameter in config
michel-aractingi 58b0e1a
Improved the takeover logic in the case of `leader_automatic` control…
michel-aractingi 34c492d
Added comment on SE(3) in kinematics and nits in `lerobot/envs/utils.py`
michel-aractingi bd617c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 4a40c5a
Fixup proto header (#1104)
helper2424 6b395fe
[PORT HIL-SERL] Better unit tests coverage for SAC policy (#1074)
helper2424 7de403b
[HIL-SERL] Review feedback modifications (#1112)
AdilZouitine 30d23c6
fix formating and typos
AdilZouitine 92c3eb6
Remove numpy array support
AdilZouitine 307e2bf
Add review feedback
AdilZouitine 2471eda
Add review feedback
AdilZouitine 4e7db92
Add HIL-SERL citation
AdilZouitine 363c6af
Shallow copy
AdilZouitine 2ce275f
- added back degrees mode back to motor bus for IK and FK to work pro…
michel-aractingi 8dad588
Added gamepad teleoperator and so100follower end effector robots
michel-aractingi c3e16f1
precomit nits
michel-aractingi 6f8e869
Modified kinematics code to be independant of drive mode
michel-aractingi 5dbf015
fixed naming convention in gym_manipulator, adapted get observation t…
michel-aractingi 68839e9
precomit nits
michel-aractingi d834d69
Adapted gym_manipulator to teh new convention in robot devices
michel-aractingi 8bb7dd2
General fixes to abide by the new config in learner_server, actor_ser…
michel-aractingi 50df6a0
Moved the step size from the teleop device to the robot; simplified t…
michel-aractingi 9c2d9ca
[PORT HIL-SERL] Refactor folders structure | Rebased version (#1178)
helper2424 7f5e8d5
(fix): linter
AdilZouitine 849f2f3
(fix): test
AdilZouitine d32b2bf
(fix):ReplayBuffer to pass task_name directly to add_frame method; up…
AdilZouitine b497d5f
Fixes in various path of gym_manipulator
michel-aractingi e1977b1
Added hilserl.mdx that contains documentation for training hilserl on…
michel-aractingi f343050
bump gym-hil version to 0.1.5
michel-aractingi efb6c36
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine 5b95e0c
(fix): dependencies
AdilZouitine 93de0bb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 85e8e73
(fix): Linting
AdilZouitine 567f379
(fix) linting
AdilZouitine df3151a
Add scipy as dependency
AdilZouitine 0e4a1f8
(fix): scipy dependency
AdilZouitine 4a02f90
- Removed EEActionSpace wrapper that is unused
michel-aractingi 51b93d2
Remame tutorial and tip
AdilZouitine 47ad98d
Seperated sim doc to seperate file
michel-aractingi 830e0e8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d141370
iterate on documentation
AdilZouitine 39f852a
Added links to configuration example json files on the hub
michel-aractingi 0ac70e1
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
michel-aractingi b9f666d
chore: move so100 end effector in so100 folder
AdilZouitine b9a7fa7
chore: fixing import in calibrate
AdilZouitine 5656c74
chore: fix test
AdilZouitine 0e38d6b
chore, fixes in gym_manipulator and configuration_so100_follower
michel-aractingi c793bd1
[PORT HIL-SERL] Cover transport with tests (#1231)
helper2424 0783dac
chore: update type hints and remove unused robot config code
AdilZouitine b44b39e
Apply suggestions from code review
AdilZouitine 3eb66ed
(fix): bug in action space when use_gripper is not set
michel-aractingi f9c3a4a
docs: enhance hilserl_sim and hilserl documentation with installation…
AdilZouitine 2b77ff1
chore: remove dead code
AdilZouitine 6a67a70
chore: update copyright year to 2025 and adjust type hints in various…
AdilZouitine 127137e
chore: change copyright year
AdilZouitine 8ad7d9a
Apply suggestions from code review
AdilZouitine a0a9759
Added more typing info for kinematics.py
michel-aractingi e1863dc
Update lerobot/common/teleoperators/gamepad/configuration_gamepad.py
AdilZouitine 3513331
(adressing reviewer) find_joint_limits.py
michel-aractingi 0199a16
add what do I need section
AdilZouitine 4a0c37d
(adressing reviewer) remove mode from tutorial
michel-aractingi 135795a
(adressing reviewer) added link to configurations sac in doc
michel-aractingi 9d734e4
chore: refactor type hints to use union types for optional fields in …
AdilZouitine c4c3650
docs: add optional requirement for a real robot with follower and lea…
AdilZouitine 8a46786
[PORT HIL SERL] Speed up tests (#1253)
helper2424 93988b4
fix: revert intelrealsense dependencies
AdilZouitine d7f035c
(adressing reviewer) added degrees to so101_leader.py
michel-aractingi 9b8ad57
chore: revert the deletion of SO101
AdilZouitine 34f182e
chore: reset observation
AdilZouitine 0bf977b
chore: correct semantics
AdilZouitine eabf401
chore: update test sac config
AdilZouitine 1114fb4
(Adressing reviews): :
michel-aractingi dc60302
(addressing reviews) docstring nit in kinematics
michel-aractingi 1f5e437
refactor: move reward classifier to sac module and update imports
AdilZouitine 063114a
docs: enhance docstring for concatenate_batch_transitions function to…
AdilZouitine 2292a43
(addressing reviews) modified default degrees mode in so101_leader.py
michel-aractingi 1f26fcc
(addressing reviews) find_joint_limits refactor
michel-aractingi b340500
refactor: update configuration class references to TrainRLServerPipel…
AdilZouitine ed11848
(addressing reviews) remove hardcoded path
michel-aractingi 66f7ef2
(addressing reviews) removed unused param
michel-aractingi 1d082d5
docs: expand guidance on selecting regions of interest for visual RL …
AdilZouitine 541f26a
docs: remove redundant installation instructions for gym_hil in hilse…
AdilZouitine 3466e44
(addressing reviews) in teleop_gamepad.py
michel-aractingi dc6f4bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 77260dd
(addressing reviews) changed robot type names in kinematics
michel-aractingi ca468ff
docs: clarify pose_difference_se3 function documentation for SE(3) tr…
AdilZouitine 05ab2e7
refactor: replace hardcoded strings with constants for image and stat…
AdilZouitine d298294
(addressing reviews) fixes in crop_dataset_roi.py
michel-aractingi 098264d
(addressing reviews) nit in hilserl.mdx
michel-aractingi ee51edc
refactor: enhance async iterator in ReplayBuffer for improved error h…
AdilZouitine 9e3d28e
(addressing reviews) fix in table of content hilserl.mdx
michel-aractingi 5a5699d
(doc fixes) fixes in hilserl and hilser_sim section titles
michel-aractingi ad885f5
(doc) added insight to possible tasks wit hilserl
michel-aractingi 12b96b5
(addressing reviews) added constant label for reward
michel-aractingi 8ec04df
refactor: improve readability and structure in kinematics.py by renam…
AdilZouitine fd6da34
(addressing reviews) remove vendor id and product id from gamepad hid
michel-aractingi 0d4b581
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] f96c1af
refactor: move random seed fixture to test_sac_policy.py and utilize …
AdilZouitine 22a9c0b
refactor: remove random seed option from pytest configuration and set…
AdilZouitine 322b9ad
(gym-hil) bump to version 0.1.7
michel-aractingi 1bf063a
[HIL SERL] (refactor): replace setup_process_handlers with ProcessSig…
AdilZouitine 8038d4d
(docs) corrected pip install line
michel-aractingi af42f60
Address comments for queues infra (#1266)
helper2424 75f5e9c
(fix test) change queue in test_queue from mp queue
michel-aractingi f1141b2
(docs) added details around hyperparameters and image sizes
michel-aractingi 0977f31
(addressing reviews) nits in gym_manipulator and configs
michel-aractingi 898820a
(docstrings) removed outdated comments in docstrings
michel-aractingi 1285ba4
(bump) gym-hil version to 0.1.8
michel-aractingi a914382
(docs) updated main hilserl docs
michel-aractingi 4a8d2e7
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
aliberts
`.gitignore`:

```diff
@@ -29,6 +29,7 @@ outputs
 # VS Code
 .vscode
+.devcontainer

 # HPC
 nautilus/*.yaml
```
Large diffs are not rendered by default.
New file (@@ -0,0 +1,124 @@):

# Train RL in Simulation

This guide explains how to use the `gym_hil` simulation environments as an alternative to real robots when working with the LeRobot framework for Human-In-the-Loop (HIL) reinforcement learning.

`gym_hil` is a package that provides Gymnasium-compatible simulation environments specifically designed for Human-In-the-Loop reinforcement learning. These environments allow you to:

- Train policies in simulation to test the RL stack before training on real robots
- Collect demonstrations in sim using external devices like gamepads or keyboards
- Perform human interventions during policy learning

Currently, the main environment is a Franka Panda robot simulation based on MuJoCo, with tasks like picking up a cube.

## Installation

First, install the `gym_hil` package within the LeRobot environment:

```bash
pip install gym_hil

# Or from within LeRobot
cd lerobot
pip install -e ".[hilserl]"
```
## What do I need?

- A gamepad or keyboard to control the robot
- An NVIDIA GPU
## Configuration

To use `gym_hil` with LeRobot, you need to create a configuration file. An example is provided [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/gym_hil_env.json). Key configuration sections include:

### Environment Type and Task

```json
{
  "type": "hil",
  "name": "franka_sim",
  "task": "PandaPickCubeGamepad-v0",
  "device": "cuda"
}
```

Available tasks:

- `PandaPickCubeBase-v0`: Basic environment
- `PandaPickCubeGamepad-v0`: With gamepad control
- `PandaPickCubeKeyboard-v0`: With keyboard control
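Before launching anything, it can help to sanity-check such a config fragment. The snippet below is a hypothetical helper written for this guide (it is not part of LeRobot); the field and task names come from the example config above:

```python
import json

# Hypothetical validation helper; field names follow the example config above.
REQUIRED_FIELDS = {"type", "name", "task", "device"}
KNOWN_TASKS = {
    "PandaPickCubeBase-v0",
    "PandaPickCubeGamepad-v0",
    "PandaPickCubeKeyboard-v0",
}

def check_env_config(raw: str) -> dict:
    """Parse a JSON fragment and check the fields used in this guide."""
    cfg = json.loads(raw)
    missing = REQUIRED_FIELDS - cfg.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if cfg["task"] not in KNOWN_TASKS:
        raise ValueError(f"unknown task: {cfg['task']}")
    return cfg

cfg = check_env_config(
    '{"type": "hil", "name": "franka_sim", '
    '"task": "PandaPickCubeGamepad-v0", "device": "cuda"}'
)
print(cfg["task"])  # PandaPickCubeGamepad-v0
```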
### Gym Wrappers Configuration

```json
"wrapper": {
    "gripper_penalty": -0.02,
    "control_time_s": 15.0,
    "use_gripper": true,
    "fixed_reset_joint_positions": [0.0, 0.195, 0.0, -2.43, 0.0, 2.62, 0.785],
    "end_effector_step_sizes": {
        "x": 0.025,
        "y": 0.025,
        "z": 0.025
    },
    "control_mode": "gamepad"
}
```

Important parameters:

- `gripper_penalty`: Penalty for excessive gripper movement
- `use_gripper`: Whether to enable gripper control
- `end_effector_step_sizes`: Size of the steps along the x, y, and z axes of the end-effector
- `control_mode`: Set to `"gamepad"` to use a gamepad controller
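To illustrate how `end_effector_step_sizes` is typically used, here is a minimal, hypothetical sketch (not the actual gym_hil/LeRobot wrapper code) that scales normalized controller axes in [-1, 1] into per-step end-effector displacements:

```python
# Hypothetical illustration of how per-axis step sizes can translate
# normalized controller input into end-effector displacements.
# This is NOT the actual gym_hil/LeRobot implementation.

STEP_SIZES = {"x": 0.025, "y": 0.025, "z": 0.025}  # meters per step, from the config above

def axes_to_delta(axes: dict) -> dict:
    """Scale normalized axis values in [-1, 1] by the configured step sizes."""
    delta = {}
    for axis, size in STEP_SIZES.items():
        value = max(-1.0, min(1.0, axes.get(axis, 0.0)))  # clamp input to [-1, 1]
        delta[axis] = value * size
    return delta

# Pushing the stick fully forward on x moves the end-effector 2.5 cm per step.
print(axes_to_delta({"x": 1.0, "y": -0.5, "z": 0.0}))
```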
## Running with HIL RL of LeRobot

### Basic Usage

To simply run the environment, set the `mode` field in the config to `null`:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```
### Recording a Dataset

To collect a dataset, set `mode` to `record` and define the `repo_id` and the number of episodes to record:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```
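For recording, the relevant config fields might look like the fragment below. This is a hypothetical illustration based on the description above: `mode` and `repo_id` are named in this guide, while the episode-count field name is a guess, so check the linked example config for the exact keys:

```json
{
  "mode": "record",
  "repo_id": "your_username/pick_cube_demos",
  "num_episodes": 10
}
```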
### Training a Policy

To train a policy, check out the configuration example available [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/train_gym_hil_env.json), then run the actor server:

```bash
python lerobot/scripts/rl/actor.py --config_path path/to/train_gym_hil_env.json
```

In a different terminal, run the learner server:

```bash
python lerobot/scripts/rl/learner.py --config_path path/to/train_gym_hil_env.json
```
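The actor/learner split used by these two scripts can be illustrated with a tiny multiprocessing sketch. This shows only the generic communication pattern (the actor streams transitions to the learner, which updates parameters); it is not LeRobot's gRPC-based implementation, and the 0.1 "gradient step" is a stand-in:

```python
# Generic actor-learner pattern sketch, NOT the LeRobot servers.
import multiprocessing as mp

def actor(transitions) -> None:
    """Collect experience and stream it to the learner."""
    for step in range(3):
        transitions.put({"obs": step, "reward": 1.0})  # one transition per step
    transitions.put(None)  # sentinel: no more data

def learner(transitions, result) -> None:
    """Consume transitions and update a stand-in scalar 'parameter'."""
    weights = 0.0
    while (item := transitions.get()) is not None:
        weights += 0.1 * item["reward"]  # stand-in for a gradient step
    result.put(round(weights, 1))

if __name__ == "__main__":
    t_q, r_q = mp.Queue(), mp.Queue()
    procs = [
        mp.Process(target=actor, args=(t_q,)),
        mp.Process(target=learner, args=(t_q, r_q)),
    ]
    for p in procs:
        p.start()
    print(f"learner weights after all transitions: {r_q.get()}")
    for p in procs:
        p.join()
```

In the real system the learner also pushes updated policy parameters back to the actor, and the transport is gRPC between separate server processes rather than in-process queues.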
The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.

Congrats 🎉, you have finished this tutorial!

> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).
Paper citation:

```bibtex
@article{luo2024precise,
    title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
    author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
    journal={arXiv preprint arXiv:2410.21845},
    year={2024}
}
```
Review discussion:

- "I would move RL stuff out of the Tutorial section and just create a new section called `policies`."
- "wdyt @pkooij?"
- "We can refactor/reorganize the docs in an upcoming dedicated PR if not now."
- "Replied on Discord, to summarize: I think the RL documents are more like tutorials for now because of their length and because we only have one RL algorithm now. But maybe we can place the description of the algorithm under a section called `policies`."