Merged
Changes from 250 commits (312 commits total)
a90f487
Add maniskill support.
AdilZouitine Feb 14, 2025
c85f88f
Improve wandb logging and custom step tracking in logger
AdilZouitine Feb 17, 2025
62e237b
Re-enable parameter push thread in learner server
AdilZouitine Feb 17, 2025
0d88a5e
- Fixed big issue in the loading of the policy parameters sent by the…
michel-aractingi Feb 19, 2025
85242ca
Refactor SAC policy with performance optimizations and multi-camera s…
AdilZouitine Feb 20, 2025
e1d55c7
[Port HIL-SERL] Adjust Actor-Learner architecture & clean up dependen…
helper2424 Feb 21, 2025
d3b84ec
Added caching function in the learner_server and modeling sac in orde…
michel-aractingi Feb 21, 2025
4c73891
Update ManiSkill configuration and replay buffer to support truncatio…
AdilZouitine Feb 24, 2025
1d4ec50
Refactor ReplayBuffer with tensor-based storage and improved sampling…
AdilZouitine Feb 25, 2025
9ea79f8
Add storage device parameter to replay buffer initialization
AdilZouitine Feb 25, 2025
ae51c19
Add memory optimization option to ReplayBuffer
AdilZouitine Feb 25, 2025
bb69cb3
Add storage device configuration for SAC policy and replay buffer
AdilZouitine Mar 4, 2025
85fe8a3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 4, 2025
b6a2200
[HIL-SERL] Migrate threading to multiprocessing (#759)
helper2424 Mar 5, 2025
3dfb37e
[Port HIL-SERL] Balanced sampler function speed up and refactor to al…
s1lent4gnt Mar 12, 2025
e002c5e
Remove torch.no_grad decorator and optimize next action prediction in…
AdilZouitine Mar 10, 2025
2f04d0d
Add custom save and load methods for SAC policy
AdilZouitine Mar 12, 2025
5993265
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 12, 2025
66816fd
Enhance SAC configuration and policy with gradient clipping and tempe…
AdilZouitine Mar 17, 2025
7b01e16
Add end effector action space to hil-serl (#861)
michel-aractingi Mar 17, 2025
0959694
Refactor SACPolicy and learner server for improved replay buffer mana…
AdilZouitine Mar 18, 2025
fd74c19
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 18, 2025
03fe0f0
Update configuration files for improved performance and flexibility
AdilZouitine Mar 19, 2025
ffbed4a
Enhance training information logging in learner server
AdilZouitine Mar 19, 2025
0341a38
[PORT HIL-SERL] Optimize training loop, extract config usage (#855)
helper2424 Mar 19, 2025
787aee0
- Updated the logging condition to use `log_freq` directly instead of…
AdilZouitine Mar 19, 2025
36f9ccd
Add intervention rate tracking in act_with_policy function
AdilZouitine Mar 19, 2025
e4a5971
Remove unused functions and imports from modeling_sac.py
AdilZouitine Mar 19, 2025
50d8db4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 19, 2025
618ed00
Initialize log_alpha with the logarithm of temperature_init in SACPolicy
AdilZouitine Mar 20, 2025
42f95e8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 20, 2025
cdcf346
Update tensor device assignment in ReplayBuffer class
AdilZouitine Mar 21, 2025
1c8daf1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 24, 2025
2abbd60
Removed depleted files and scripts
michel-aractingi Mar 24, 2025
0ea2770
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 24, 2025
bb5a958
Handle multi optimizers
AdilZouitine Mar 24, 2025
80d566e
Handle new config with sac
AdilZouitine Mar 24, 2025
38e8864
Add task field to frame_dict in ReplayBuffer and simplify save_episod…
AdilZouitine Mar 24, 2025
26ee8b6
Add .devcontainer to .gitignore for improved development environment …
AdilZouitine Mar 25, 2025
114ec64
Change config logic in:
michel-aractingi Mar 25, 2025
056f79d
[WIP] Non functional yet
AdilZouitine Mar 26, 2025
0b5b62c
Add wandb run id in config
AdilZouitine Mar 27, 2025
db897a1
[WIP] Update SAC configuration and environment settings
AdilZouitine Mar 27, 2025
b69132c
Change HILSerlRobotEnvConfig to inherit from EnvConfig
michel-aractingi Mar 27, 2025
88cc2b8
Add WrapperConfig for environment wrappers and update SACConfig prope…
AdilZouitine Mar 27, 2025
05a237c
Added gripper control mechanism to gym_manipulator
michel-aractingi Mar 28, 2025
5a0ee06
Enhance logging for actor and learner servers
AdilZouitine Mar 28, 2025
8fb373a
fix
AdilZouitine Mar 28, 2025
c0ba4b4
Refactor SACConfig properties for improved readability
AdilZouitine Mar 28, 2025
3beab33
Refactor imports in modeling_sac.py for improved organization
AdilZouitine Mar 28, 2025
176557d
Refactor learner_server.py for improved structure and clarity
AdilZouitine Mar 28, 2025
eb71064
Refactor actor_server.py for improved structure and logging
AdilZouitine Mar 28, 2025
6e687e2
Refactor SACPolicy and learner_server for improved clarity and functi…
AdilZouitine Mar 28, 2025
4d5ecb0
Refactor SACPolicy for improved type annotations and readability
AdilZouitine Mar 28, 2025
8eb3c15
Added support for controlling the gripper with the pygame interface o…
michel-aractingi Mar 28, 2025
eb44a06
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 28, 2025
70d4189
Fix: Prevent Invalid next_state References When optimize_memory=True …
s1lent4gnt Mar 31, 2025
0185a0b
Fix cuda graph break
AdilZouitine Mar 31, 2025
5b49601
Fix convergence of sac, multiple torch compile on the same model caus…
AdilZouitine Mar 31, 2025
334cf81
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 31, 2025
6669396
Add grasp critic
s1lent4gnt Mar 31, 2025
4277204
Add complementary info in the replay buffer
s1lent4gnt Mar 31, 2025
ff18be1
Add gripper penalty wrapper
s1lent4gnt Mar 31, 2025
fdd04ef
Add get_gripper_action method to GamepadController
s1lent4gnt Mar 31, 2025
3a2308d
Add grasp critic to the training loop
s1lent4gnt Mar 31, 2025
88d26ae
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 31, 2025
0cce2fe
Added Gripper quantization wrapper and grasp penalty
michel-aractingi Apr 1, 2025
7361a11
Refactor SAC configuration and policy to support discrete actions
AdilZouitine Apr 1, 2025
f83d215
Refactor SAC policy and training loop to enhance discrete action support
AdilZouitine Apr 1, 2025
d86d29f
Add mock gripper support and enhance SAC policy action handling
AdilZouitine Apr 1, 2025
f9fb9d4
Refactor SACPolicy for improved readability and action dimension hand…
AdilZouitine Apr 1, 2025
6167886
Enhance SACPolicy and learner server for improved grasp critic integr…
AdilZouitine Apr 2, 2025
70130b9
Enhance SACPolicy to support shared encoder and optimize action selec…
AdilZouitine Apr 3, 2025
7c2c67f
Enhance SAC configuration and replay buffer with asynchronous prefetc…
AdilZouitine Apr 3, 2025
cf58890
fix indentation issue
AdilZouitine Apr 3, 2025
1efaf02
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 4, 2025
8bcf417
fix caching
AdilZouitine Apr 4, 2025
d5a87f6
Handle gripper penalty
AdilZouitine Apr 7, 2025
78c640b
Refactor complementary_info handling in ReplayBuffer
AdilZouitine Apr 7, 2025
203315d
fix sign issue
AdilZouitine Apr 7, 2025
a3ada81
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 7, 2025
68c271a
Add rounding for safety
AdilZouitine Apr 8, 2025
e18274b
fix caching and dataset stats is optional
AdilZouitine Apr 9, 2025
02e1ed0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 9, 2025
9fd4c21
General fixes in code, removed delta action, fixed grasp penalty, add…
michel-aractingi Apr 9, 2025
28b595c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 9, 2025
267a837
fix encoder training
AdilZouitine Apr 11, 2025
9386892
Refactor modeling_sac and parameter handling for clarity and reusabil…
AdilZouitine Apr 14, 2025
5c352ae
stick to hil serl nn architecture
AdilZouitine Apr 15, 2025
8122721
match target entropy hil serl
AdilZouitine Apr 15, 2025
9e5f254
change the tanh distribution to match hil serl
AdilZouitine Apr 15, 2025
2f7339b
Handle caching
AdilZouitine Apr 15, 2025
c5382a4
fix caching
AdilZouitine Apr 15, 2025
c37936f
Update log_std_min type to float in PolicyConfig for consistency
AdilZouitine Apr 15, 2025
3424644
Fix init temp
AdilZouitine Apr 16, 2025
fb075a7
Refactor input and output normalization handling in SACPolicy for imp…
AdilZouitine Apr 17, 2025
1ce3685
Refactor SACPolicy initialization by breaking down the constructor in…
AdilZouitine Apr 17, 2025
dcd850f
Refactor SACObservationEncoder to improve modularity and readability.…
AdilZouitine Apr 18, 2025
fb92935
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 18, 2025
54c3c6d
Enhance MLP class in modeling_sac.py with detailed docstring and refa…
AdilZouitine Apr 18, 2025
3b24ad3
Fixes for the reward classifier
michel-aractingi Apr 15, 2025
9886520
Added option to add current readings to the state of the policy
michel-aractingi Apr 15, 2025
c1ee25d
nits in configuration classifier and control_robot
michel-aractingi Apr 18, 2025
0d70f0b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 18, 2025
a7a51cf
Refactor SACPolicy and configuration to replace 'grasp_critic' termin…
AdilZouitine Apr 18, 2025
dc726cb
Refactor crop_dataset_roi
michel-aractingi Apr 22, 2025
0030ff3
[HIL-SERl PORT] Unit tests for Replay Buffer (#966)
helper2424 Apr 22, 2025
c5845ee
Fix linter issue
AdilZouitine Apr 22, 2025
6230840
Fix linter issue part 2
AdilZouitine Apr 22, 2025
4ce3362
Fixup linter (#1017)
helper2424 Apr 22, 2025
5231752
Fix test comparing uninitialized array segment
AdilZouitine Apr 22, 2025
b77cee7
Ignore spellcheck for ik variable
AdilZouitine Apr 22, 2025
ecc960b
fix install ci
AdilZouitine Apr 22, 2025
cf03ca9
allow to install prerelease for maniskill
AdilZouitine Apr 22, 2025
a001824
fix ci
AdilZouitine Apr 22, 2025
299effe
[HIL-SERL] Update CI to allow installation of prerelease versions for…
AdilZouitine Apr 24, 2025
671ac34
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine Apr 24, 2025
c58b504
[HIL-SERL]Remove overstrict pre-commit modifications (#1028)
AdilZouitine Apr 24, 2025
b8c2b0b
Clean the code and remove todo
AdilZouitine Apr 24, 2025
a8da4a3
Clean the code
AdilZouitine Apr 24, 2025
bd4db8d
[Port HIl-Serl] Refactor gym-manipulator (#1034)
michel-aractingi Apr 25, 2025
1d4f660
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine Apr 25, 2025
50e9a8e
cleaning
AdilZouitine Apr 25, 2025
ea89b29
checkout normalize.py to prev commit
michel-aractingi Apr 25, 2025
4257fe5
rename reward classifier
AdilZouitine Apr 25, 2025
fb7c288
Update torch.load calls in network_utils.py to include weights_only=F…
AdilZouitine Apr 29, 2025
6fa7df3
[PORT HIL-SERL] Add unit tests for SAC modeling (#999)
helper2424 May 5, 2025
5998203
[Port HIL-SERL] Final fixes for reward classifier (#1067)
michel-aractingi May 5, 2025
d7471a3
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine May 6, 2025
4445581
[HIL SERL] Env management and add gym-hil (#1077)
AdilZouitine May 7, 2025
3970895
Added missing lisences
michel-aractingi May 7, 2025
9a72918
style nit
michel-aractingi May 7, 2025
5c0cbb5
Cleaning configs
AdilZouitine May 7, 2025
175d21a
Format file
AdilZouitine May 7, 2025
410f435
Delete outdated example
AdilZouitine May 7, 2025
910805f
added names in `record_dataset` function of gym_manipulator
michel-aractingi May 7, 2025
0776f81
robot_type nit
michel-aractingi May 7, 2025
98e4394
Add grpcio as optional dependency
AdilZouitine May 7, 2025
010dabd
removed fixed port values in `find_joint_limits.py`
michel-aractingi May 7, 2025
8fcd32e
Fixes in record_dataset and import gym_hil
michel-aractingi May 9, 2025
5f88a6d
Added number of steps after success as parameter in config
michel-aractingi May 9, 2025
58b0e1a
Improved the takeover logic in the case of `leader_automatic` control…
michel-aractingi May 12, 2025
34c492d
Added comment on SE(3) in kinematics and nits in `lerobot/envs/utils.py`
michel-aractingi May 12, 2025
bd617c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 12, 2025
4a40c5a
Fixup proto header (#1104)
helper2424 May 13, 2025
6b395fe
[PORT HIL-SERL] Better unit tests coverage for SAC policy (#1074)
helper2424 May 14, 2025
7de403b
[HIL-SERL] Review feedback modifications (#1112)
AdilZouitine May 15, 2025
30d23c6
fix formating and typos
AdilZouitine May 15, 2025
92c3eb6
Remove numpy array support
AdilZouitine May 15, 2025
307e2bf
Add review feedback
AdilZouitine May 16, 2025
2471eda
Add review feedback
AdilZouitine May 16, 2025
4e7db92
Add HIL-SERL citation
AdilZouitine May 16, 2025
363c6af
Shallow copy
AdilZouitine May 16, 2025
2ce275f
- added back degrees mode back to motor bus for IK and FK to work pro…
michel-aractingi May 20, 2025
8dad588
Added gamepad teleoperator and so100follower end effector robots
michel-aractingi May 20, 2025
c3e16f1
precomit nits
michel-aractingi May 20, 2025
6f8e869
Modified kinematics code to be independant of drive mode
michel-aractingi May 23, 2025
5dbf015
fixed naming convention in gym_manipulator, adapted get observation t…
michel-aractingi May 23, 2025
68839e9
precomit nits
michel-aractingi May 23, 2025
d834d69
Adapted gym_manipulator to teh new convention in robot devices
michel-aractingi May 26, 2025
8bb7dd2
General fixes to abide by the new config in learner_server, actor_ser…
michel-aractingi May 27, 2025
50df6a0
Moved the step size from the teleop device to the robot; simplified t…
michel-aractingi May 28, 2025
9c2d9ca
[PORT HIL-SERL] Refactor folders structure | Rebased version (#1178)
helper2424 Jun 2, 2025
7f5e8d5
(fix): linter
AdilZouitine Jun 3, 2025
849f2f3
(fix): test
AdilZouitine Jun 3, 2025
d32b2bf
(fix):ReplayBuffer to pass task_name directly to add_frame method; up…
AdilZouitine Jun 4, 2025
b497d5f
Fixes in various path of gym_manipulator
michel-aractingi Jun 2, 2025
e1977b1
Added hilserl.mdx that contains documentation for training hilserl on…
michel-aractingi Jun 4, 2025
f343050
bump gym-hil version to 0.1.5
michel-aractingi Jun 5, 2025
efb6c36
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
AdilZouitine Jun 6, 2025
5b95e0c
(fix): dependencies
AdilZouitine Jun 6, 2025
93de0bb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2025
85e8e73
(fix): Linting
AdilZouitine Jun 6, 2025
567f379
(fix) linting
AdilZouitine Jun 6, 2025
df3151a
Add scipy as dependency
AdilZouitine Jun 6, 2025
0e4a1f8
(fix): scipy dependency
AdilZouitine Jun 6, 2025
4a02f90
- Removed EEActionSpace wrapper that is unused
michel-aractingi Jun 6, 2025
51b93d2
Remame tutorial and tip
AdilZouitine Jun 6, 2025
47ad98d
Seperated sim doc to seperate file
michel-aractingi Jun 6, 2025
830e0e8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2025
d141370
iterate on documentation
AdilZouitine Jun 6, 2025
39f852a
Added links to configuration example json files on the hub
michel-aractingi Jun 6, 2025
0ac70e1
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
michel-aractingi Jun 6, 2025
b9f666d
chore: move so100 end effector in so100 folder
AdilZouitine Jun 9, 2025
b9a7fa7
chore: fixing import in calibrate
AdilZouitine Jun 9, 2025
5656c74
chore: fix test
AdilZouitine Jun 9, 2025
0e38d6b
chore, fixes in gym_manipulator and configuration_so100_follower
michel-aractingi Jun 9, 2025
c793bd1
[PORT HIL-SERL] Cover transport with tests (#1231)
helper2424 Jun 9, 2025
0783dac
chore: update type hints and remove unused robot config code
AdilZouitine Jun 10, 2025
b44b39e
Apply suggestions from code review
AdilZouitine Jun 10, 2025
3eb66ed
(fix): bug in action space when use_gripper is not set
michel-aractingi Jun 10, 2025
f9c3a4a
docs: enhance hilserl_sim and hilserl documentation with installation…
AdilZouitine Jun 10, 2025
2b77ff1
chore: remove dead code
AdilZouitine Jun 10, 2025
6a67a70
chore: update copyright year to 2025 and adjust type hints in various…
AdilZouitine Jun 10, 2025
127137e
chore: change copyright year
AdilZouitine Jun 10, 2025
8ad7d9a
Apply suggestions from code review
AdilZouitine Jun 10, 2025
a0a9759
Added more typing info for kinematics.py
michel-aractingi Jun 10, 2025
e1863dc
Update lerobot/common/teleoperators/gamepad/configuration_gamepad.py
AdilZouitine Jun 10, 2025
3513331
(adressing reviewer) find_joint_limits.py
michel-aractingi Jun 10, 2025
0199a16
add what do I need section
AdilZouitine Jun 10, 2025
4a0c37d
(adressing reviewer) remove mode from tutorial
michel-aractingi Jun 10, 2025
135795a
(adressing reviewer) added link to configurations sac in doc
michel-aractingi Jun 10, 2025
9d734e4
chore: refactor type hints to use union types for optional fields in …
AdilZouitine Jun 10, 2025
c4c3650
docs: add optional requirement for a real robot with follower and lea…
AdilZouitine Jun 10, 2025
8a46786
[PORT HIL SERL] Speed up tests (#1253)
helper2424 Jun 10, 2025
93988b4
fix: revert intelrealsense dependencies
AdilZouitine Jun 10, 2025
d7f035c
(adressing reviewer) added degrees to so101_leader.py
michel-aractingi Jun 10, 2025
9b8ad57
chore: revert the deletion of SO101
AdilZouitine Jun 10, 2025
34f182e
chore: reset observation
AdilZouitine Jun 10, 2025
0bf977b
chore: correct semantics
AdilZouitine Jun 10, 2025
eabf401
chore: update test sac config
AdilZouitine Jun 10, 2025
1114fb4
(Adressing reviews): :
michel-aractingi Jun 10, 2025
dc60302
(addressing reviews) docstring nit in kinematics
michel-aractingi Jun 10, 2025
1f5e437
refactor: move reward classifier to sac module and update imports
AdilZouitine Jun 10, 2025
063114a
docs: enhance docstring for concatenate_batch_transitions function to…
AdilZouitine Jun 10, 2025
2292a43
(addressing reviews) modified default degrees mode in so101_leader.py
michel-aractingi Jun 10, 2025
1f26fcc
(addressing reviews) find_joint_limits refactor
michel-aractingi Jun 10, 2025
b340500
refactor: update configuration class references to TrainRLServerPipel…
AdilZouitine Jun 10, 2025
ed11848
(addressing reviews) remove hardcoded path
michel-aractingi Jun 10, 2025
66f7ef2
(addressing reviews) removed unused param
michel-aractingi Jun 10, 2025
1d082d5
docs: expand guidance on selecting regions of interest for visual RL …
AdilZouitine Jun 10, 2025
541f26a
docs: remove redundant installation instructions for gym_hil in hilse…
AdilZouitine Jun 10, 2025
3466e44
(addressing reviews) in teleop_gamepad.py
michel-aractingi Jun 10, 2025
dc6f4bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 10, 2025
77260dd
(addressing reviews) changed robot type names in kinematics
michel-aractingi Jun 10, 2025
ca468ff
docs: clarify pose_difference_se3 function documentation for SE(3) tr…
AdilZouitine Jun 11, 2025
05ab2e7
refactor: replace hardcoded strings with constants for image and stat…
AdilZouitine Jun 11, 2025
d298294
(addressing reviews) fixes in crop_dataset_roi.py
michel-aractingi Jun 11, 2025
098264d
(addressing reviews) nit in hilserl.mdx
michel-aractingi Jun 11, 2025
ee51edc
refactor: enhance async iterator in ReplayBuffer for improved error h…
AdilZouitine Jun 11, 2025
9e3d28e
(addressing reviews) fix in table of content hilserl.mdx
michel-aractingi Jun 11, 2025
5a5699d
(doc fixes) fixes in hilserl and hilser_sim section titles
michel-aractingi Jun 11, 2025
ad885f5
(doc) added insight to possible tasks wit hilserl
michel-aractingi Jun 11, 2025
12b96b5
(addressing reviews) added constant label for reward
michel-aractingi Jun 11, 2025
8ec04df
refactor: improve readability and structure in kinematics.py by renam…
AdilZouitine Jun 11, 2025
fd6da34
(addressing reviews) remove vendor id and product id from gamepad hid
michel-aractingi Jun 11, 2025
0d4b581
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 11, 2025
f96c1af
refactor: move random seed fixture to test_sac_policy.py and utilize …
AdilZouitine Jun 11, 2025
22a9c0b
refactor: remove random seed option from pytest configuration and set…
AdilZouitine Jun 11, 2025
322b9ad
(gym-hil) bump to version 0.1.7
michel-aractingi Jun 11, 2025
1bf063a
[HIL SERL] (refactor): replace setup_process_handlers with ProcessSig…
AdilZouitine Jun 11, 2025
8038d4d
(docs) corrected pip install line
michel-aractingi Jun 12, 2025
af42f60
Address comments for queues infra (#1266)
helper2424 Jun 12, 2025
75f5e9c
(fix test) change queue in test_queue from mp queue
michel-aractingi Jun 12, 2025
f1141b2
(docs) added details around hyperparameters and image sizes
michel-aractingi Jun 12, 2025
0977f31
(addressing reviews) nits in gym_manipulator and configs
michel-aractingi Jun 12, 2025
898820a
(docstrings) removed outdated comments in docstrings
michel-aractingi Jun 12, 2025
1285ba4
(bump) gym-hil version to 0.1.8
michel-aractingi Jun 12, 2025
a914382
(docs) updated main hilserl docs
michel-aractingi Jun 13, 2025
4a8d2e7
Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new
aliberts Jun 13, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -29,6 +29,7 @@ outputs

# VS Code
.vscode
.devcontainer

# HPC
nautilus/*.yaml
13 changes: 13 additions & 0 deletions README.md
@@ -418,6 +418,19 @@ Additionally, if you are using any of the particular policy architecture, pretrained
year={2024}
}
```


- [HIL-SERL](https://hil-serl.github.io/)
```bibtex
@Article{luo2024hilserl,
title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
author={Jianlan Luo and Charles Xu and Jeffrey Wu and Sergey Levine},
year={2024},
eprint={2410.21845},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
```
## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=huggingface/lerobot&type=Timeline)](https://star-history.com/#huggingface/lerobot&Timeline)
4 changes: 4 additions & 0 deletions docs/source/_toctree.yml
@@ -9,6 +9,10 @@
title: Getting Started with Real-World Robots
- local: cameras
title: Cameras
- local: hilserl
Collaborator:

I would move rl stuff out of the Tutorial section and just create a new section called policies

Collaborator (Author):

wdyt @pkooij?

Collaborator:

We can refactor/reorganize the docs in an upcoming dedicated PR if not now

Member:

Replied on discord, to summarize: I think the rl documents are more like tutorials for now because of their length and because we only have one RL algorithm now. But maybe we can place the description of the algorithm under a section called policies.
title: Train a Robot with RL
- local: hilserl_sim
title: Train RL in Simulation
title: "Tutorials"
- sections:
- local: so101
432 changes: 432 additions & 0 deletions docs/source/hilserl.mdx

Large diffs are not rendered by default.

124 changes: 124 additions & 0 deletions docs/source/hilserl_sim.mdx
@@ -0,0 +1,124 @@
# Train RL in Simulation

This guide explains how to use the `gym_hil` simulation environments as an alternative to real robots when working with the LeRobot framework for Human-In-the-Loop (HIL) reinforcement learning.

`gym_hil` is a package that provides Gymnasium-compatible simulation environments specifically designed for Human-In-the-Loop reinforcement learning. These environments allow you to:

- Train policies in simulation to test the RL stack before training on real robots
- Collect demonstrations in sim using external devices like gamepads or keyboards
- Perform human interventions during policy learning

Currently, the main environment is a Franka Panda robot simulation based on MuJoCo, with tasks like picking up a cube.


## Installation

First, install the `gym_hil` package within the LeRobot environment:

```bash
pip install gym_hil

# Or, install from the LeRobot repository with the hilserl extra:
cd lerobot
pip install -e .[hilserl]
```

## What do I need?

- A gamepad or keyboard to control the robot
- An NVIDIA GPU

## Configuration

To use `gym_hil` with LeRobot, you need to create a configuration file. An example is provided [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/gym_hil_env.json). Key configuration sections include:

### Environment Type and Task

```json
{
"type": "hil",
"name": "franka_sim",
"task": "PandaPickCubeGamepad-v0",
"device": "cuda"
}
```

Available tasks:
- `PandaPickCubeBase-v0`: Basic environment
- `PandaPickCubeGamepad-v0`: With gamepad control
- `PandaPickCubeKeyboard-v0`: With keyboard control

### Gym Wrappers Configuration

```json
"wrapper": {
"gripper_penalty": -0.02,
"control_time_s": 15.0,
"use_gripper": true,
"fixed_reset_joint_positions": [0.0, 0.195, 0.0, -2.43, 0.0, 2.62, 0.785],
"end_effector_step_sizes": {
"x": 0.025,
"y": 0.025,
"z": 0.025
},
"control_mode": "gamepad"
}
```

Important parameters:
- `gripper_penalty`: Penalty for excessive gripper movement
- `use_gripper`: Whether to enable gripper control
- `end_effector_step_sizes`: Step size along the x, y, and z axes of the end-effector
- `control_mode`: Set to `"gamepad"` to use a gamepad controller

## Running HIL RL with LeRobot

### Basic Usage

To run the environment without recording or replaying a dataset, set `mode` to `null` in the config and launch:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```

### Recording a Dataset

To collect a dataset, set `mode` to `record` and specify the `repo_id` and the number of episodes to record:

```bash
python lerobot/scripts/rl/gym_manipulator.py --config_path path/to/gym_hil_env.json
```
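
For reference, here is a minimal sketch (illustrative only, not taken from this PR's docs) of how those record-mode settings map onto the `HILEnvConfig` dataclass added in `lerobot/common/envs/configs.py` later in this diff; the `repo_id` value is a hypothetical placeholder.

```python
# Sketch only: the JSON config keys correspond to these HILEnvConfig fields.
from lerobot.common.envs.configs import HILEnvConfig

record_cfg = HILEnvConfig(
    task="PandaPickCubeGamepad-v0",
    mode="record",                                   # "record", "replay", or None
    repo_id="your_username/panda_pick_cube_demos",   # hypothetical dataset repo id
    num_episodes=20,                                 # number of episodes to record
    push_to_hub=True,                                # upload the recorded dataset to the Hub
)
```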

### Training a Policy

To train a policy, check out the example configuration available [here](https://huggingface.co/datasets/aractingi/lerobot-example-config-files/blob/main/train_gym_hil_env.json), then start the actor server:

```bash
python lerobot/scripts/rl/actor.py --config_path path/to/train_gym_hil_env.json
```

In a different terminal, run the learner server:

```bash
python lerobot/scripts/rl/learner.py --config_path path/to/train_gym_hil_env.json
```

The simulation environment provides a safe and repeatable way to develop and test your Human-In-the-Loop reinforcement learning components before deploying to real robots.

Congrats 🎉, you have finished this tutorial!

> [!TIP]
> If you have any questions or need help, please reach out on [Discord](https://discord.com/invite/s3KuuzsPFb).

Paper citation:

```bibtex
@article{luo2024precise,
title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
author={Luo, Jianlan and Xu, Charles and Wu, Jeffrey and Levine, Sergey},
journal={arXiv preprint arXiv:2410.21845},
year={2024}
}
```
115 changes: 115 additions & 0 deletions lerobot/common/envs/configs.py
@@ -14,10 +14,13 @@

import abc
from dataclasses import dataclass, field
from typing import Any, Optional

import draccus

from lerobot.common.constants import ACTION, OBS_ENV_STATE, OBS_IMAGE, OBS_IMAGES, OBS_STATE
from lerobot.common.robots import RobotConfig
from lerobot.common.teleoperators.config import TeleoperatorConfig
from lerobot.configs.types import FeatureType, PolicyFeature


@@ -155,3 +158,115 @@ def gym_kwargs(self) -> dict:
"visualization_height": self.visualization_height,
"max_episode_steps": self.episode_length,
}


@dataclass
class VideoRecordConfig:
"""Configuration for video recording in ManiSkill environments."""

enabled: bool = False
record_dir: str = "videos"
trajectory_name: str = "trajectory"


@dataclass
class EnvTransformConfig:
"""Configuration for environment wrappers."""

# ee_action_space_params: EEActionSpaceConfig = field(default_factory=EEActionSpaceConfig)
control_mode: str = "gamepad"
display_cameras: bool = False
add_joint_velocity_to_observation: bool = False
add_current_to_observation: bool = False
add_ee_pose_to_observation: bool = False
crop_params_dict: Optional[dict[str, tuple[int, int, int, int]]] = None
resize_size: Optional[tuple[int, int]] = None
control_time_s: float = 20.0
fixed_reset_joint_positions: Optional[Any] = None
reset_time_s: float = 5.0
use_gripper: bool = True
gripper_quantization_threshold: float | None = 0.8
gripper_penalty: float = 0.0
gripper_penalty_in_reward: bool = False
number_of_steps_after_success: int = 0


@EnvConfig.register_subclass(name="gym_manipulator")
@dataclass
class HILSerlRobotEnvConfig(EnvConfig):
"""Configuration for the HILSerlRobotEnv environment."""

robot: Optional[RobotConfig] = None
teleop: Optional[TeleoperatorConfig] = None
wrapper: Optional[EnvTransformConfig] = None
fps: int = 10
name: str = "real_robot"
mode: str = None # Either "record", "replay", None
repo_id: Optional[str] = None
dataset_root: Optional[str] = None
task: str = ""
num_episodes: int = 10 # only for record mode
episode: int = 0
device: str = "cuda"
push_to_hub: bool = True
pretrained_policy_name_or_path: Optional[str] = None
reward_classifier_pretrained_path: Optional[str] = None
# For the reward classifier, to record more positive examples after a success
number_of_steps_after_success: int = 0

def gym_kwargs(self) -> dict:
return {}


@EnvConfig.register_subclass("hil")
@dataclass
class HILEnvConfig(EnvConfig):
"""Configuration for the HIL environment."""

type: str = "hil"
name: str = "PandaPickCube"
task: str = "PandaPickCubeKeyboard-v0"
use_viewer: bool = True
gripper_penalty: float = 0.0
use_gamepad: bool = True
state_dim: int = 18
action_dim: int = 4
fps: int = 100
episode_length: int = 100
video_record: VideoRecordConfig = field(default_factory=VideoRecordConfig)
features: dict[str, PolicyFeature] = field(
default_factory=lambda: {
"action": PolicyFeature(type=FeatureType.ACTION, shape=(4,)),
"observation.image": PolicyFeature(type=FeatureType.VISUAL, shape=(3, 128, 128)),
"observation.state": PolicyFeature(type=FeatureType.STATE, shape=(18,)),
}
)
features_map: dict[str, str] = field(
default_factory=lambda: {
"action": ACTION,
"observation.image": OBS_IMAGE,
"observation.state": OBS_STATE,
}
)
################# args from hilserlrobotenv
reward_classifier_pretrained_path: Optional[str] = None
robot_config: Optional[RobotConfig] = None
teleop_config: Optional[TeleoperatorConfig] = None
wrapper: Optional[EnvTransformConfig] = None
mode: str = None # Either "record", "replay", None
repo_id: Optional[str] = None
dataset_root: Optional[str] = None
num_episodes: int = 10 # only for record mode
episode: int = 0
device: str = "cuda"
push_to_hub: bool = True
pretrained_policy_name_or_path: Optional[str] = None
############################

@property
def gym_kwargs(self) -> dict:
return {
"use_viewer": self.use_viewer,
"use_gamepad": self.use_gamepad,
"gripper_penalty": self.gripper_penalty,
}
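
A short usage sketch (an illustration under stated assumptions, not code from this diff) of how the two new dataclasses compose: wrapper options live in `EnvTransformConfig`, which nests inside `HILSerlRobotEnvConfig`; the camera key and task label below are hypothetical.

```python
from lerobot.common.envs.configs import EnvTransformConfig, HILSerlRobotEnvConfig

# Wrapper options applied around the real-robot gym environment.
wrapper = EnvTransformConfig(
    control_mode="gamepad",
    control_time_s=20.0,
    use_gripper=True,
    gripper_penalty=-0.02,
    resize_size=(128, 128),
    crop_params_dict={"observation.images.front": (0, 0, 128, 128)},  # hypothetical camera key
)

# Top-level environment config consumed by the gym_manipulator script.
env_cfg = HILSerlRobotEnvConfig(
    wrapper=wrapper,
    fps=10,
    task="PickCube",   # hypothetical task label
    mode=None,         # "record", "replay", or None
)
```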
4 changes: 3 additions & 1 deletion lerobot/common/envs/factory.py
@@ -17,7 +17,7 @@

import gymnasium as gym

from lerobot.common.envs.configs import AlohaEnv, EnvConfig, PushtEnv, XarmEnv
from lerobot.common.envs.configs import AlohaEnv, EnvConfig, HILEnvConfig, PushtEnv, XarmEnv


def make_env_config(env_type: str, **kwargs) -> EnvConfig:
@@ -27,6 +27,8 @@ def make_env_config(env_type: str, **kwargs) -> EnvConfig:
return PushtEnv(**kwargs)
elif env_type == "xarm":
return XarmEnv(**kwargs)
elif env_type == "hil":
return HILEnvConfig(**kwargs)
else:
raise ValueError(f"Policy type '{env_type}' is not available.")

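
The factory change above registers the new `hil` environment type. A minimal usage sketch (an assumption for illustration, not part of the diff): build the config through `make_env_config` and read the `gym_kwargs` property that `HILEnvConfig` defines above.

```python
# Illustrative sketch; relies only on the defaults shown in this diff.
from lerobot.common.envs.factory import make_env_config

cfg = make_env_config("hil", task="PandaPickCubeGamepad-v0", episode_length=100)

# gym_kwargs exposes the viewer/gamepad/gripper settings the simulator expects.
print(cfg.gym_kwargs)
# -> {"use_viewer": True, "use_gamepad": True, "gripper_penalty": 0.0}
```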
19 changes: 14 additions & 5 deletions lerobot/common/envs/utils.py
@@ -47,6 +47,10 @@ def preprocess_observation(observations: dict[str, np.ndarray]) -> dict[str, Tensor]:
# TODO(aliberts, rcadene): use transforms.ToTensor()?
img = torch.from_numpy(img)

# When preprocessing observations in a non-vectorized environment, we need to add a batch dimension.
# This is the case for human-in-the-loop RL where there is only one environment.
if img.ndim == 3:
img = img.unsqueeze(0)
# sanity check that images are channel last
_, h, w, c = img.shape
assert c < h and c < w, f"expect channel last images, but instead got {img.shape=}"
@@ -62,13 +66,18 @@
return_observations[imgkey] = img

if "environment_state" in observations:
return_observations["observation.environment_state"] = torch.from_numpy(
observations["environment_state"]
).float()
env_state = torch.from_numpy(observations["environment_state"]).float()
if env_state.dim() == 1:
env_state = env_state.unsqueeze(0)

return_observations["observation.environment_state"] = env_state

# TODO(rcadene): enable pixels only baseline with `obs_type="pixels"` in environment by removing
# requirement for "agent_pos"
return_observations["observation.state"] = torch.from_numpy(observations["agent_pos"]).float()
agent_pos = torch.from_numpy(observations["agent_pos"]).float()
if agent_pos.dim() == 1:
agent_pos = agent_pos.unsqueeze(0)
return_observations["observation.state"] = agent_pos

return return_observations


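
The change above adds a leading batch dimension when a single, non-vectorized environment (the human-in-the-loop case) returns unbatched observations. A minimal sketch of the effect, assuming the usual `pixels` and `agent_pos` observation keys; the shapes in the comments are illustrative.

```python
import numpy as np

from lerobot.common.envs.utils import preprocess_observation

# Single-environment observation: channel-last image without a batch dim
# and a 1-D proprioceptive state vector.
obs = {
    "pixels": np.zeros((128, 128, 3), dtype=np.uint8),
    "agent_pos": np.zeros(18, dtype=np.float32),
}

out = preprocess_observation(obs)
# Both tensors now carry a leading batch dimension, e.g.
#   out["observation.image"].shape -> (1, 3, 128, 128)
#   out["observation.state"].shape -> (1, 18)
```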