fix: update root/repo_id path in lerobot_dataset.py#1057
fix: update root/repo_id path in lerobot_dataset.py#1057tc-huang wants to merge 32 commits intohuggingface:mainfrom
root/repo_id path in lerobot_dataset.py#1057Conversation
The `--local-files-only` argument and its related instructions were removed from the following documentation files to reflect changes introduced in commit 3354d91 "LeRobotDataset v2.1 (huggingface#711)", which removed the `--local-files-only` argument from `lerobot/scripts/visualize_dataset.py` and `lerobot/scripts/visualize_dataset_html.py`. Updated files: 1. README.md 2. examples/7_get_started_with_real_robot.md 3. examples/10_use_so100.md 4. examples/11_use_lekiwi.md 5. examples/11_use_moss.md 6. examples/12_use_so101.md
Modified the `LeRobotDataset.__init__` method to append the `repo_id` to the specified `root` path. Before change: - If `root` was None => `HF_LEROBOT_HOME / repo_id` - If `root` was provided => `Path(root)` After change: - If `root` was None => `HF_LEROBOT_HOME / repo_id` - If `root` was provided => `Path(root) / repo_id` This change ensures that the local dataset stored in a subdirectory named after their repo_id, aligning with the LeRobotDataset docstring: "Locally, the dataset will be stored under root/repo_id."
**Dataset**: Modified `LeRobotDatasetMetadata.create` to append `repo_id` to `root` path. - Before: `root = HF_LEROBOT_HOME / repo_id` (if None) or `root = Path(root)` (if provided). - Now: `root = HF_LEROBOT_HOME / repo_id` (if None) or `root = Path(root) / repo_id` (if provided). **Tests**: Updated `test_record_and_replay_and_policy` in `tests/robots/test_control_robot.py` to remove `repo_id` from `root` path. - Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`). - Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`).
Updated test cases in `tests/robots/test_control_robot.py` to remove `repo_id` or `eval_repo_id` from the `root` path used for test data storage.
**Changes**:
1. `test_record_without_cameras`:
- Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
- Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
2. `test_record_and_replay_and_policy`:
- Before: `eval_root = tmp_path / "data" / eval_repo_id` (i.e., `tmp_path/data/lerobot/eval_debug`)
- Now: `eval_root = tmp_path / "data"` (i.e., `tmp_path/data`)
3. `test_resume_record`:
- Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
- Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
4. `test_record_with_event_rerecord_episode`:
- Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
- Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
5. `test_record_with_event_exit_early`:
- Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
- Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
6. `test_record_with_event_stop_recording`:
- Before: `root = tmp_path / "data" / repo_id` (i.e., `tmp_path/data/lerobot_test/debug`)
- Now: `root = tmp_path / "data"` (i.e., `tmp_path/data`)
--local-files-only from docs and append repo_id to root in LeRobotDataset.__init__--local-files-only from docs and update root/repo_id path in lerobot_dataset.py
- Updated `LeRobotDatasetMetadata.__init__` to append `repo_id` to `root`:
- Old: `self.root = HF_LEROBOT_HOME / repo_id` (if `root` is None) or
`self.root = Path(root)` (if `root` is provided).
- New: `root = HF_LEROBOT_HOME / repo_id` (if `root` is None) or
`root = Path(root) / repo_id` (if `root` is provided).
- Fixed `LeRobotDatasetMetadata` init call in `LeRobotDataset.__init__`:
- Old: `self.meta = LeRobotDatasetMetadata(...self.root,...)`.
- New: `self.meta = LeRobotDatasetMetadata(...root,...)`.
- Prevents duplicate `repo_id` appending in dataset path.
for more information, see https://pre-commit.ci
e81d2dc to
0e54b2c
Compare
--local-files-only from docs and update root/repo_id path in lerobot_dataset.py--local-files-only from the README and update root/repo_id path in lerobot_dataset.py
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull Request Overview
This PR addresses the deprecation of the --local-files-only argument and improves local dataset path handling for LeRobot datasets. The changes ensure consistency between documentation and code functionality while fixing path resolution issues.
- Removes deprecated
--local-files-onlydocumentation from README - Updates path handling to properly resolve local datasets at
root/repo_idstructure - Fixes parameter passing between LeRobotDataset and LeRobotDatasetMetadata classes
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| README.md | Removes deprecated --local-files-only flag from visualization example |
| src/lerobot/datasets/lerobot_dataset.py | Updates path construction and metadata initialization for proper local dataset handling |
| self.repo_id = repo_id | ||
| self.revision = revision if revision else CODEBASE_VERSION | ||
| self.root = Path(root) if root is not None else HF_LEROBOT_HOME / repo_id | ||
| self.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id |
There was a problem hiding this comment.
When root is a string, Path(root) / repo_id will fail if repo_id contains a forward slash (e.g., 'lerobot/pusht'). The path concatenation should handle this case properly by using Path(root) / Path(repo_id).name or similar approach to avoid creating invalid nested directory structures.
| self.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id | |
| self.root = Path(root) / Path(repo_id).name if root is not None else HF_LEROBOT_HOME / Path(repo_id).name |
| obj = cls.__new__(cls) | ||
| obj.repo_id = repo_id | ||
| obj.root = Path(root) if root is not None else HF_LEROBOT_HOME / repo_id | ||
| obj.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id |
There was a problem hiding this comment.
Same issue as line 89: when repo_id contains a forward slash (e.g., 'lerobot/pusht'), Path(root) / repo_id will create nested directories instead of treating the repo_id as a single directory name. This inconsistency could cause datasets to be stored/looked up in incorrect locations.
| obj.root = Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_id | |
| sanitized_repo_id = repo_id.replace("/", "_") | |
| obj.root = Path(root) / sanitized_repo_id if root is not None else HF_LEROBOT_HOME / sanitized_repo_id |
| self.repo_id = repo_id | ||
| self.root = Path(root) if root else HF_LEROBOT_HOME / repo_id | ||
| self.root = Path(root) / repo_id if root else HF_LEROBOT_HOME / repo_id |
There was a problem hiding this comment.
Same path concatenation issue: Path(root) / repo_id will create nested directories when repo_id contains forward slashes. This creates inconsistency with how paths are expected to be structured based on the PR description which mentions datasets should be at ./my_local_data_dir/lerobot/pusht.
|
Hi @aliberts, I saw you recently worked on a PR to update the README. The outdated argument I thought you might be interested in taking a look. Thanks! |
|
Hi @tc-huang, My apologies for the delay on this PR. I think the If you still want to contribute, feel free to update your PR accordingly and to ping me - I'll try to merge it asap ! Best, Caroline. |
|
Hi @CarolinePascal, thank you for reviewing this PR and for your helpful comment! I took some time to re-check the issues mentioned here against the latest version of the code. It seems that the From my understanding, based on the explanation in this discussion and assuming this convention still applies in v3, I was able to download and load a dataset with from lerobot.datasets.lerobot_dataset import LeRobotDataset
dataset = LeRobotDataset(
repo_id="lerobot/pusht",
root="/workspace/dataset_dir_path/lerobot/pusht"
)The data is organized as shown below: However, I noticed that above convention of using python src/lerobot/scripts/lerobot_edit_dataset.py \
--repo_id lerobot/pusht \
--root /workspace/dataset_dir_path/lerobot/pusht \
--operation.type delete_episodes \
--operation.episode_indices "[0, 2, 5]"This results in an error similar to issue #2316 (changing to In These sections seem to handle - Path(root) if root is not None else HF_LEROBOT_HOME / repo_id
+ Path(root) / repo_id if root is not None else HF_LEROBOT_HOME / repo_idGiven these observations, I am currently unsure which convention to follow:
I would be glad to continue working on this and update the PR accordingly if the issue described above makes sense, or focus on updating the docstrings as you suggested in your previous comment. If it is not aligned with the intended design, I will close this PR. Best, tc-huang |
--local-files-only from the README and update root/repo_id path in lerobot_dataset.pyroot/repo_id path in lerobot_dataset.py
|
Hi @tc-huang, After digging a bit more in the codebase, it seems we have to comply with the decisions that have been made for Thanks again for your in-depth investigation on the topic ! Best, Caroline. |
What This PR Does
Update
rootpath for dataset and metadata store locallyUpdated the
__init__method of theLeRobotDatasetandLeRobotDatasetMetadataclass to support local datasets stored underroot/repo_id:__init__method did not correctly handle local dataset paths with theroot/repo_idstructure, causinglerobot/scripts/visualize_dataset.pyandlerobot/scripts/visualize_dataset_html.pyto fail to load datasets located atroot/repo_id.Update
LeRobotDatasetMetadatainit call inLeRobotDatasetclass, changingself.roottorootto avoid duplicaterepo_idappending.Updated the
createmethod of theLeRobotDatasetMetadataclass to support local metadata stored underroot/repo_id:Updated Files for Change 2:
lerobot/common/datasets/lerobot_dataset.pyHow It Was Tested
Preparing the
pushtDataset Locallycd lerobotrepository directory.pushtdataset using the same instructions inREADME.mdto verify that visualization from the Hugging Face Hub can work:python -m lerobot.scripts.visualize_dataset \ --repo-id lerobot/pusht \ --episode-index 0~/.cache/huggingface/lerobot/lerobot/pushtby default.my_local_data_dir) for testing:mkdir my_local_data_dir mv ~/.cache/huggingface/lerobot/lerobot my_local_data_dirTest 1: Visualize Local Dataset with
visualize_dataset.pymy_local_data_dirusing:python -m lerobot.scripts.visualize_dataset \ --repo-id lerobot/pusht \ --root my_local_data_dir \ --episode-index 0Test 2: Visualize Local Dataset with
visualize_dataset_html.pymy_local_data_dirusing:python -m lerobot.scripts.visualize_dataset_html \ --repo-id lerobot/pusht \ --root my_local_data_dir \ --episodes 0Test 3: Train with Local Dataset
pushtdependencies:python -m lerobot.scripts.train \ --dataset.repo_id=lerobot/pusht \ --dataset.root=my_local_data_dir \ --policy.type=act \ --policy.device=cuda \ --policy.push_to_hub=false \ --env.type=pusht \ --output_dir=outputs/train/act_pusht_test \ --job_name=act_pusht_test \ --wandb.enable=false \ --steps=1Test 4: Create
LeRobotDatasetMetadataandLeRobotDatasetinstances using simple Python codeTest 5: Functional Testing with Physical Robot and Local Dataset
my_local_data_dir/lerobot_user/koch_test(e.g., using thekocharm).--robot.portand--teleop.idarguments should be replaced with your actual USB port values.python -m lerobot.record \ --robot.type=koch_follower \ --robot.port=/dev/ttyUSB0 \ --robot.id=my_awesome_follower_arm \ --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \ --teleop.type=koch_leader \ --teleop.port=/dev/ttyACM0 \ --teleop.id=my_awesome_leader_arm \ --display_data=true \ --dataset.repo_id=lerobot_user/koch_test \ --dataset.root=my_local_data_dir \ --dataset.num_episodes=1 \ --dataset.episode_time_s=10 \ --dataset.reset_time_s=10 \ --dataset.single_task="Recording test" \ --dataset.push_to_hub=Falsemy_local_data_dir/lerobot_user/koch_test(e.g., using thekocharm).--robot.portargument should be replaced with your actual USB port value.python -m lerobot.replay \ --robot.type=koch_follower \ --robot.port=/dev/ttyUSB0 \ --dataset.repo_id=lerobot_user/koch_test \ --dataset.root=my_local_data_dir \ --dataset.episode=0my_local_data_dir/lerobot_user/koch_testusing the Rerun UI.python -m lerobot.scripts.visualize_dataset \ --repo-id lerobot_user/koch_test \ --root my_local_data_dir \ --episode-index 0my_local_data_dir/lerobot_user/koch_testusing the HTML UI.python -m lerobot.scripts.visualize_dataset_html \ --repo-id lerobot_user/koch_test \ --root my_local_data_dir \ --episodes 0my_local_data_dir/lerobot_user/koch_test.python -m lerobot.scripts.train \ --dataset.repo_id=lerobot_user/koch_test \ --dataset.root=my_local_data_dir \ --policy.type=act \ --policy.device=cuda \ --policy.push_to_hub=false \ --output_dir=outputs/train/act_koch_test \ --job_name=act_koch_test \ --wandb.enable=false \ --steps=1How to Review & Test
README.md.--robot.type=koch_followerandteleop.type=koch_leaderif you use a robot arm other thankoch.)Related Issues & PRs
This PR addresses issues #963 and #1541, and is related to PR #1542 and #1543.