Fix: lerobot-dataset-edit merge with custom root paths locally #2739
Open
riochuong wants to merge 3 commits into huggingface:main
Conversation
sotanakamura
reviewed
Jan 8, 2026
# by appending the repo_id to the root. When root is None, LeRobotDataset
# automatically uses HF_LEROBOT_HOME / repo_id.
datasets = [
    LeRobotDataset(repo_id, root=Path(cfg.root) / repo_id if cfg.root else None)
Contributor
As defined in LeRobotDataset, datasets will be stored under root/repo_id, so we need to standardize the dataset location in this way.
related to #2316
Author
Does that mean that, for merge to work correctly on a custom local folder, the only way is to use `root=None` and move the data to the default `HF_LEROBOT_HOME` (not too bad, but I need to remember to set it in my environment)?
Title
fix(scripts): handle custom root paths in dataset merge operation
CREDIT:
The fix and tests were written with assistance from Claude 4.5 Sonnet. I verified the changes and ran the tests locally to make sure they work as expected.
Type / Scope
scripts/lerobot_edit_dataset, datasets/dataset_tools
Summary / Motivation
The `lerobot-dataset-edit --operation.type merge` command failed when users specified custom `--root` paths for merging local datasets. The `handle_merge()` function was passing the root directory directly to `LeRobotDataset()` without appending the individual dataset's `repo_id`, causing the loader to look for metadata files in the wrong location (e.g., `/path/to/datasets/meta/info.json` instead of `/path/to/datasets/dataset1/meta/info.json`).
This fix enables users to merge locally stored datasets that are organized in custom directories, which is a common workflow when working with self-collected robotic datasets before uploading to the Hugging Face Hub.
Related issues
What changed
Code changes:
`src/lerobot/scripts/lerobot_edit_dataset.py` (lines 248-250): Updated `handle_merge()` to construct full dataset paths by appending `repo_id` to the custom `root` when provided:
Before (buggy):
datasets = [LeRobotDataset(repo_id, root=cfg.root) for repo_id in cfg.operation.repo_ids]
After (fixed):
datasets = [
    LeRobotDataset(repo_id, root=Path(cfg.root) / repo_id if cfg.root else None)
    for repo_id in cfg.operation.repo_ids
]
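The difference between the two versions can be reproduced without lerobot installed. The following is a minimal sketch, not the project's code: `resolve_dataset_root` is a hypothetical helper that mirrors the expression in the fixed list comprehension, `effective_root` emulates the loader behaviour as described in this PR's design note, and `DEFAULT_HOME` is only a stand-in value for `HF_LEROBOT_HOME`.

```python
from pathlib import Path
from typing import Optional

# Stand-in for HF_LEROBOT_HOME (assumption); the real value comes from lerobot.
DEFAULT_HOME = Path("/home/user/.cache/huggingface/lerobot")


def resolve_dataset_root(root: Optional[str], repo_id: str) -> Optional[Path]:
    """Hypothetical helper mirroring the fixed expression in handle_merge()."""
    return Path(root) / repo_id if root else None


def effective_root(root: Optional[Path], repo_id: str) -> Path:
    """Emulates the loader as described in the PR: a given root is used as-is,
    otherwise the dataset is expected under DEFAULT_HOME / repo_id."""
    return Path(root) if root else DEFAULT_HOME / repo_id


def info_path(root: Optional[Path], repo_id: str) -> Path:
    """Location where the loader would look for the dataset metadata."""
    return effective_root(root, repo_id) / "meta" / "info.json"


# Before the fix: the shared root is passed straight through.
buggy = info_path(Path("/path/to/datasets"), "dataset1")
# After the fix: repo_id is appended to the custom root first.
fixed = info_path(resolve_dataset_root("/path/to/datasets", "dataset1"), "dataset1")
# No custom root: None is passed and the loader falls back to its default home.
default = info_path(resolve_dataset_root(None, "dataset1"), "dataset1")

print(buggy.as_posix())
print(fixed.as_posix())
print(default.as_posix())
```

The first printed path lacks the `dataset1` component, which is the wrong location described in the summary; the second includes it.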
Test additions:
`tests/datasets/test_dataset_tools.py`: Added 3 comprehensive test functions:
- `test_handle_merge_with_custom_root()` - Validates the bug fix with custom root paths
- `test_handle_merge_without_custom_root()` - Ensures default behavior still works
- `test_handle_merge_custom_root_preserves_metadata()` - Verifies metadata preservation during merge
Breaking changes: None. This is a pure bug fix that maintains backward compatibility.
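The kind of check behind the metadata-preservation test can be sketched on plain dictionaries. This is an illustration only and not the test added in this PR: `check_merged_info` is a hypothetical function, and the keys `fps`, `features`, `total_episodes` and `total_frames` are assumed to correspond to entries in each dataset's `meta/info.json`.

```python
from typing import Any


def check_merged_info(sources: list[dict[str, Any]], merged: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the merge looks consistent."""
    problems = []
    # Values that every source must share and the merged dataset must keep.
    for key in ("fps", "features"):
        values = [src[key] for src in sources]
        if any(v != values[0] for v in values):
            problems.append(f"sources disagree on {key}")
        elif merged[key] != values[0]:
            problems.append(f"{key} changed during merge")
    # Counters that should add up across the sources.
    for key in ("total_episodes", "total_frames"):
        expected = sum(src[key] for src in sources)
        if merged[key] != expected:
            problems.append(f"{key}: expected {expected}, got {merged[key]}")
    return problems


# Toy metadata, made up for the example.
features = {"action": {"dtype": "float32", "shape": [6]}}
ds1 = {"fps": 30, "features": features, "total_episodes": 2, "total_frames": 100}
ds2 = {"fps": 30, "features": features, "total_episodes": 3, "total_frames": 150}
merged_ok = {"fps": 30, "features": features, "total_episodes": 5, "total_frames": 250}
merged_bad = {"fps": 30, "features": features, "total_episodes": 4, "total_frames": 250}

print(check_merged_info([ds1, ds2], merged_ok))
print(check_merged_info([ds1, ds2], merged_bad))
```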
How was this tested
Tests added:
test_handle_merge_with_custom_root- Creates two datasets in a custom root directory, merges them, and verifies the merged dataset is created in the correct location with proper episode counts.test_handle_merge_without_custom_root- Tests that the default behavior (no custom root) continues to work correctly.test_handle_merge_custom_root_preserves_metadata- Ensures that dataset metadata (FPS, features, episode counts, frame counts) are correctly preserved after merging with custom roots.Manual testing:
lerobot-dataset-edit \
    --repo_id merged_dataset \
    --root /path/to/datasets \
    --operation.type merge \
    --operation.repo_ids "['dataset1', 'dataset2']"
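Before running the merge against a custom root, it can help to confirm that each source dataset sits where the fixed code will look for it. The sketch below is a standalone pre-flight check under that assumption; `missing_datasets` is a hypothetical helper and not part of lerobot, and the layout `<root>/<repo_id>/meta/info.json` is taken from the example paths in the summary.

```python
import json
import tempfile
from pathlib import Path


def missing_datasets(root: Path, repo_ids: list[str]) -> list[str]:
    """Return the repo_ids whose <root>/<repo_id>/meta/info.json is absent."""
    return [
        repo_id
        for repo_id in repo_ids
        if not (root / repo_id / "meta" / "info.json").is_file()
    ]


# Demonstration on a throwaway directory that contains only dataset1.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    meta_dir = root / "dataset1" / "meta"
    meta_dir.mkdir(parents=True)
    (meta_dir / "info.json").write_text(json.dumps({"fps": 30}))
    result = missing_datasets(root, ["dataset1", "dataset2"])

print(result)
```

An empty result for the real `--root` and `--operation.repo_ids` values suggests the directory layout matches what the merge expects.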
Test results:
$ pytest tests/datasets/test_dataset_tools.py -k merge -v
All 7 merge tests pass (3 new + 4 existing)
## How to run locally (reviewer)
Run all merge-related tests:
pytest tests/datasets/test_dataset_tools.py -k merge -v
Run only the new tests:
pytest tests/datasets/test_dataset_tools.py::test_handle_merge_with_custom_root -v
pytest tests/datasets/test_dataset_tools.py::test_handle_merge_without_custom_root -v
pytest tests/datasets/test_dataset_tools.py::test_handle_merge_custom_root_preserves_metadata -v
Manual test with local datasets:
Create two test datasets in a custom directory
lerobot-record --some-config # or use existing datasets
Try merging with custom root (this would have failed before the fix)
lerobot-dataset-edit \
    --repo_id test_merged \
    --root /path/to/your/datasets \
    --operation.type merge \
    --operation.repo_ids "['dataset1', 'dataset2']"
## Checklist (required before merge)
- `pre-commit run -a`
- `pytest`
Reviewer notes
Focus areas:
- `lerobot_edit_dataset.py`: Verify the path construction logic correctly handles both `None` and custom root cases
Design note:
The fix follows the pattern already established by `LeRobotDataset.__init__()`, which accepts either:
- `root=None` → uses `HF_LEROBOT_HOME / repo_id`
- `root=Path(...)` → uses the provided path as-is
By constructing `Path(cfg.root) / repo_id` before passing to `LeRobotDataset()`, we ensure the dataset loader receives the complete path to each individual dataset directory.
Edge cases covered: