fix(dataset-tools): unable to merge local datasets in cache due to a …#2369

Closed
suessmann wants to merge 1 commit intohuggingface:mainfrom
suessmann:fix-dataset-merging
Conversation

@suessmann

What this does

With a local root, lerobot-edit-dataset with --operation.type=merge raised a FileNotFoundError. The datasets are loaded in a list comprehension, [LeRobotDataset(repo_id, root=Path(cfg.root)) for repo_id in cfg.operation.repo_ids], which passes the shared root to every dataset instead of the path to each individual dataset.

This PR fixes that by passing the per-dataset path instead: [...] root=Path(cfg.root) / repo_id) for repo_id in [...].
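The difference between the two root values can be sketched with pathlib and a throwaway directory. The on-disk layout (`<root>/<repo_id>/meta/info.json`) and the helper name `dataset_root` are assumptions made for illustration only; they are not taken from this PR's diff.

```python
import tempfile
from pathlib import Path


def dataset_root(root: str, repo_id: str) -> Path:
    """Hypothetical helper: the per-dataset directory under a shared local root."""
    return Path(root) / repo_id


with tempfile.TemporaryDirectory() as tmp:
    repo_ids = ["user/local_dataset_1", "user/local_dataset_2"]
    # Assumed layout: each dataset lives in <root>/<repo_id>/ with a marker file.
    for repo_id in repo_ids:
        marker = Path(tmp) / repo_id / "meta" / "info.json"
        marker.parent.mkdir(parents=True)
        marker.write_text("{}")

    # Before the fix: every dataset is pointed at the shared root itself.
    shared_root_has_marker = (Path(tmp) / "meta" / "info.json").exists()
    # After the fix: each dataset is pointed at its own subdirectory.
    per_dataset_has_marker = [
        (dataset_root(tmp, repo_id) / "meta" / "info.json").exists()
        for repo_id in repo_ids
    ]

print(shared_root_has_marker)  # False
print(per_dataset_has_marker)  # [True, True]
```

Under this assumed layout, a loader that looks for files directly under the shared root finds nothing, which is consistent with the FileNotFoundError described above.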

How it was tested

Ran the merge with a local root. It now completes, whereas before it raised the FileNotFoundError.

How to checkout & try? (for the reviewer)

Try merging with local root, such as

lerobot-edit-dataset \
    --repo_id ${HF_USER}/merged_dataset \
    --operation.type merge \
    --operation.repo_ids "['user/local_dataset_1', 'user/local_dataset_2']" \
    --root root/to/local/cache

@suessmann suessmann marked this pull request as draft November 3, 2025 13:22
@suessmann suessmann marked this pull request as ready for review November 3, 2025 13:24
@s1lent4gnt
Member

Hey @suessmann, thanks for the fix! One thing — Path(cfg.root) will raise TypeError when cfg.root is None. Small fix:

datasets = [
    LeRobotDataset(repo_id, root=Path(cfg.root) / repo_id if cfg.root else None)
    for repo_id in cfg.operation.repo_ids
]
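The suggested conditional can be exercised in isolation as follows. The helper name `resolve_root` is hypothetical and exists only to make the expression testable outside lerobot.

```python
from pathlib import Path
from typing import Optional


def resolve_root(root: Optional[str], repo_id: str) -> Optional[Path]:
    """Hypothetical helper mirroring the suggested conditional expression."""
    # Path(None) raises TypeError, so check root before building the path.
    # The division binds tighter than the conditional, so this reads as
    # (Path(root) / repo_id) if root else None.
    return Path(root) / repo_id if root else None


local = resolve_root("cache", "user/local_dataset_1")
default = resolve_root(None, "user/local_dataset_1")
```

Here `local` is the per-dataset path under `cache`, and `default` is `None`, which leaves the choice of location to the dataset loader. Note that the truthiness check also maps an empty string to `None`.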

@s1lent4gnt
Member

Superseded by #2369. Thanks @suessmann for the bug report and the fix!

@s1lent4gnt s1lent4gnt closed this Feb 26, 2026