fix(dataset): use revision-safe Hub cache for downloaded datasets#3233

Merged
imstevenpmwork merged 6 commits into main from pr/3229 on Mar 27, 2026
@imstevenpmwork
Collaborator

@imstevenpmwork imstevenpmwork commented Mar 27, 2026

Supersedes: #3229

Added:

  • Move has_legacy_hub_download_metadata to utils.
  • Add _requested_root to the different creation methods for consistency (and use the metadata field as the single source of truth).
  • Make the reader's _root public.
  • Test the _download path in LeRobotDataset, in addition to the existing metadata test.
  • Make the metadata class revision-safe as well, for completeness.
  • Add a comment on the new constant for future clarity.

CI Full run: https://github.com/huggingface/lerobot/actions/runs/23666039167

AdilZouitine and others added 4 commits March 26, 2026 20:54
…uce hub cache support

- Updated DatasetConfig and LeRobotDatasetMetadata to clarify root directory behavior and introduce a dedicated hub cache for downloads.
- Refactored LeRobotDataset and StreamingLeRobotDataset to utilize the new hub cache and improve directory management.
- Added tests to ensure correct behavior when using the hub cache and handling different revisions without a specified root directory.
- Updated LeRobotDataset to store the requested root path separately from the actual root path.
- Adjusted metadata loading to use the requested root, enhancing clarity and consistency in directory management.
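
To make the "requested root" versus "actual root" distinction in the commit notes above concrete, here is a minimal, illustrative sketch. It is not the code from this PR: the function name `resolve_dataset_root`, the constant `HUB_CACHE_DIR`, and the directory layout are all hypothetical, chosen only to show why keying the cache path on the revision keeps different revisions from overwriting each other when no root is given.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical default location for downloads when the caller gives no root.
HUB_CACHE_DIR = Path.home() / ".cache" / "example_hub_datasets"


def resolve_dataset_root(
    repo_id: str,
    revision: str,
    requested_root: Optional[Union[str, Path]] = None,
    hub_cache: Union[str, Path] = HUB_CACHE_DIR,
) -> Path:
    """Return the directory a dataset should live in (illustrative only).

    If the caller explicitly requested a root, honor it unchanged.
    Otherwise derive a path that includes the revision, so two revisions
    of the same repo never share a directory.
    """
    if requested_root is not None:
        return Path(requested_root)
    # "user/name" -> "user--name" so the repo id is a single path component.
    safe_repo = repo_id.replace("/", "--")
    return Path(hub_cache) / safe_repo / revision


# Two revisions without an explicit root resolve to distinct directories.
root_v1 = resolve_dataset_root("user/ds", "v1", hub_cache="/tmp/cache")
root_v2 = resolve_dataset_root("user/ds", "v2", hub_cache="/tmp/cache")
```

Keeping the requested root (possibly `None`) separate from the resolved root lets later code tell "the user chose this directory" apart from "this directory came from the cache", which is the separation the commit notes describe.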
@github-actions github-actions bot added the dataset, tests, and configuration labels Mar 27, 2026
@imstevenpmwork imstevenpmwork changed the title from "Pr/3229" to "fix(dataset): use revision-safe Hub cache for downloaded datasets" Mar 27, 2026
@imstevenpmwork imstevenpmwork self-assigned this Mar 27, 2026
@imstevenpmwork
Collaborator Author

CI failures related to #3231 are omitted.

@imstevenpmwork imstevenpmwork merged commit 4e45acc into main Mar 27, 2026
10 of 11 checks passed
@imstevenpmwork imstevenpmwork deleted the pr/3229 branch March 27, 2026 21:21