Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HfFileSystem.exists() for deleted repos and update documentation #2643

Merged
merged 4 commits into from
Nov 4, 2024

Conversation

hanouticelina
Copy link
Contributor

Fixes #2552.

This PR fixes a bug where HfFileSystem.exists() returns True even after the repository has been deleted. The issue arises because the cache is not properly invalidated when the repository is deleted, causing HfFileSystem to rely on stale cache data. The fix updates the HfFileSystem.invalidate_cache() method to clear the repository 'existence' cache when the provided path matches the repository root.

This PR also includes an update to HfFileSystem documentation.

Main changes

  • Update HfFileSystem.invalidate_cache() to clear repository cache.
  • Update HfFileSystem documentation to suggest using HFApi methods when possible.
  • Add docstrings to HfFileSystem's methods.
  • Make HfApi class docstring visible in package reference by moving it from __init__ to class level.

@hanouticelina hanouticelina requested a review from Wauplin October 29, 2024 21:14
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hanouticelina!

Left a few nits. Could you also add a test to check the initial snippet now works as expected?

from huggingface_hub import HfApi, HfFileSystem
import time

api = HfApi()
hfs = HfFileSystem()
hub_model_id = "MoritzLaurer/test-repo"

api.create_repo(hub_model_id, repo_type='model')
time.sleep(5)
api.delete_repo(repo_id=hub_model_id, repo_type="model")
time.sleep(5)
assert not hfs.exists(hub_model_id, refresh=True)

src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_file_system.py Outdated Show resolved Hide resolved
Comment on lines -887 to -889
# Add docstrings to the methods of HfFileSystem from fsspec.AbstractFileSystem
for name, function in inspect.getmembers(HfFileSystem, predicate=inspect.isfunction):
parent = getattr(fsspec.AbstractFileSystem, name, None)
Copy link
Contributor

@Wauplin Wauplin Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think a similar logic makes sense for all "hidden" methods not defined in huggingface_hub but in fsspec instead: read_text, read-bytes, write_text, write_bytes, tail, touch, etc. The initial goal was to make them appear in the documentation (see #2100) even though the implementation did not do that.

This can be done in a later PR (not urgent)

@hanouticelina
Copy link
Contributor Author

thanks @Wauplin for the review! I added the test in b937912. I will keep your suggestion here for a separate PR.

@hanouticelina hanouticelina requested a review from Wauplin October 31, 2024 16:43
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good! Thanks for adding the test :)

@hanouticelina hanouticelina merged commit 1da4018 into main Nov 4, 2024
17 checks passed
@hanouticelina hanouticelina deleted the fix-fs-exists branch November 4, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fs.exists(hub_model_id, refresh=True) does not work correctly after deleting a repository
3 participants