Use HF_HUB_OFFLINE + fix has_file in offline mode
#31016
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
amyeroberts
left a comment
Thanks for working on this!
Overall I think it looks good - only thing is that I think we need a deprecation cycle for TRANSFORMERS_OFFLINE
Agree but if we want to deprecate it, I think we should do it in
OK - sounds good to me!
I don't know, tbh. Probably not before v5, as it's a common argument, but if we think that's going to be too far away we could always move it forward.
I want to make a suggestion. To better follow the implementation in huggingface_hub, instead of reading the environment variable, you should read the boolean variable (with the same name) from within the huggingface_hub module. That way, this code keeps tracking the huggingface_hub implementation if they decide to change something.
Hi @marianpascalau, thanks for the suggestion! Yes, that's exactly what I'm doing here to delegate the env variable handling to
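For readers following along, the suggestion above could look roughly like the sketch below. It reads the `HF_HUB_OFFLINE` boolean exposed by `huggingface_hub.constants` rather than parsing the environment again. The helper name `read_offline_flag` and the fallback branch (including its set of accepted "true" values) are assumptions made for illustration, not code from this PR.

```python
import os


def read_offline_flag() -> bool:
    """Hypothetical helper: return the offline flag, preferring huggingface_hub's constant."""
    try:
        # huggingface_hub computes this boolean once, at import time,
        # from the environment. Reading it here avoids duplicating the parsing.
        from huggingface_hub import constants

        return bool(constants.HF_HUB_OFFLINE)
    except (ImportError, AttributeError):
        # Assumption: fallback used only when huggingface_hub is unavailable.
        # The accepted values below are illustrative.
        value = os.environ.get("HF_HUB_OFFLINE", "")
        return value.upper() in {"1", "ON", "YES", "TRUE"}


print(read_offline_flag())
```

Because the constant is evaluated at import time, changing the environment variable after `huggingface_hub` has been imported would not be reflected by this helper.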
Hi @Wauplin Is this PR also trying to fix the currently failing tests seen in other PRs too?
Thanks! I will check
LysandreJik
left a comment
Thanks @Wauplin ! To ensure that the deprecation cycle is respected, could we keep a test with TRANSFORMERS_OFFLINE while ensuring that it throws a warning?
As @amyeroberts said, this was a very well advertised argument
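A minimal sketch of what such a deprecation cycle might look like, assuming a hypothetical `is_offline_mode` helper that accepts both variables and warns when the legacy one is set. The warning category, the message, and the set of accepted "true" values are illustrative assumptions, not the behavior confirmed by this PR.

```python
import os
import warnings

# Assumption: values treated as "true" for illustration purposes.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def is_offline_mode(environ=None) -> bool:
    """Hypothetical helper: resolve offline mode from either env variable."""
    env = os.environ if environ is None else environ
    legacy = env.get("TRANSFORMERS_OFFLINE")
    if legacy is not None:
        # The legacy variable still works, but the user is nudged to migrate.
        warnings.warn(
            "TRANSFORMERS_OFFLINE is deprecated; set HF_HUB_OFFLINE instead.",
            FutureWarning,
            stacklevel=2,
        )
    value = env.get("HF_HUB_OFFLINE") or legacy or ""
    return value.upper() in _TRUE_VALUES


# A test in the spirit of the request: the legacy variable still enables
# offline mode, and a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert is_offline_mode({"TRANSFORMERS_OFFLINE": "1"})
assert any(issubclass(w.category, FutureWarning) for w in caught)
```

Passing the environment as a plain mapping keeps the test independent of the real process environment.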
Yes @LysandreJik @amyeroberts, completely right. I reverted the tests to use
amyeroberts
left a comment
Thanks for handling this!
Looks like I broke some tests + have merge conflicts. Let me handle those and I'll ping you for final review.
@amyeroberts Tests are passing now. Feel free to do a last review or merge when it's ready :)
I would also like to see added to hub.py's
because how it is now seems like a bit of a security issue to me with the: and I noticed there were some bert models in 2022 trained on imdb (for potential token ablation research I'm theorizing?) and really hope this is not collecting training data on celebrity string names or other things... It only seems to filter Ps. Can someone also explain how the example strings docs and telemetry work? Very confused there on what the telemetry is sending if for instance I use stable_video_diffusion_pipeline.py and example strings doc I see is present in that file. |
Hi @311-code, your concerns are legitimate, but could you open a new issue on GitHub for them? This PR is related to offline mode only (and just so you know, when offline mode is enabled, telemetry is automatically stopped).
Yes, I won't be in front of my PC until late tonight, but I will highlight the things I noticed. I am not quite sure yet whether the redact code will work or have side effects.
Should fix #30345.
Related to #31010 as well, even though it makes sense to merge both PRs IMO.
This PR:
- huggingface_hub with HF_HUB_OFFLINE env variable instead of TRANSFORMERS_OFFLINE. This change is still backward-compatible, meaning that users who had already set TRANSFORMERS_OFFLINE won't notice a difference. The goal is to have a single environment variable to disable network in all HF libraries.
- HF_HUB_OFFLINE env variable
- has_file => it was not the case before, causing "Problem with pretrained Transformers in Offline mode" #30345 and "Do not trigger autoconversion if local_files_only" #31004
- local_files_only and cache_dir to has_file() helper. At the moment, if connection/network is disabled, an error is raised. But if an authentication/authorization problem occurs, it returns False. IMO this is not consistent and we should aim at having the same response if the server can't be requested. Instead of returning False, I suggest we default to checking the cache directory in case the information exists there.
- has_file in offline mode

Please let me know what you think.
(Note: some tests are failing but I prefer to get theoretical approval before moving forward on this).
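To make the proposed `has_file` behavior concrete, here is a rough sketch under simplifying assumptions: the cache is modeled as a flat directory (the real Hub cache layout is more involved), the remote lookup is an injected callable, and `has_file_sketch` and `HubUnreachable` are hypothetical names, not the actual transformers implementation. It shows the idea of answering from the cache both when `local_files_only` is set and when the server cannot be queried.

```python
from pathlib import Path
from typing import Callable


class HubUnreachable(Exception):
    """Hypothetical error: offline, network failure, or auth problem."""


def has_file_sketch(
    filename: str,
    cache_dir: str,
    remote_check: Callable[[str], bool],
    local_files_only: bool = False,
) -> bool:
    # Assumption: a cached file is simply <cache_dir>/<filename>.
    cached = (Path(cache_dir) / filename).is_file()
    if local_files_only:
        # Never touch the network; the cache is the only source of truth.
        return cached
    try:
        return remote_check(filename)
    except HubUnreachable:
        # Same response whatever the reason the server can't be queried:
        # fall back to what the cache knows instead of raising or
        # unconditionally returning False.
        return cached
```

With this shape, a network outage and an authorization failure lead to the same outcome, which is the consistency the description argues for.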