Developer mode requirement on Windows #1062
Thanks for making this issue. Responding to Lysandre's comment here huggingface/transformers#19048 (comment): No, using WSL has not required enabling developer mode since 2017. I have been running WSL2 with CUDA and never had Developer Mode activated. I also only ever use WSL2 if I really cannot get software to run natively on Windows. WSL is great but it has its pitfalls/quirks, like the interaction with PyCharm when setting up/managing different environments, but also IO which is really slow between So for most of my transformers projects, I've used Windows, mostly without issue, or I posted a PR to fix the problem. There seem to be plenty of Windows users. And in my case, I am especially worried about the classroom. At the start, I use Colab because it's easy for beginners. But after a while we switch to a local installation and running scripts instead of notebooks so they at least know about those things. I cannot require them to enable developer mode (liability) and WSL will take too much class time to get working for everyone. Outside of the classroom, you can think about anyone who is just getting started with |
For our frontend and many others as well this will be a big issue: it is very hard to explain to users why they need developer mode or administrative permissions in order to download a model. This seems very suspicious to a user since they will not be aware of the technical limitation. It would be much nicer if we had control over this, such as being able to use the old cache system if desired. |
Posting here to keep an eye on this. Fully agree with the sentiment. Requiring developer mode will limit the ability to use transformers in programs that are given to non-developers. An average user will not want to enable developer mode when it's full of warnings. |
I'm curious if this affects many many Windows users, or not. Let's see if more users post about this in the coming days (to see how we could fix it)? Will also ask on Twitter=) |
As @henk717's, @ebolam's and my posts suggest, the problem is not just with developers. If they really need to and you do not give them a choice, I guess they will enable developer mode or be forced to switch to WSL (but again: I was not eager to enable dev mode on my machine either; it comes with its own risks). A major problem is the consequence for non-developers: hobbyists, students, end-users who try to run products/demos/tools that use As @henk717 suggests, keeping the option to use the old cache mechanism would be great. |
Yes you're right @BramVanroy |
Adding my comment here (as suggested by @julien-c). Activating developer mode is asking for trouble. On top of that, I had to drop support for WSL because my developer PC simply cannot handle it (running it takes 75% of available RAM and it is impossible to shut down once activated, not to mention that I cannot even run Dropbox on my Windows machine without constant hanging). Turning on WSL and developer mode not only requires users to run a beefy PC, but it also pushes them to security warnings and dialogues that I Pavlov'ed normal users into pressing "no" on for security reasons. |
Following this closely as I faced the same issue I posted in the HF forum here: https://discuss.huggingface.co/t/symlink-error-when-importing-ortseqclass-model-via-pipeline/20706 I am not a developer and was not too comfortable with running PC in dev mode, tried experimenting to see if that would solve it but got the same error after switching dev mode on and doing a full restart. |
Very much this. transformers is a dependency of multiple user-friendly or end-user experiences: KoboldAI in our case, but on Steam there are games like AI Roguelite and AIDventure that also use transformers as a dependency. Explaining to a user who buys an end-user product in a storefront that they manually need to change Windows settings for the program to work correctly is a step too far, and it should not be automated away since you are then changing security settings. So a solution that does not require system-wide changes is very much desired for applications that are either portable or installed through automated means. |
Another thing: It's windows now, but what if you would force the user to use root to install the package under Linux? Would you force people to use "sudo" to run transformers, knowing that it uses pytorch which has an exploit in their system that you are currently scanning for? |
Hey, thanks all for your comments. We agree that it shouldn't behave this way under Windows and we're working on a solution. |
Hi everyone and thank you for the feedback. Here is a PR to implement a cache-system variant that does not need symlinks if they are not available: #1067. See the description for more details. Any feedback is very welcome! :) |
#1067 is now merged and will be available in next release of |
@BramVanroy and others in this thread, would you mind please updating your install to use the updated You can update with the following:
Using any Thank you! |
The implementation works within our software. I did notice a performance regression compared to the old cache, however, since it takes time (and space) to copy the files to the snapshot folder. Moving the files, or referencing the blob location through text files (effectively fake symlinks), would be a leaner solution that would impact users with slow storage less. |
Hi @nickmuchi87 , thanks for reporting this bug. I created a PR (#1077) to make the
|
@henk717 We understand your concern. To be fair, we hesitated between two workarounds when symlinks are not supported. Both solutions (fake symlinks or duplicated files) have their own pros and cons. Please let me share the elements on which we based the decision to go for duplicated files:
Fake symlinks
With fake symlinks, a text file containing only a path to the actual blob file is stored under the
Pros:
Cons:
Duplicate files
In this approach, we do not use the
Pros:
Cons:
Conclusion
As you may have understood, we really favored an approach that will avoid as much as possible the friction to integrate Please also notice that we provide a I hope this message helps everyone here (and in the future) understand the workaround we chose to implement and the reasons behind it. Please let me know if you have any further questions. (Note: for any questions on how the cache-system is working, you can refer to this documentation page) |
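To make the "fake symlinks" option discussed above concrete, here is a minimal sketch of what a text-file pointer could look like. It is an illustration only, not code from the PR: `write_pointer` and `resolve_pointer` are hypothetical helper names, and the `blobs`/`snapshots` folder names are assumed for the example.

```python
import os
import tempfile


def write_pointer(blob_path: str, pointer_path: str) -> None:
    """Store the absolute path of `blob_path` as text in `pointer_path` (hypothetical helper)."""
    os.makedirs(os.path.dirname(pointer_path), exist_ok=True)
    with open(pointer_path, "w", encoding="utf-8") as f:
        f.write(os.path.abspath(blob_path))


def resolve_pointer(pointer_path: str) -> str:
    """Read the pointer file and return the blob path it refers to (hypothetical helper)."""
    with open(pointer_path, "r", encoding="utf-8") as f:
        return f.read().strip()


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    blob = os.path.join(root, "blobs", "abc123")
    os.makedirs(os.path.dirname(blob))
    with open(blob, "w", encoding="utf-8") as f:
        f.write("weights")

    pointer = os.path.join(root, "snapshots", "main", "model.bin")
    write_pointer(blob, pointer)

    # Opening the pointer directly yields a path, not the model content.
    with open(pointer, encoding="utf-8") as f:
        print(f.read() == "weights")
    # The content is only reachable after an explicit resolution step.
    with open(resolve_pointer(pointer), encoding="utf-8") as f:
        print(f.read() == "weights")
```

In this sketch, any code that opens the snapshot file directly gets a path string rather than the file content, so every reader has to call the resolution step first. A real symlink, or a duplicated file, needs no such step.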
I tried installing the above: then restarted my jupyter-lab but getting the same error, not sure what I am doing wrong here: |
@Wauplin Thanks for the elaborate write up! While I do not have time now to investigate the details (and noting that @nickmuchi87 is already testing your changes), I am extremely satisfied about the quick PR to make transformers tools fully/openly accessible to Windows users again. While I can understand @henk717's comment about disk usage, I am thankful for your elaborate motivation. The decision seems sensible to me. One question. You write the following:
Does this mean that specifically only revisions of the same model will be duplicated? Or does this also affect regular models? While I do not use revisions of the same model often, I do switch between a lot of different models often. Does that mean that they will all be duplicated? (This is more of a hypothetical question because I ended up enabling developer mode to continue my current work with transformers.) |
@nickmuchi87 : I suspect
cache_dir = os.path.dirname(os.path.commonpath([src, dst]))
if are_symlinks_supported(cache_dir=cache_dir):  # <- this line changed in the new branch !
    os.symlink(relative_src, dst)
elif new_blob:
    os.replace(src, dst)
else:
    shutil.copyfile(src, dst)
Could you try uninstalling/reinstalling in your env ? Thanks in advance and sorry for the inconvenience.
|
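The fallback logic quoted above (symlink if possible, otherwise move a new blob, otherwise copy) can be sketched as a self-contained script. This is a rough approximation under stated assumptions, not the library's implementation: `symlinks_supported` and `place_file` are hypothetical names, and the probe simply tries to create a symlink in a temporary folder and treats any failure as "not supported".

```python
import os
import shutil
import tempfile


def symlinks_supported(directory: str) -> bool:
    """Probe whether a symlink can be created inside `directory` (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        with open(target, "w"):
            pass
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # Typical on Windows without Developer Mode or admin rights.
            return False
        return True


def place_file(src: str, dst: str, new_blob: bool) -> str:
    """Expose the content of `src` at `dst` and return the strategy that was used."""
    dst_dir = os.path.dirname(dst)
    os.makedirs(dst_dir, exist_ok=True)
    if symlinks_supported(dst_dir):
        # Relative link, so the cache folder can be moved as a whole.
        os.symlink(os.path.relpath(src, dst_dir), dst)
        return "symlink"
    if new_blob:
        # Freshly downloaded blob: moving avoids keeping two copies.
        os.replace(src, dst)
        return "move"
    # Blob already referenced elsewhere: duplicate it.
    shutil.copyfile(src, dst)
    return "copy"
```

Calling `place_file(blob_path, snapshot_path, new_blob=True)` returns `"symlink"` on systems where the probe succeeds, and `"move"` or `"copy"` otherwise. The copy branch is where the extra time and disk space reported earlier in the thread would come from.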
@BramVanroy Thank you for your feedback. Here are some more detailed explanations:
What I mean by "revisions of the same model" is if you download multiple versions of a single repo. (for the record, I mix a lot "repo" and "model" in the wording. In
I'm not sure what you mean by "regular models". If you switch between different models (e.g. gpt2, distilgpt2, gpt2-large, xlnet-base-cased,...), each model will be individually downloaded in your cache-system, as for any user. Blob files are never shared between repositories in the cache, no matter the platform. Once they have been downloaded, you can switch between them as many times as you want without redownloading anything. Hope this makes it clearer for you now :) |
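The disk-usage difference described here can be checked with a little arithmetic. The sketch below is a simplified model, not a measurement of the real cache: `disk_usage` is a hypothetical function, the sizes are made up, and it assumes that with symlinks each distinct blob of a repo is stored once, while without symlinks every revision holds its own copy of each file.

```python
from typing import Dict, Tuple

# {revision: {filename: (blob_hash, size_in_bytes)}} for ONE repo (made-up values)
Revisions = Dict[str, Dict[str, Tuple[str, int]]]


def disk_usage(revisions: Revisions, use_symlinks: bool) -> int:
    """Estimate the bytes stored for one repo under the simplified model above."""
    if use_symlinks:
        # Each distinct blob is stored once and shared across revisions.
        unique = {h: size for files in revisions.values() for h, size in files.values()}
        return sum(unique.values())
    # No symlinks: every revision carries its own copy of every file.
    return sum(size for files in revisions.values() for _, size in files.values())


repo = {
    "rev_a": {"model.bin": ("blob_w", 500), "config.json": ("blob_c1", 1)},
    "rev_b": {"model.bin": ("blob_w", 500), "config.json": ("blob_c2", 1)},
}

print(disk_usage(repo, use_symlinks=True))   # 502: weights stored once
print(disk_usage(repo, use_symlinks=False))  # 1002: weights stored per revision
```

With a single revision per repo the two numbers are identical, which matches the point that switching between different repos costs nothing extra and only multiple revisions of the same repo are duplicated.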
@Wauplin that worked, so I uninstalled and re-installed using Anaconda Prompt this time instead of Jupyter, thanks for that! Appreciate the efforts! |
@nickmuchi87 Perfect ! 🎉 |
@Wauplin If I understand correctly, this means that indeed if I download many different repos I won't get duplicated data and that this only happens if you download different commits/revisions of the same model repo. So that's a non-issue for me. Thanks again for your work! |
@BramVanroy exactly :) |
The snapshot_download and hf_hub_download methods currently use symlinks for efficient storage management. However, symlinks are not properly supported on Windows, where administrator privileges or Developer Mode must be enabled before they can be used. We chose this approach so that it mirrors the Linux/macOS behavior.
Opening an issue here to track issues encountered by users in the ecosystem: