CLI interface for downloading files #1105
Comments
Hi again @singingwolfboy and thanks for the proposition 🙂 The question on our side is more to know how much we want to prioritize All in all, I'll keep you updated if we move forward on this. |
Totally understandable -- and I'm really glad that I triggered a discussion about this! I completely understand balancing functionality with maintenance cost. This is one reason that I started out by opening a GitHub issue, rather than writing some code and sending a pull request! I'll give you and your team some time to figure out how you want to proceed on this. I'm more than happy to help out if you need, but I'm aware that managing open source contributors also takes time (which could be used on other things instead, like developing other features). In general, I'm very sympathetic to your position here, having worked at other companies that build open source software! However, if you're not able to build this feature, at a certain point I (or maybe others as well!) will try to build it on my own. I don't know if you see that as a good thing (ecosystem!) or a bad thing (competition?), but it comes from the excitement inherent in the fast-moving machine learning world of today, and especially in the digital art that Stable Diffusion and others have unlocked. I hope that I can work with you all to make it happen collaboratively! |
Thanks for publishing to |
@singingwolfboy Thanks for your answer and the enthusiasm around HF and Regarding the initial topic of the issue, we are not planning to add too many features to To give more context, on the Python side we already have quite a few different methods to perform a single action (example: This doesn't mean we will never do it or that we don't want contributors (on the contrary!!) but for now that is out of scope of |
Just my /2c, one of the first things I expected the CLI to be able to do after
Both of these felt pretty overkill. I assume there may also be a way that I could just I haven't looked too deeply into the code here, but it was said that adding more code to the CLI would make it more complex/harder to maintain/etc, which I understand being a concern. But given that there is already the |
Ok, just played around with this a bit more, and looks like we can add the auth to
The code equivalent of this would be:
from huggingface_hub import hf_hub_url, hf_hub_download
# Generate/show the URL
hf_hub_url(
    repo_id="runwayml/stable-diffusion-inpainting",
    filename="sd-v1-5-inpainting.ckpt",
)
# Download the file
hf_hub_download(
    repo_id="runwayml/stable-diffusion-inpainting",
    filename="sd-v1-5-inpainting.ckpt",
)
Adding code to support this use case to the CLI surely wouldn't end up being much longer than that. It just needs to take in the params for repo_id and filename and pass them through to the download function. |
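For illustration, here is a minimal sketch of the kind of wrapper described above. It is not the project's implementation: the positional arguments and the injectable `download` parameter are assumptions made for this example (the injection only exists so the sketch can be exercised without network access). The one real library call it relies on is `hf_hub_download(repo_id=..., filename=...)`, as used in the snippet above.

```python
import argparse
from typing import Callable, Optional, Sequence


def main(
    argv: Optional[Sequence[str]] = None,
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Parse REPO_ID and FILENAME, download the file, print and return its path."""
    parser = argparse.ArgumentParser(
        description="Download a single file from the Hub (illustrative sketch)."
    )
    parser.add_argument("repo_id", help="e.g. runwayml/stable-diffusion-inpainting")
    parser.add_argument("filename", help="e.g. sd-v1-5-inpainting.ckpt")
    args = parser.parse_args(argv)

    if download is None:
        # Imported lazily so the argument parsing can be tried without the library.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download

    # Pass the two parameters straight through to the download function.
    path = download(repo_id=args.repo_id, filename=args.filename)
    print(path)
    return path
```

Saved as a script that calls `main()`, this could be run as `python download_sketch.py gpt2 config.json` (the script name is hypothetical).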
Hi @0xdevalias, thanks for jumping into the conversation. We might reconsider it in the future, but for now the CLI is not much of a priority for us. Could you please let me know what your use case would be for downloading a file without using Python code in your environment? In the meantime, here is a 1-liner that does what you expect from a
➜ python -c "import huggingface_hub; print(huggingface_hub.hf_hub_download('gpt2', filename='config.json'))"
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/75e09b43581151bd1d9ef6700faa605df408979f/config.json
For the record (and for our future selves), one could expect to also be able to download a snapshot of a repo when using |
if anyone wants to second this feature request please chime in here! |
Hi everyone, I think it's time to re-open this issue. 8 months ago was not the right time, but since then the library has matured and we should now be able to offer and maintain a

1. Definition
>>> huggingface-cli download --help
usage: huggingface-cli download REPO_ID [PATH] [--help] [--repo-type REPO_TYPE] [--revision REVISION] [--token TOKEN] [--allow-patterns ALLOW_PATTERNS] [--ignore-patterns IGNORE_PATTERNS] [--to-local-dir] [--local-dir-use-symlinks]

2. Download file from the Hub
# from a model
>>> huggingface-cli download gpt2 config.json
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json
# from a dataset
>>> huggingface-cli download Open-Orca/OpenOrca 1M-GPT4-Augmented.parquet --repo-type=dataset
equivalent to
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download("gpt2", "config.json")
'/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json'

3. Download entire repo
>>> huggingface-cli download Open-Orca/OpenOrca --repo-type=dataset
/home/wauplin/.cache/huggingface/hub/datasets--Open-Orca--OpenOrca/snapshots/984517afe11f50298f278a704980280950aedb10
equivalent to
>>> from huggingface_hub import snapshot_download
>>> snapshot_download("Open-Orca/OpenOrca", repo_type="dataset")
'/home/wauplin/.cache/huggingface/hub/datasets--Open-Orca--OpenOrca/snapshots/984517afe11f50298f278a704980280950aedb10'

4. Download repo with filters
e.g. download only safetensors weights
>>> huggingface-cli download bigscience/bloom --allow-patterns=*.safetensors
/home/wauplin/.cache/huggingface/hub/models--bigscience--bloom/snapshots/984517afe11f50298f278a704980280950aedb10
equivalent to
>>> from huggingface_hub import snapshot_download
>>> snapshot_download("bigscience/bloom", allow_patterns="*.safetensors")
Could also use

5. Download from revision
# Entire Space from PR
>>> huggingface-cli download fffiloni/zeroscope --repo-type=space --revision=refs/pr/78
# Single file from specific commit oid
>>> huggingface-cli download Salesforce/xgen-7b-8k-base generation_config.json --revision=3987e094377fae577bba039af1b300ee8086f9e1

6. Download to local folder
It is now possible to download files to a local folder instead of the cache folder (see explanations and limitations). IMO this should also be integrated in the CLI:
>>> pwd
/home/wauplin/projects/bloom_weights
>>> huggingface-cli download bigscience/bloom --allow-patterns=*.safetensors --to-local-dir
/home/wauplin/projects/bloom_weights
>>> huggingface-cli download bigscience/bloom config.json --to-local-dir
/home/wauplin/projects/bloom_weights/config.json
EDIT: The plan is to use HTTP-only methods. Downloading via this CLI will never be meant to create a local git clone.

7. Download private model
>>> huggingface-cli download Wauplin/private-model --token=hf_***
Token can be passed in CLI command. Let's encourage

Return value
I like the idea from @singingwolfboy (#1105 (comment)) to catch the main errors (private repos, gated repos, missing repos, missing files) and print meaningful messages to the user. This can be done directly or in a future PR. When the download succeeds, I'd be in favor of returning the raw path instead of a message. This will make it easier to integrate in a shell script.

This is a first comment to restart the discussion on this feature request. @singingwolfboy @0xdevalias Feel free to comment and suggest modifications. IMO such a CLI will mainly be a wrapper on top of |
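To make the "Return value" idea concrete, here is a rough sketch of one way it could be wired, under stated assumptions: `report`, `main` and `default_error_table` are hypothetical names, and the message wording is a placeholder. The exception classes are imported from `huggingface_hub.utils`; `GatedRepoError` is listed before `RepositoryNotFoundError` on the assumption that it is a subclass of it. On success only the raw path goes to stdout; on a known error a message goes to stderr and the exit code is non-zero.

```python
import sys
from typing import Callable, List, Sequence, Tuple, Type

# (exception class, message) pairs, checked in order: subclasses must come first.
ErrorTable = Sequence[Tuple[Type[BaseException], str]]


def default_error_table() -> List[Tuple[Type[BaseException], str]]:
    # Imported lazily so the helpers below stay usable without the library.
    from huggingface_hub.utils import (
        EntryNotFoundError,
        GatedRepoError,
        RepositoryNotFoundError,
        RevisionNotFoundError,
    )

    # Placeholder wording, not the messages of any real CLI.
    return [
        (GatedRepoError, "this repo is gated; accept its terms on the Hub first."),
        (RepositoryNotFoundError, "repo not found, or it is private and no valid token was given."),
        (RevisionNotFoundError, "revision not found in this repo."),
        (EntryNotFoundError, "file not found in this repo."),
    ]


def report(download: Callable[[], str], errors: ErrorTable) -> Tuple[int, str]:
    """Run the download; return (exit code, text to print)."""
    try:
        return 0, str(download())
    except Exception as exc:
        for exc_type, message in errors:
            if isinstance(exc, exc_type):
                return 1, f"Error: {message}"
        raise  # Unknown errors are not swallowed.


def main(download: Callable[[], str], errors: ErrorTable) -> int:
    code, text = report(download, errors)
    # Raw path on stdout, messages on stderr, so `path=$(...)` captures only the path.
    print(text, file=sys.stdout if code == 0 else sys.stderr)
    return code
```

With this shape, a caller would pass something like `lambda: hf_hub_download("gpt2", "config.json")` as `download` and `default_error_table()` as `errors`.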
I also created an issue to add a |
And... Thanks @martinbrose for working on it (#1617) 🤗 Some examples:
>>> huggingface-cli download gpt2 config.json
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json
>>> huggingface-cli download bigcode/the-stack --repo-type=dataset --revision=v1.2 --include="data/python/*" --exclude="*.json" --exclude="*.zip"
Fetching 206 files: 100%|████████████████████████████████████████████| 206/206 [02:31<2:31, ?it/s]
/home/wauplin/.cache/huggingface/hub/datasets--bigcode--the-stack/snapshots/9ca8fa6acdbc8ce920a0cb58adcdafc495818ae7
For more details, check out the guide. A new release ( |
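As a rough illustration of how `--include` and `--exclude` combine in the example above, the sketch below filters a list of repo paths with shell-style globs from Python's standard `fnmatch` module. That the real CLI matches patterns exactly this way is an assumption; `filter_files` and the file names are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Sequence


def filter_files(
    paths: Iterable[str],
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[str]:
    """Keep paths matching any include pattern and no exclude pattern."""
    kept = []
    for path in paths:
        # With include patterns given, a path must match at least one of them.
        if include is not None and not any(fnmatchcase(path, p) for p in include):
            continue
        # A path matching any exclude pattern is dropped.
        if exclude is not None and any(fnmatchcase(path, p) for p in exclude):
            continue
        kept.append(path)
    return kept


# Hypothetical repo listing, for illustration only.
files = [
    "README.md",
    "data/python/train-0.parquet",
    "data/python/meta.json",
    "data/python/archive.zip",
    "data/java/train-0.parquet",
]
selected = filter_files(files, include=["data/python/*"], exclude=["*.json", "*.zip"])
print(selected)  # ['data/python/train-0.parquet']
```

Note that with `fnmatch`-style globs, `*` also matches `/`, so `*.json` excludes JSON files in any subfolder.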
Thanks @Wauplin, that's great news! |
Getting this error today for downloading a repo. It was working last time that I used it:
|
@absalan Yes it should work. I believe your
and retry. If it still happens, can you run |
Thanks @Wauplin, that solved the issue. Appreciate your quick response. |
Hi @Wauplin I tried |
@mhdpr it looks like you have a mismatch in your install. The output of (in your case, |
Glad to know your problem's solved @absalan! |
Is your feature request related to a problem? Please describe.
Stable Diffusion is becoming very popular, and many developers are interested in trying it out. However, there are still a lot of complex manual steps in the installation process, and the biggest hurdle is downloading the weights from Huggingface Hub.
Describe the solution you'd like
The huggingface-cli command should be extended to allow users to download files from Huggingface Hub to their computer. The default download location should be the cache, but we may want to allow users to download to arbitrary locations on their computer as well. Here's what I'm imagining:
When a user runs that command, there are a few possible outcomes. If the user is not logged in, they would receive a friendly error message, something like:
If the user is logged in, but has not yet accepted the terms of the license, they would receive a different friendly error message, something like:
If the user is logged in and has accepted the terms of the license, they would instead see a progress bar as the file is downloaded to their computer. If the user executes this command and the file is already downloaded to the cache, they see a friendly informative message, telling them where to find the file on their computer:
Describe alternatives you've considered
It seems possible to write this script separately using the hf_hub_download function, but why not make it part of the existing CLI?