
CLI interface for downloading files #1105

Closed
singingwolfboy opened this issue Oct 10, 2022 · 19 comments
Labels
enhancement New feature or request

Comments

@singingwolfboy
Contributor

Is your feature request related to a problem? Please describe.
Stable Diffusion is becoming very popular, and many developers are interested in trying it out. However, there are still a lot of complex manual steps in the installation process, and the biggest hurdle is downloading the weights from Huggingface Hub.

Describe the solution you'd like
The huggingface-cli command should be extended to allow users to download files from Huggingface Hub to their computer. The default download location should be the cache, but we may want to allow users to download to arbitrary locations on their computer as well. Here's what I'm imagining:

huggingface-cli download CompVis/stable-diffusion-v-1-4-original sd-v1-4-full-ema.ckpt

When a user runs that command, there are a few possible outcomes. If the user is not logged in, they would receive a friendly error message, something like:

In order to download from the CompVis/stable-diffusion-v-1-4-original repo,
you must agree to the terms of the license, which requires a Huggingface Hub
account. Please register an account on https://huggingface.co, and then run
`huggingface-cli login` to sign in to your account.

If the user is logged in, but has not yet accepted the terms of the license, they would receive a different friendly error message, something like:

In order to download from the CompVis/stable-diffusion-v-1-4-original repo,
you must agree to the terms of the license. Please visit
https://huggingface.co/CompVis/stable-diffusion-v1-4 to read the license
and agree to the terms.

If the user is logged in and has accepted the terms of the license, they would instead see a progress bar as the file is downloaded to their computer. If the user executes this command and the file is already downloaded to the cache, they see a friendly informative message, telling them where to find the file on their computer:

The file you requested is already downloaded to your computer:
/Users/example/.cache/huggingface/CompVis/stable-diffusion-v-1-4-original/sd-v1-4-full-ema.ckpt
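The branching above could be sketched as a small helper. This is a hypothetical function for illustration only: the name, the boolean flags, and the idea that the CLI knows these states up front are all assumptions; a real implementation would derive them from the Hub's HTTP responses.

```python
from typing import Optional


def friendly_message(repo_id: str, logged_in: bool, license_accepted: bool,
                     cached_path: Optional[str]) -> str:
    """Pick the friendly message for a hypothetical `huggingface-cli download`."""
    if not logged_in:
        return (
            f"In order to download from the {repo_id} repo,\n"
            "you must agree to the terms of the license, which requires a Huggingface Hub\n"
            "account. Please register an account on https://huggingface.co, and then run\n"
            "`huggingface-cli login` to sign in to your account."
        )
    if not license_accepted:
        return (
            f"In order to download from the {repo_id} repo,\n"
            "you must agree to the terms of the license. Please visit\n"
            f"https://huggingface.co/{repo_id} to read the license and agree to the terms."
        )
    if cached_path is not None:
        return ("The file you requested is already downloaded to your computer:\n"
                + cached_path)
    return ""  # empty message: proceed with the download and show a progress bar
```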

Describe alternatives you've considered
It seems possible to write this script separately using the hf_hub_download function, but why not make it part of the existing CLI?

@Wauplin
Contributor

Wauplin commented Oct 10, 2022

Hi again @singingwolfboy and thanks for the proposition 🙂
In general, the focus of huggingface_hub has been on the Python features more than the CLI itself (and that's why it is so tiny at the moment). The CLI interface you are proposing would definitely be a wrapper around hf_hub_download, as you mentioned.

The question on our side is more about how much we want to prioritize huggingface-cli over other work we currently have on huggingface_hub. To be transparent, your proposition triggered a conversation on our side (cf. internal thread - sorry for the internal link). At the moment, the CLI is quite tiny, with commands to login/logout/whoami, scan/delete the cache, create a repo, and some LFS-specific stuff. The more commands we add, the more maintenance is required.

All in all, I'll keep you updated if we move forward on this.

@singingwolfboy
Contributor Author

Totally understandable -- and I'm really glad that I triggered a discussion about this! I completely understand balancing functionality with maintenance cost. This is one reason that I started out by opening a GitHub issue, rather than writing some code and sending a pull request!

I'll give you and your team some time to figure out how you want to proceed on this. I'm more than happy to help out if you need, but I'm aware that managing open source contributors also takes time (which could be used on other things instead, like developing other features). In general, I'm very sympathetic to your position here, having worked at other companies that build open source software!

However, if you're not able to build this feature, at a certain point I (or maybe others as well!) will try to build it on my own. I don't know if you see that as a good thing (ecosystem!) or a bad thing (competition?), but it comes from the excitement inherent in the fast-moving machine learning world of today, and especially in the digital art that Stable Diffusion and others have unlocked. I hope that I can work with you all to make it happen collaboratively!

@julien-c
Member

Thanks for publishing to brew BTW @singingwolfboy :)

@Wauplin
Contributor

Wauplin commented Oct 31, 2022

@singingwolfboy Thanks for your answer and the enthusiasm around HF and huggingface_hub! :) (like publishing it to brew).

Regarding the initial topic of the issue, we are not planning to add too many features to huggingface-cli in the short term. huggingface_hub is meant to be a central package for Python libraries, and for now that's what we are focusing on. In the mid-term, we would not be against having a separate package specifically dedicated to a more exhaustive CLI tool (and thus making hfh available to scripting tools/CIs/...). If you already have a strong interest in that, feel free to build it :)

To give more context, on the Python side we already have quite a few different methods to perform a single action (example: upload_file, upload_folder, Repository, create_commit, push_to_hub to push files to the Hub). Different methods are meant to be optimized for different use cases. A CLI would not be as exhaustive as what exists in Python. Besides, it is always possible to have a small Python script and execute it from the terminal (python -c ...) if a user needs to integrate hfh in a workflow.

This doesn't mean we will never do it or that we don't want contributors (on the contrary !!) but for now that is out of scope of hfh. Hope you understand our position on this matter :)

@0xdevalias

0xdevalias commented Nov 6, 2022

Just my 2c: one of the first things I expected the CLI to be able to do after login was download a model; I was pretty surprised to find out it couldn't, and it felt like a weird oversight not to be included. This led me down a Google rabbit hole trying to find alternative ways to get to my end solution, seemingly ending up with:

  • write code to interface with the library to download it
  • set up git lfs, clone the entire huggingface model repo (~15gb), copy the model files out of it (~4gb), then remove the rest of the repo

Both of these felt pretty overkill.

I assume there may also be a way that I could just curl the model file directly, but I got a 401 and I wasn't sure how to pass the auth to it in a way it would accept.

I haven't looked too deeply into the code here, but it was said that adding more code to the CLI would make it more complex/harder to maintain/etc., which I understand is a concern. But given that there is already the hf_hub_download function within the lib, it doesn't seem like it would take much (or very complex) code to add a lightweight wrapper around it to the existing CLI; doing so would provide a huge amount of benefit to end users.

@0xdevalias

0xdevalias commented Nov 6, 2022

Ok, just played around with this a bit more, and it looks like we can add auth to wget files as follows:

# Get YOURTOKEN from https://huggingface.co/settings/tokens
wget 'https://Bearer:[email protected]/runwayml/stable-diffusion-inpainting/resolve/main/sd-v1-5-inpainting.ckpt'

The code equivalent of this would be:

from huggingface_hub import hf_hub_url, hf_hub_download

# Generate/show the URL
hf_hub_url(
    repo_id="runwayml/stable-diffusion-inpainting",
    filename="sd-v1-5-inpainting.ckpt",
)

# Download the file
hf_hub_download(
    repo_id="runwayml/stable-diffusion-inpainting",
    filename="sd-v1-5-inpainting.ckpt",
)

Surely the code to support this use case in the CLI wouldn't end up much longer than that; it just needs to take in params for repo_id and filename and pass them through to the download function.
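A minimal sketch of such a wrapper, built with argparse on top of the real hf_hub_download function. The subcommand layout and flag names here are assumptions for illustration, not the eventual huggingface-cli interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical `download` subcommand mirroring hf_hub_download's main params.
    parser = argparse.ArgumentParser(prog="huggingface-cli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    download = subparsers.add_parser("download", help="Download a file from the Hub")
    download.add_argument("repo_id")
    download.add_argument("filename")
    download.add_argument("--revision", default=None)
    download.add_argument("--repo-type", default="model")
    return parser


def run(argv=None) -> None:
    args = build_parser().parse_args(argv)
    if args.command == "download":
        # Deferred import so argument parsing stays testable without the library.
        from huggingface_hub import hf_hub_download
        print(hf_hub_download(repo_id=args.repo_id, filename=args.filename,
                              revision=args.revision, repo_type=args.repo_type))
```

With huggingface_hub installed, `run(["download", "gpt2", "config.json"])` would print the cached path, much like the hypothetical command above.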

@Wauplin
Contributor

Wauplin commented Nov 10, 2022

Hi @0xdevalias, thanks for jumping into the conversation. We might reconsider it in the future, but for now the CLI is not so much our priority. Could you please let me know what your use case would be for downloading a file without using Python code in your environment?

In the meantime, here is a one-liner that does what you expect from a huggingface-cli download ... command. I get that it's less convenient than an integrated command, but it does the job :)

➜ python -c "import huggingface_hub; print(huggingface_hub.hf_hub_download('gpt2', filename='config.json'))"
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/75e09b43581151bd1d9ef6700faa605df408979f/config.json

For the record (and our future selves), one could expect to also be able to download a snapshot of a repo when using huggingface-cli download .... So it would be a wrapper around both hf_hub_download and snapshot_download.
One could also expect to select a destination (e.g. huggingface-cli download --to="..." ...), which is currently not well defined in hfh, as for now files are downloaded to a cache folder.
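The file-vs-snapshot dispatch could look like the following sketch. The `download` function itself is an assumption; hf_hub_download and snapshot_download are the real huggingface_hub helpers, imported lazily so the dispatch logic stands on its own.

```python
from typing import Optional


def download(repo_id: str, filename: Optional[str] = None,
             repo_type: str = "model", revision: Optional[str] = None) -> str:
    """Delegate to hf_hub_download for a single file, snapshot_download otherwise."""
    if filename is not None:
        from huggingface_hub import hf_hub_download
        return hf_hub_download(repo_id, filename, repo_type=repo_type,
                               revision=revision)
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id, repo_type=repo_type, revision=revision)
```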

@julien-c
Member

if anyone wants to second this feature request please chime in here!

@Wauplin
Contributor

Wauplin commented Jul 6, 2023

Hi everyone, I think it's time to re-open this issue. Eight months ago was not the right time, but since then the library has gained more maturity, and we should now be able to offer and maintain a huggingface-cli download CLI interface. I like the solution already described in #1105 (comment) but I'll try to complete it with more examples.

1. Definition

>>> huggingface-cli download --help
usage: huggingface-cli download REPO_ID [PATH] [--help] [--repo-type REPO_TYPE] [--revision REVISION] [--token TOKEN] [--allow-patterns ALLOW_PATTERNS] [--ignore-patterns IGNORE_PATTERNS] [--to-local-dir] [--local-dir-use-symlinks]

2. Download file from the Hub

# from a model
>>> huggingface-cli download gpt2 config.json
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json

# from a dataset
>>> huggingface-cli download Open-Orca/OpenOrca 1M-GPT4-Augmented.parquet --repo-type=dataset

equivalent to

>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download("gpt2", "config.json")
'/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json'

3. Download entire repo

>>> huggingface-cli download Open-Orca/OpenOrca --repo-type=dataset
/home/wauplin/.cache/huggingface/hub/datasets--Open-Orca--OpenOrca/snapshots/984517afe11f50298f278a704980280950aedb10

equivalent to

>>> from huggingface_hub import snapshot_download
>>> snapshot_download("Open-Orca/OpenOrca", repo_type="dataset")
'/home/wauplin/.cache/huggingface/hub/datasets--Open-Orca--OpenOrca/snapshots/984517afe11f50298f278a704980280950aedb10'

4. Download repo with filters

e.g. download only safetensors weights

>>> huggingface-cli download bigscience/bloom --allow-patterns=*.safetensors
/home/wauplin/.cache/huggingface/hub/models--bigscience--bloom/snapshots/984517afe11f50298f278a704980280950aedb10

equivalent to

>>> from huggingface_hub import snapshot_download
>>> snapshot_download("bigscience/bloom", allow_patterns="*.safetensors")

Could also use --ignore-patterns. Since the patterns args accept either str or List[str], let's consider that the value can be a comma-separated list of patterns and parse it in the CLI wrapper.
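Parsing the comma-separated value could be as simple as this sketch. The helper name is hypothetical, and collapsing a single pattern to a plain str is a design choice made here only because snapshot_download accepts both forms.

```python
from typing import List, Optional, Union


def parse_patterns(value: Optional[str]) -> Optional[Union[str, List[str]]]:
    """Turn a comma-separated CLI value into the str-or-list form snapshot_download takes."""
    if value is None:
        return None
    patterns = [p.strip() for p in value.split(",") if p.strip()]
    if not patterns:
        return None
    # A single pattern stays a plain str; several become a list.
    return patterns[0] if len(patterns) == 1 else patterns
```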

5. Download from revision

# Entire Space from PR
>>> huggingface-cli download fffiloni/zeroscope --repo-type=space --revision=refs/pr/78

# Single file from specific commit oid
>>> huggingface-cli download Salesforce/xgen-7b-8k-base generation_config.json --revision=3987e094377fae577bba039af1b300ee8086f9e1

6. Download to local folder

It is now possible to download files to a local folder instead of the cache folder (see explanations and limitations). IMO this should also be integrated in the CLI:

>>> pwd
/home/wauplin/projects/bloom_weights

>>> huggingface-cli download bigscience/bloom --allow-patterns=*.safetensors --to-local-dir
/home/wauplin/projects/bloom_weights

>>> huggingface-cli download bigscience/bloom config.json --to-local-dir
/home/wauplin/projects/bloom_weights/config.json

EDIT: The plan is to use HTTP-only methods. Downloading via this CLI will never be meant to create a local git clone.

7. Download private model

>>> huggingface-cli download Wauplin/private-model --token=hf_***

The token can be passed in the CLI command. Let's encourage huggingface-cli login, or at least using an environment variable in the terminal, instead of hardcoding the token in the command.

Return value

I like the idea from @singingwolfboy (#1105 (comment)) to catch the main errors (private repos, gated repos, missing repos, missing files) and print meaningful messages to the user. This can be done directly or in a future PR. When the download succeeds, I'd be in favor of returning the raw path instead of a message. This will make it easier to integrate in a shell script.
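A sketch of that error handling, keyed on exception class names so the example stays self-contained. In a real implementation one would catch huggingface_hub's own error classes (RepositoryNotFoundError, GatedRepoError, EntryNotFoundError) in `except` clauses instead.

```python
def friendly_error(exc: Exception, repo_id: str, filename: str = "") -> str:
    """Map common download failures to meaningful messages for the user."""
    name = type(exc).__name__
    if name == "RepositoryNotFoundError":
        return (f"Repo {repo_id} does not exist on the Hub "
                "(or it is private and you don't have access).")
    if name == "GatedRepoError":
        return (f"Access to {repo_id} is gated. Please visit "
                f"https://huggingface.co/{repo_id} to request access.")
    if name == "EntryNotFoundError":
        return f"File {filename} was not found in {repo_id}."
    return f"Download failed: {exc}"
```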


This is a first comment to restart the discussion on this feature request. @singingwolfboy @0xdevalias Feel free to comment and suggest modifications. IMO such a CLI will mainly be a wrapper on top of hf_hub_download and snapshot_download, so implementation and testing should be done on top of those without re-implementing anything.

@Wauplin
Contributor

Wauplin commented Jul 6, 2023

I also created an issue to add a huggingface-cli upload command: #1543

@Wauplin
Contributor

Wauplin commented Sep 6, 2023

And... huggingface-cli download is out !! 🔥 🔥 🔥

Thanks @martinbrose for working on it (#1617) 🤗

Some examples:

>>> huggingface-cli download gpt2 config.json
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json
>>> huggingface-cli download gpt2 config.json --local-dir=./models/gpt2
./models/gpt2/config.json
>>> huggingface-cli download bigcode/the-stack --repo-type=dataset --revision=v1.2 --include="data/python/*" --exclude="*.json" --exclude="*.zip"
Fetching 206 files:   100%|████████████████████████████████████████████| 206/206 [02:31<2:31, ?it/s]
/home/wauplin/.cache/huggingface/hub/datasets--bigcode--the-stack/snapshots/9ca8fa6acdbc8ce920a0cb58adcdafc495818ae7

For more details, check out the guide. A new release (v0.17) including it will be out soon.

@Wauplin Wauplin closed this as completed Sep 6, 2023
@martinbrose
Contributor

Thanks @Wauplin, that's great news!
And special thanks for the detailed initial task description and the guidance along the way.
It was especially interesting to see how it changed from my first proposal to the finished product.

@absalan

absalan commented Oct 12, 2023

Getting this error today when downloading a repo; it was working the last time I used it.
Anyone else having this issue?

huggingface-cli download gpt2 config.json --local-dir=./models/gpt2
usage: huggingface-cli <command> [<args>]
huggingface-cli: error: invalid choice: 'download' (choose from 'env', 'login', 'whoami', 
'logout', 'repo', 'lfs-enable-largefiles', 'lfs-multipart-upload', 'scan-cache', 'delete-cache')

@Wauplin
Contributor

Wauplin commented Oct 12, 2023

@absalan Yes it should work. I believe your huggingface_hub version has been downgraded. Try

pip install -U huggingface_hub

and retry.

If it still happens, can you run huggingface-cli env and copy-paste the output so that I can investigate further? Thanks in advance :)

@absalan

absalan commented Oct 12, 2023

Thanks @Wauplin, that solved the issue. Appreciate your quick response.

@mhdpr

mhdpr commented Jan 11, 2024

@absalan Yes it should work. I believe your huggingface_hub version has been downgraded. Try

pip install -U huggingface_hub

and retry.

If it still happens, can you run huggingface-cli env and copy-paste the output so that I can investigate further? Thanks in advance :)

- huggingface_hub version: 0.16.4
- Platform: Linux-5.15.0-58-generic-x86_64-with-debian-bookworm-sid
- Python version: 3.7.16
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/jiangziwen/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: 1.14.0
- Torch: 1.11.0+cu113
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: 0.1.4
- gradio: N/A
- tensorboard: N/A
- numpy: 1.21.5
- pydantic: 1.8.2
- aiohttp: N/A
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/jiangziwen/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/jiangziwen/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/jiangziwen/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False

Hi @Wauplin I tried pip install -U huggingface_hub, but it doesn't work. Here is the output after running huggingface-cli env

@Wauplin
Contributor

Wauplin commented Jan 11, 2024

@mhdpr it looks like you have a mismatch in your install. The output of huggingface-cli env tells you that the installed version is 0.16.4, but the latest version is currently 0.20.2. This means that the library you are updating when doing pip install -U huggingface_hub is not the same one (it's installed somewhere else). What I would suggest is to run which huggingface-cli, which will tell you in which environment the CLI is installed. Depending on the result, you should try updating huggingface_hub in the correct environment. If you want to start with a fresh install, you can follow this guide: https://huggingface.co/docs/huggingface_hub/installation#install-with-pip.

(In your case, python -m pip install -U huggingface_hub might be the solution, but I'm not sure.)

@absalan

absalan commented Jan 15, 2024

Thanks @mhdpr @Wauplin for the responses. I re-installed the package and the issue was solved.

@Wauplin
Contributor

Wauplin commented Jan 15, 2024

Glad to know your problem's solved @absalan!

7 participants