Support pip download #3163

Open

idlsoft opened this issue Apr 20, 2024 · 26 comments
Labels
cli (Related to the command line interface), compatibility (Compatibility with a specification or another tool)

Comments

@idlsoft
Contributor

idlsoft commented Apr 20, 2024

This would be especially useful for building docker images.

You could then rely on uv for a quick resolve and use a simple pip install --no-deps --find-links in your Dockerfile.

@samypr100
Collaborator

This is along similar lines to supporting pip wheel, discussed in #1681.

@zanieb added the compatibility (Compatibility with a specification or another tool) and cli (Related to the command line interface) labels Apr 23, 2024
@inoa-jboliveira

inoa-jboliveira commented May 9, 2024

Hi everyone, is this feature on the roadmap? I am guessing supporting "pip download" would be straightforward since uv already downloads packages. It would just have to not install them.

Any help needed?

@charliermarsh
Member

We should be able to support it... Though it's not trivial because we don't store the .whl files at all, we unzip them directly into the cache. So most of the data pipelines are oriented around an API that receives the unzipped wheel, rather than the zipped wheel.

What are the typical use-cases here?

@idlsoft
Contributor Author

idlsoft commented May 9, 2024

What are the typical use-cases here?

In my case it's a docker build in a github workflow.

Caching docker layers on github runners is impossible AFAIK. Caching ~/.cache is trivial.
So a build could download whls into the docker working dir, then do

COPY whls whls
RUN pip install --no-deps --find-links whls ....

Which wouldn't hit pypi and wouldn't need any additional caching from docker.

... This can be even better if you can do RUN --mount ...
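
A hedged sketch of that --mount variant, assuming BuildKit and a whls directory prepared before the build (same pip flags as above):

# syntax=docker/dockerfile:1
RUN --mount=type=bind,source=whls,target=/whls \
    pip install --no-deps --find-links /whls ....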

@charliermarsh
Member

I'm mostly wondering if it has to be wheels or if we could just make it easy to pre-populate the uv cache.

@idlsoft
Contributor Author

idlsoft commented May 9, 2024

I'm mostly wondering if it has to be wheels or if we could just make it easy to pre-populate the uv cache.

wheels are supported by standard pip install.
Otherwise you need to use uv inside docker build. Not bad necessarily, but a bit less flexible.

@zanieb
Member

zanieb commented May 9, 2024

I'm not sure how much we should go out of our way to support using pip to consume an output of uv? It seems weird to use uv in one case and pip in another, right?

@charliermarsh
Member

If it were equally easy for us I'd probably prefer to output wheels, it's a nicer intermediary format that's less coupled to our internal cache. I'd need to see how hard it is to support.

@inoa-jboliveira

inoa-jboliveira commented May 9, 2024

What are the typical use-cases here?

Hi, my use case is that I have to supply bundles of my application with all dependencies for systems where it is not possible to download them (firewall blocking). Right now I use pip download which results in a bunch of wheel files.

Also we would like to be able to cross platform download them too

@charliermarsh
Member

That makes sense, thanks.

@idlsoft
Contributor Author

idlsoft commented May 9, 2024

I'm not sure how much we should go out of our way to support using pip to consume an output of uv? It seems weird to use uv in one case and pip in another, right?

That part of the workflow may not be entirely under your control.
@inoa-jboliveira's example is a better one I think, because it's essentially about packaging your application.
Packaging may need to comply with a specific post-install procedure.

@samypr100
Collaborator

samypr100 commented May 10, 2024

Right now I use pip download which results in a bunch of wheel files.

Note, both pip download and pip wheel are similar, but there are crucial differences. If you're looking to package up your wheels, pip wheel is often recommended instead, since it covers cases where a package does not ship a pre-built wheel, making it a more complete solution for pre-packaging wheels for a target system.

As a result, I tend to just always use pip wheel nowadays to make sure I always have wheels rather than potential source distributions that I'll have to build on the target system. From my perspective, pip download is more useful when you want to package up sdists or when you don't care if everything you download is fully pre-built.
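
Concretely, the two commands the tradeoff is between look like this (requirements.txt as a stand-in for whatever you're packaging):

pip download -r requirements.txt -d dist/     # fetches whatever the index offers, which may include sdists
pip wheel -r requirements.txt -w wheels/      # additionally builds wheels for anything that only ships as an sdist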

Some of these tradeoffs were actually discussed in #1681.

@inoa-jboliveira

Hi @samypr100, in this specific case I really just want to download pre-built wheels and not build anything. I can pip download --platform foo and I am good to go. That's why I need and still use pip download.

As for pip wheel, I can't cross-platform download (nor compile) anything.

@pickfire

pip download -d vendor/ --index-url internal_pypi internal-sdk can be used with uv pip install -f vendor if we need to vendor anything.
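
Spelled out, that vendoring flow is roughly (internal-sdk and internal_pypi taken from the example above; --no-index is optional and just keeps the install fully offline):

pip download -d vendor/ --index-url internal_pypi internal-sdk
uv pip install --no-index --find-links vendor/ internal-sdk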

@jbw-vtl

jbw-vtl commented Jul 15, 2024

Also quite interested in this, as pip download is one of the slower parts of our CI environment.

In our case, our security scanning tool needs to run on a folder of wheel/source distributions; we currently use pip download to gather these.

@cbrnr

cbrnr commented Jul 26, 2024

Another use case would be downloading build time dependencies (in addition to runtime dependencies). I'm not sure if this is feasible, since it is not supported by pip (pypa/pip#7863). However, this would be extremely useful when building a Flatpak which involves Python packages, which is currently broken because of that (flatpak/flatpak-builder-tools#380).

@zanieb zanieb changed the title Support pip download Support pip download Aug 11, 2024
@ei-grad

ei-grad commented Aug 13, 2024

Although pip download is very handy for multi-stage Docker builds to efficiently cache dependencies,

uv doesn’t store the .whl files; instead, it unzips them directly into the cache.

It would be even better if the uv cache could be used to install requirements directly into the target stage instead of installing from the wheels. I'm curious whether there's a straightforward method to populate the uv cache for a list of packages.

@RichardDally

We have a business use case to scan dependencies of a Python project, we need to pip download requirements, it's slow without uv 😒

@notatallshaw
Contributor

notatallshaw commented Aug 29, 2024

We have a business use case to scan dependencies of a Python project, we need to pip download requirements, it's slow without uv 😒

Not saying you shouldn't use uv, but do you have an example where pip 24.2 is slow at downloading?

Especially if you've already pre-resolved the requirements with uv pip compile, as hopefully the biggest bottleneck is IO. I should be able to profile and see if there's any low hanging fruit in pip that can be fixed.
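
For reference, the pre-resolved scenario being described looks roughly like this (requirements.in is a placeholder for your top-level requirements; --only-binary :all: is optional and avoids building sdists):

uv pip compile requirements.in -o requirements.txt
pip download -r requirements.txt -d wheels/ --only-binary :all: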

Also should be a good scenario to see if uv can advertise being faster here or not.

@inoa-jboliveira

inoa-jboliveira commented Aug 29, 2024

pip download is pretty ok/fast enough for my needs. I do also open multiple processes and am constrained only by network, so uv won't be any faster without a cache (I'm not the guy above).

It is necessary for 2 reasons:

  • fully replace pip and not need it as a dependency
  • it is likely 99% done already, just missing the interface or some "do not unzip" flag for the data that is already cached. Maybe re-zip cached wheels to avoid redownloading them.

By using the cache it would indeed be faster than pip for same-platform downloads. Although, for my use case, I need to download cross-platform, so the packages won't be there (e.g. numpy or pandas, which are large platform-specific packages).

@lmmx

lmmx commented Sep 12, 2024

What are the typical use-cases here?

At the risk of repeating what other people have said, to chime in with my use case (also Docker image building for deployment, reproducibility is a secondary concern for me) and perhaps shed light on why pip download is important enough to support:

  • In a Docker build context, there used to be (now an outdated practice) a flag, pip install --global-option=build_ext, which would trigger a package build from source
  • Now build_ext has been deprecated* in favour of an explicit build step, so to build from source you would first run pip download to fetch that source**, then build it and install the resulting wheel
    • *because it was considered out of scope for install commands to also be doing builds, and presumably with the introduction of the build-backend TOML format
    • **or get it from a repo's release files, but via pip was the standard way
  • So the pseudocode workflow used to be "pip install --build-and-install my-package", and now it's "pip download my-package, python setup.py build my-package my-wheel, pip install my-wheel" (sketched after this list).
  • With uv we'd perhaps be able to do something more integrated (uv download+build+install my-package), maybe with helpful optimisations like caching
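
As a concrete sketch of that pseudocode (my-package is a placeholder; assumes the build package is installed, e.g. via pip install build):

pip download my-package --no-deps --no-binary :all: -d src/   # fetch the sdist only
tar -xzf src/my-package-*.tar.gz -C src/                      # unpack the source
python -m build --wheel --outdir dist/ src/my-package-*/      # build a wheel from it
pip install dist/*.whl                                        # install the built wheel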

Also I'd note that, as a user of these toolsets, it can be confusing to keep up with the proliferation of ways to do the same thing as the state of the art evolves.

I found an issue RE: pip wheel when looking for this thread, and it notes that “pip would prefer to deprecate pip wheel” (#1681)

@daler-rahimov

Hello everyone, I wanted to contribute some additional use cases for consideration. While most discussions here focus on the cloud, my perspective comes from the embedded world. Let’s consider the 10 billion devices currently operating on cellular networks, of which 1.8 billion are IoT/M2M devices. Many of these devices do not have access to "unlimited good bandwidth" but still require software updates, CI, etc.

When using Python, one way to accelerate these deployments is by pre-fetching PyPI packages ahead of time. This is not so much about supporting pip download, but rather about working with cached downloads. For example, you could pre-fetch a big package like TensorFlow (~400MB) like this:

file_url="https://files.pythonhosted.org/packages/5e/31/d49a3dff9c4ca6e6c09c2c5fea95f58cf59cc3cd4f0d557069c7dccd6f57/tensorflow-2.7.4-cp39-cp39-manylinux2010_x86_64.whl"
wget --continue --quiet -P . "$file_url"

And then the actual software deployment could use this pre-fetched file and get the rest of the dependencies from PyPI directly.
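
With pip today, that deployment step looks roughly like this (assuming the device matches the cp39/manylinux tags of the pre-fetched wheel):

pip install --find-links . tensorflow==2.7.4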

By enabling the UV package manager to use these pre-fetched/cached files, deployments become more efficient. Also, it's intuitive for a user to think about UV operations as downloading, storing, and installing.

@notatallshaw
Contributor

notatallshaw commented Sep 13, 2024

By enabling the UV package manager to use these pre-fetched/cached files, deployments become more efficient. Also, it's intuitive for a user to think about UV operations as downloading, storing, and installing.

I don't think this is the same issue? And should already be possible.

If you have acquired the wheels you can install directly from them (or use --find-links):

$ pip download requests --no-deps
Collecting requests
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
Saved ./requests-2.32.3-py3-none-any.whl

$ uv pip install ./requests-2.32.3-py3-none-any.whl
Resolved 5 packages in 176ms
Prepared 5 packages in 120ms
Installed 5 packages in 37ms
 + certifi==2024.8.30
 + charset-normalizer==3.3.2
 + idna==3.8
 + requests==2.32.3 (from file:///home/dshaw/uvtest/3163/requests-2.32.3-py3-none-any.whl)
 + urllib3==2.2.3

Also, pretty sure you can install with uv on your base machine, copy the cache to other devices, and then point uv cache to the copied directory; this should use even fewer resources (CPU and storage) on your IoT devices, as there's no extra step of unzipping the contents and storing them somewhere.
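
A rough sketch of that approach (Linux cache paths and identical requirements on both machines assumed; whether --offline resolution works entirely from a copied cache is worth verifying):

# on the build machine: populate uv's cache, then ship it
uv pip install -r requirements.txt            # fills ~/.cache/uv by default
tar -czf uv-cache.tar.gz -C ~/.cache uv

# on the device: unpack and point uv at the copied cache
mkdir -p ~/.cache && tar -xzf uv-cache.tar.gz -C ~/.cache
UV_CACHE_DIR=~/.cache/uv uv pip install --offline -r requirements.txt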

@daler-rahimov

I don't think this is the same issue? And should already be possible.

Unfortunately it's not possible. I have a better explanation in #7296 if you are interested.

$ uv pip install ./requests-2.32.3-py3-none-any.whl

Installing a single package isn't the goal here, but managing all the dependencies with uv. Basically doing something like this: uv sync --find-links [path-to-some-pre-fetched-wheels]

copy the cache to other devices, and then point uv cache to the copied directory,

In this use case, the biggest problem is data usage (for some devices, you pay per MB of usage), and the UV's cache contains the unzipped versions of wheels. For example, TensorFlow, which is ~400MB, expands into GB of data

@notatallshaw
Contributor

notatallshaw commented Sep 13, 2024

In this use case, the biggest problem is data usage (for some devices, you pay per MB of usage), and the UV's cache contains the unzipped versions of wheels. For example, TensorFlow, which is ~400MB, expands into GB of data

Do you mean what has to be copied onto the device before it does any downloading, or the total amount of storage used on the device?

Because if it's the total amount of storage on the device, then you will use less space by copying the cache: copying the wheel will take up the wheel + the install, whereas copying the cache will just be the install, and the site-packages location will just hard-link to the cache and use no additional space.

If it's the initial copy onto the device, you could zip the uv cache up, then have a small script that unzips it into the actual uv cache folder and deletes the zip.

I'm not saying it wouldn't be helpful for uv to have a download function and what you propose in #7296, just spitballing solutions with existing tools.

@T-256
Contributor

T-256 commented Nov 17, 2024

According to the prior discussion, instead of having two commands, uv pip download and uv pip wheel, I think we could have one top-level command, let's call it uv collect for now. It would then be able to collect all dependency files using the resolved dependencies of the current lockfile.

$ # auto build sdists.
$ uv collect

$ # only collect packages that require building (skip prebuilt wheels).
$ uv collect --requires-build-only

$ # cross-platform lockfile resolving.
$ uv collect --python-platform windows

$ # don't build sdists, but copy them as they are (for sdists copy the `tar.gz` file,
$ # for git dependencies clone the repo/subdirectory, for editables copy the folder).
$ uv collect --clone

$ # by default, it collects from all indexes; we can limit it.
$ uv collect --exclude-index pypi  # collect all others except `pypi`
$ uv collect --index internal_pypi  # only `internal_pypi`

I also think it could be integrated with some part of uv's caching system, e.g. the built-wheels cache.

I hope this helps this issue and #1681 move forward.
