Address #2220 (slow download perf against PyPi mirror) #2319

Merged 1 commit on Mar 10, 2024

thundergolfer
Contributor

Summary

This addresses the extremely slow download performance detailed in #2220 with two changes:

  1. Setting accept-encoding: identity, in the spirit of pypa/pip#1688 ("Accept encoding identity").
  2. Increasing the download buffer from 8 KiB to 128 KiB.

1. accept-encoding: identity

I think this related pip PR has a good explanation of what's going on: pypa/pip#1688

  # We use Accept-Encoding: identity here because requests
  # defaults to accepting compressed responses. This breaks in
  # a variety of ways depending on how the server is configured.
  # - Some servers will notice that the file isn't a compressible
  #   file and will leave the file alone and with an empty
  #   Content-Encoding
  # - Some servers will notice that the file is already
  #   compressed and will leave the file alone and will add a
  #   Content-Encoding: gzip header
  # - Some servers won't notice anything at all and will take
  #   a file that's already been compressed and compress it again
  #   and set the Content-Encoding: gzip header
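
To make the third case concrete, here is a minimal Python sketch of a server gzipping a file that is already compressed. It is not pip or uv code: the in-memory zip archive only stands in for a wheel, and the member name and pseudo-random text are made up for the demonstration.

```python
import gzip
import io
import random
import zipfile

# Build a small zip archive in memory as a stand-in for a wheel.
# The member name and the pseudo-random text are illustrative only.
rng = random.Random(0)
words = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]
text = " ".join(rng.choices(words, k=50_000)).encode()

archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("demo/module.py", text)
wheel_like = archive.getvalue()

# What a "third kind" server does: compress the already-compressed file again.
recompressed = gzip.compress(wheel_like)

# Fraction of bytes saved by the second compression pass (may be negative).
savings = 1 - len(recompressed) / len(wheel_like)
print(f"archive: {len(wheel_like)} bytes, after gzip: {len(recompressed)} bytes")
print(f"savings from recompressing: {savings:.1%}")

# The client has to undo the extra layer before it sees the original bytes.
assert gzip.decompress(recompressed) == wheel_like
```

The second pass costs CPU on both ends and saves almost no bytes, which is why asking the server for the identity encoding is attractive for wheel downloads.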

The files.pythonhosted.org server is the first kind. Here is an example debug log that I added to uv, captured when installing against PyPI:

[image: uv debug log of the response headers]

(The response has no content-encoding header, the whl has not been compressed, and there is a content-length header.)

Our internal mirror is the third kind. It would be sensible to modify our mirror to behave like the first kind, but uv should still handle all three cases, as pip does.
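
For illustration, this is roughly what asking for the identity encoding looks like from a client. The sketch uses Python's standard urllib rather than uv's Rust HTTP client, so it is an analogy and not the code changed in this PR. The mirror URL and the helper name build_download_request are hypothetical, and the request is only constructed, never sent.

```python
import urllib.request

# Hypothetical wheel URL on an internal mirror; not a real endpoint.
WHEEL_URL = "https://mirror.example.com/packages/demo-1.0-py3-none-any.whl"


def build_download_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server not to apply a content coding."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


request = build_download_request(WHEEL_URL)

# urllib stores header names via str.capitalize(), hence "Accept-encoding".
print(request.get_header("Accept-encoding"))
```

A server that honors the header returns the file bytes untouched, so all three server kinds end up behaving like the first one from the client's point of view.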

2. buffer increase

In #2220 I observed that pip's downloads caused flushes of up to 128 KiB in our mirror.

After fix 1, uv was still causing flushes of only up to 8 KiB and was slower to download than pip. Increasing the buffer from the default 8 KiB to 128 KiB improved download performance against our mirror and produced the expected 128 KiB flushes.
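
The effect of the buffer size on the underlying reads can be sketched in Python. This is an analogy only: uv is written in Rust, and this is not the code changed in this PR. CountingSource and drain are hypothetical names, and the 1 MiB zero-filled payload stands in for a wheel.

```python
import io


class CountingSource(io.RawIOBase):
    """In-memory raw stream that records the size of every read request."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._data = data
        self._pos = 0
        self.request_sizes: list[int] = []

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # len(b) is how many bytes the buffered layer asks for in one call.
        self.request_sizes.append(len(b))
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n  # 0 signals end of stream


def drain(buffer_size: int, payload: bytes) -> list[int]:
    """Consume payload in 1 KiB pieces through a buffer of the given size."""
    source = CountingSource(payload)
    reader = io.BufferedReader(source, buffer_size=buffer_size)
    while reader.read(1024):
        pass
    return source.request_sizes


payload = bytes(1024 * 1024)  # 1 MiB of zeros as a stand-in for a wheel
small = drain(8 * 1024, payload)
large = drain(128 * 1024, payload)
print(f"8 KiB buffer: {len(small)} reads of up to {max(small)} bytes")
print(f"128 KiB buffer: {len(large)} reads of up to {max(large)} bytes")
```

Even though the consumer asks for 1 KiB at a time in both runs, the buffer capacity caps how much is pulled from the source per call, so the larger buffer needs far fewer and larger reads.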

Test Plan

Ran the benchmarks as instructed by @charliermarsh:

[image: benchmark results]

The benchmarks showed no performance improvement or regression.

charliermarsh added the performance (Potential performance improvement) label on Mar 10, 2024
charliermarsh merged commit e16140a into astral-sh:main on Mar 10, 2024
7 checks passed
@charliermarsh
Member

Thanks! I read through the linked pip PR. This makes sense to me.

@zanieb
Member

zanieb commented Mar 10, 2024

Thank you!

charliermarsh added a commit that referenced this pull request Mar 10, 2024
## Summary

Like #2319, there are a few other places where we attempt to stream a
file.