
refactor: pypi mapping into cached client #3318

Merged: ruben-arts merged 17 commits into prefix-dev:main from baszalmstra:refactor/pypi_mapping, Mar 18, 2025

Conversation

@baszalmstra (Contributor) commented Mar 10, 2025

Refactors the PyPI mapping into a more client-based approach. The MappingClient exposes a function that can be called to amend purls, and it also serves as the entry point for request coalescing and in-memory caching.

Also fixes #2615
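The client-with-in-memory-cache idea can be sketched roughly as follows. This is a minimal, std-only sketch under assumptions: CachedMappingClient, amend_purl, and the fetch closure are illustrative names, not pixi's actual API. Holding the lock across the fetch also crudely coalesces duplicate lookups by serializing them; a real implementation would coalesce per key without blocking unrelated requests.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical sketch of an in-memory caching client: repeated lookups for
/// the same record hit the cache instead of refetching.
struct CachedMappingClient<F: Fn(&str) -> Option<String>> {
    fetch: F,
    cache: Mutex<HashMap<String, Option<String>>>,
}

impl<F: Fn(&str) -> Option<String>> CachedMappingClient<F> {
    fn new(fetch: F) -> Self {
        Self { fetch, cache: Mutex::new(HashMap::new()) }
    }

    /// Return the purl for `record`, consulting the in-memory cache first.
    fn amend_purl(&self, record: &str) -> Option<String> {
        let mut cache = self.cache.lock().unwrap();
        cache
            .entry(record.to_string())
            .or_insert_with(|| (self.fetch)(record))
            .clone()
    }
}

fn main() {
    use std::cell::Cell;
    let calls = Cell::new(0);
    let client = CachedMappingClient::new(|name| {
        calls.set(calls.get() + 1);
        Some(format!("pkg:pypi/{name}"))
    });
    assert_eq!(client.amend_purl("numpy"), Some("pkg:pypi/numpy".to_string()));
    assert_eq!(client.amend_purl("numpy"), Some("pkg:pypi/numpy".to_string()));
    // The fetch ran only once; the second lookup was served from the cache.
    assert_eq!(calls.get(), 1);
}
```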

@ruben-arts (Contributor)

#3322 should fix the check-cli-docs job.

@baszalmstra baszalmstra marked this pull request as ready for review March 11, 2025 10:40
@baszalmstra baszalmstra added the test:extra_slow Run the extra slow tests label Mar 11, 2025
@wolfv (Member) commented Mar 11, 2025

I asked @mariusvniekerk on Zulip to try this.

Maybe @tobiasfischer or @matthewfeickert could also give it a go to see if that helps with slow PyPI mapping issues that you observed.

@matthewfeickert (Contributor)

Maybe @tobiasfischer or @matthewfeickert could also give it a go to see if that helps with slow PyPI mapping issues that you observed.

@wolfv I was too slow in testing the original PR you linked me to on Discord, and the artifacts had expired. You'd like me to test the "Updating lock file takes forever" Discord question with the build artifacts from this PR, though?

@wolfv (Member) commented Mar 11, 2025

yes, @matthewfeickert - this one is the better implementation. If you could test again with artifacts from this PR here that would be great.

@nichmor (Contributor) commented Mar 12, 2025

I've tested it on a slow mapping example, as reported by @mariusvniekerk.

After profiling with cargo flamegraph, I can't find any samples for amend_purls. This means we have a huge improvement, compared to the previous 464 samples for it. :)

@wolfv (Member) commented Mar 12, 2025

@baszalmstra can we close this PR then? #3079

@baszalmstra (Contributor, Author)

Yep!

@nichmor (Contributor) left a review comment:

awesome work!

Works as expected.

@matthewfeickert (Contributor)

this one is the better implementation. If you could test again with artifacts from this PR here that would be great.

Ah okay, great. I'll test this now. (Sorry, I'm at CERN this week and very behind on everything.)

@matthewfeickert (Contributor) commented Mar 13, 2025

Hm. I may be doing something dumb in how I'm testing, but this build is slower than the current v0.42.1 release for me.

pixi manifest

Using the following manifest from the "Updating lock file takes forever" Discord question, on a Linux x86 machine with an NVIDIA GPU.

pixi.toml:
[workspace]
authors = ["Matthew Feickert <matthew.feickert@cern.ch>"]
channels = ["conda-forge"]
name = "example"
platforms = ["linux-64"]
version = "0.1.0"

[tasks]

[system-requirements]
cuda = "12"

[dependencies]
python = "3.11.*"
pip = "<25.1"
numpy = "<2"
pytorch = ">=2.4"
torchvision = "*"
tqdm = "*"
matplotlib = "*"
scipy = "*"
Pillow = ">=7.1"
git = "*"
natsort = "*"
compilers = "*"
cmake = "*"
xformers = "*"
pytorch-lightning = "*"
pytorch-metric-learning = "*"
timm = "*"
h5py = "*"
scikit-learn = "*"
ftfy = "*"
cython = "*"
protobuf = "*"
termcolor = ">=1.1"
werkzeug = "*"
yacs = ">=0.1.6"
pycocotools = ">=2.0.2"
hydra-core = ">=1.1.0rc1"
grpcio = "*"
fvcore = "*"
cloudpickle = "*"
click = "*"
black = "==21.4b2"
absl-py = "*"
markdown = "*"
omegaconf = ">=2.1.0rc1"
pathspec = "*"
platformdirs = "*"
portalocker = "*"
opencv = "*"
tensorboard-data-server = "*"
tensorboard = "*"
setuptools = "*"
typing_extensions = "4.11.*"
regex = ">=2024.11.6,<2025"
gdown = ">=5.2.0,<6"
pydot = "*"
dataclasses = "*"
future = "*"
tabulate = "*"
pyparsing = "==3.0.9"
lvis = ">=0.5.3,<0.6"

[target.linux-64.dependencies]
cuda-version = "12.*"
pytorch-gpu = "*"
cuda-cudart-dev = "*"
cuda-crt = "*"
cudnn = "*"
libcusparse-dev = "*"
cuda-driver-dev = "*"
cuda-nvcc = "*"
cuda-nvrtc-dev = "*"
cuda-nvtx-dev = "*"
cuda-nvml-dev = "*"
cuda-profiler-api = "*"
cusparselt = "*"
libcublas-dev = "*"
libcudss-dev = "*"
libcufile-dev = "*"
libcufft-dev = "*"
libcurand-dev = "*"
libcusolver-dev = "*"

[pypi-dependencies]
diffdist = ">=0.1,<0.2"
iopath = ">=0.1.7,<0.1.9"

This PR build

$ uname -sm
Linux x86_64
$ command -v pixi
/tmp/check/pixi-bin/pixi
$ pixi --version
pixi 0.42.1
$ time pixi lock
...
real	0m30.242s
user	0m22.262s
sys	0m9.682s
$ time pixi update
✔ Lock-file was already up-to-date

real	0m24.757s
user	0m14.616s
sys	0m7.653s
$ rm pixi.lock
$ time pixi lock
...
real	0m25.713s
user	0m15.182s
sys	0m7.865s

current public v0.42.1 release build

$ command -v pixi
/home/feickert/.pixi/bin/pixi
$ pixi self-update 
✔ pixi is already up-to-date (version 0.42.1)
$ rm -rf .pixi pixi.lock
$ time pixi lock  # Some of these are already cached
...
real	0m19.997s
user	0m10.373s
sys	0m1.343s
$ time pixi update
✔ Lock-file was already up-to-date

real	0m22.995s
user	0m11.546s
sys	0m1.554s
$ rm pixi.lock
$ time pixi lock
...
real	0m21.862s
user	0m10.973s
sys	0m1.498s

@wolfv (Member) commented Mar 13, 2025

@matthewfeickert just to confirm, you are seeing this on a "normal" machine with normal (speedy) internet?

@matthewfeickert (Contributor) commented Mar 13, 2025

just to confirm, you are seeing this on a "normal" machine with normal (speedy) internet?

I'm on my normal work laptop and I'm currently on CERN wifi which (while patchy sometimes) gives me a speedtest result of about 200 Mbps down/up.

I can login to some clusters in the US at the University of Chicago that I have access to and see if I see different results there.

@wolfv (Member) commented Mar 13, 2025

And also, could you run without the PyPI dependencies so that we have a baseline?

@matthewfeickert (Contributor)

University of Chicago cluster login nodes (which don't have GPUs attached; those are on the workers)

This PR build

$ command -v pixi
/home/feickert/workarea/pixi-debug/pixi-bin/pixi
$ time pixi lock

real	1m38.032s
user	1m22.396s
sys	2m17.010s
$ time pixi update
✔ Lock-file was already up-to-date

real	0m18.171s
user	0m10.260s
sys	0m39.578s
$ rm pixi.lock 
$ time pixi lock
...
real	0m18.392s
user	0m10.330s
sys	0m40.193s

current public v0.42.1 release build

[05:33] login02.af.uchicago.edu:~/workarea/pixi-debug/example $
$ command -v pixi
/home/feickert/.pixi/bin/pixi
$ pixi self-update 
✔ pixi is already up-to-date (version 0.42.1)
$ rm -rf .pixi pixi.lock
$ time pixi lock  # Some of these are already cached
...
real	0m6.682s
user	0m6.278s
sys	0m1.650s
$ time pixi update
✔ Lock-file was already up-to-date

real	0m6.618s
user	0m6.238s
sys	0m1.632s
$ rm pixi.lock
$ time pixi lock
...
real	0m6.680s
user	0m6.201s
sys	0m1.741s

@ruben-arts (Contributor) commented Mar 13, 2025

Thanks a lot for the testing @matthewfeickert. The difference is so extreme that it might have to do with the CI build; let me test the difference, and otherwise I'll provide you with a build using the release profile.

@matthewfeickert (Contributor)

And also, could you run without the PyPI dependencies so that we have a baseline?

Will do, but will be a few hours given meetings. 👍


let mapping_client = self.mapping_client.unwrap_or_else(|| {
    MappingClient::builder(client)
        .with_concurrency_limit(Arc::new(Semaphore::new(100)))
Review comment (Contributor):

After some testing I found that this could be really fast, but also extremely slow "sometimes": within a single hyperfine run of 10 runs, it could differ between 0.7 seconds and 10 seconds.

Lowering the value to 10 made it much more consistent, between 1 and 2 seconds, but it wouldn't go below 1 second anymore, making the fastest run less fast.

I think this should reuse the configuration setting --concurrent-downloads, or we should add --concurrent-io-actions and use that here too.
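For reference, the bounding behaviour of such a concurrency limit can be sketched with a std-only counting semaphore. The PR itself uses tokio::sync::Semaphore; this thread-based sketch only illustrates why no more than N requests are ever in flight:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built on Mutex + Condvar (illustrative only).
struct Semaphore {
    permits: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cvar: Condvar::new() }
    }
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        // Block until a permit is available.
        while *p == 0 {
            p = self.cvar.wait(p).unwrap();
        }
        *p -= 1;
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cvar.notify_one();
    }
}

fn main() {
    // Allow at most 10 simulated "requests" in flight at once.
    let sem = Arc::new(Semaphore::new(10));
    let in_flight = Arc::new(Mutex::new((0usize, 0usize))); // (current, max)

    let handles: Vec<_> = (0..100)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let in_flight = Arc::clone(&in_flight);
            thread::spawn(move || {
                sem.acquire();
                {
                    let mut f = in_flight.lock().unwrap();
                    f.0 += 1;
                    f.1 = f.1.max(f.0);
                }
                thread::sleep(Duration::from_millis(1)); // simulated request
                in_flight.lock().unwrap().0 -= 1;
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // The semaphore guarantees the peak never exceeds the permit count.
    let max = in_flight.lock().unwrap().1;
    assert!(max <= 10);
}
```

Lowering the permit count trades peak throughput for consistency, which matches the observation above.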

Review comment (Contributor, Author):

I cannot reproduce any of these results on my Windows machine. I'll try to reuse the --concurrent-downloads flag though!

@ruben-arts (Contributor)

@matthewfeickert I've spent some more time on testing it. I'll get back to it with @baszalmstra and @wolfv to figure out opportunities to make it consistently faster.

@matthewfeickert (Contributor)

And also, could you run without the PyPI dependencies so that we have a baseline?

(A day later, sorry.)
This is on my work laptop again, on CERN wifi.

This PR build

$ command -v pixi
/tmp/check/pixi-bin/pixi
$ time pixi lock
...
real	0m10.638s
user	0m12.271s
sys	0m9.605s
$ time pixi update
✔ Lock-file was already up-to-date

real	0m2.933s
user	0m4.596s
sys	0m6.754s
$ rm pixi.lock
$ time pixi lock
...
real	0m2.956s
user	0m4.429s
sys	0m6.786s

current public v0.42.1 release build

$ command -v pixi
/home/feickert/.pixi/bin/pixi
$ time pixi lock
...
real	0m1.787s
user	0m2.224s
sys	0m0.370s
$ time pixi update
✔ Lock-file was already up-to-date

real	0m1.797s
user	0m2.222s
sys	0m0.388s
$ rm pixi.lock
$ time pixi lock
...
real	0m1.765s
user	0m2.188s
sys	0m0.374s

This seems to be more similar to the timing that @ruben-arts had noted earlier on Discord.

@nichmor (Contributor) commented Mar 18, 2025

Hello everyone! We have benchmarked this PR a little more!

The 4 binaries story

We took the following 4 binaries as a baseline to benchmark:

  • pixi: the released version, with all performance optimizations enabled
  • pixi-pypi-mapping: this PR's version
  • pixi-0.42.1-main: the latest pixi main branch
  • pixi-0.42.1: pixi from the 0.42.1 tag

All benchmarks were run using the pixi.toml from this comment: #3318 (comment).

These are the results when run with cache:

pixi update ran
    1.15 ± 0.03 times faster than pixi-0.42.1 update
    1.17 ± 0.02 times faster than pixi-pypi-mapping update
    1.21 ± 0.05 times faster than pixi-0.42.1-main update

These are the results when running without cache:

pixi update ran
    1.20 ± 0.04 times faster than pixi-0.42.1-main update
    1.21 ± 0.07 times faster than pixi-pypi-mapping update
    1.36 ± 0.68 times faster than pixi-0.42.1 update

As we can see, the released pixi is always faster because it has all performance optimizations enabled (like a different allocator and some jemalloc settings). We will exclude it from the benchmarks, as it will always be faster.
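Release-profile optimizations of the kind mentioned here often look like the following in a Cargo.toml. These are common Rust settings shown for illustration only, not necessarily pixi's exact configuration:

```toml
[profile.release]
lto = true         # link-time optimization across crates
codegen-units = 1  # slower compile, faster binary
opt-level = 3      # maximum optimization
```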

The 3 binaries story

Now that we've excluded the released pixi, let's benchmark pixi main and pixi 0.42.1 against this PR.

This is with clean cache:

Benchmark 1: pixi-v0.42.1 update
  Time (mean ± σ):      4.701 s ±  0.319 s    [User: 3.879 s, System: 2.220 s]
  Range (min … max):    4.175 s …  5.096 s    10 runs
 
Benchmark 2: pixi-main update
  Time (mean ± σ):      4.638 s ±  0.236 s    [User: 3.801 s, System: 2.073 s]
  Range (min … max):    4.182 s …  5.006 s    10 runs
 
Benchmark 3: pixi-pypi-mapping update
  Time (mean ± σ):      4.543 s ±  0.330 s    [User: 3.765 s, System: 2.012 s]
  Range (min … max):    4.044 s …  5.010 s    10 runs
 
Summary
  pixi-pypi-mapping update ran
    1.02 ± 0.09 times faster than pixi-main update
    1.03 ± 0.10 times faster than pixi-v0.42.1 update

This is with warm cache:

Benchmark 1: pixi-v0.42.1 update
  Time (mean ± σ):      2.396 s ±  0.051 s    [User: 3.046 s, System: 0.842 s]
  Range (min … max):    2.330 s …  2.458 s    10 runs
 
Benchmark 2: pixi-main update
  Time (mean ± σ):      2.477 s ±  0.066 s    [User: 3.038 s, System: 0.776 s]
  Range (min … max):    2.398 s …  2.606 s    10 runs
 
Benchmark 3: pixi-pypi-mapping update
  Time (mean ± σ):      2.392 s ±  0.088 s    [User: 2.993 s, System: 0.703 s]
  Range (min … max):    2.315 s …  2.590 s    10 runs
 
Summary
  pixi-pypi-mapping update ran
    1.00 ± 0.04 times faster than pixi-v0.42.1 update
    1.04 ± 0.05 times faster than pixi-main update

As we can see, this PR is faster than both pixi-main and pixi-v0.42.1, so we can assert! that the new implementation actually makes things faster.

Other ideas

Some other ideas left to try:

  • lower the number of concurrent requests
  • spawn tasks instead of collecting futures, so that other workers can steal them

cc @baszalmstra @ruben-arts

@baszalmstra (Contributor, Author)

I added some logging statements that report the efficiency of the mapping:

INFO resolve_conda{group=default platform=linux-64}: pypi_mapping: Amended 409 out of 409 records with purls in 300ms. 371 cache hits and 37 cache misses (90.93%).

Cache misses are also reported when running with -vvv:

DEBUG resolve_conda{group=default platform=linux-64}:derive_purl{record="libgettextpo-devel-0.23.1-h5888daf_0.conda"}: pypi_mapping: Cache miss on 'https://conda-mapping.prefix.dev/hash-v0/90f29ec7a7e2d758cb61459e643dcb54933dcf92194be6c29b0a1591fcbb163e' (404 Not Found)
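The percentage in that log line is simply cache hits over total lookups. A quick sketch (hit_rate is a hypothetical helper, not pixi's code):

```rust
/// Cache-hit percentage: hits over total lookups, as in the log line above.
fn hit_rate(hits: u32, misses: u32) -> f64 {
    100.0 * f64::from(hits) / f64::from(hits + misses)
}

fn main() {
    // 371 hits and 37 misses, as in the INFO line above.
    let rate = hit_rate(371, 37);
    assert_eq!(format!("{rate:.2}"), "90.93");
}
```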

@wolfv (Member) commented Mar 18, 2025

This one says cache miss but it also says 404 - is that the same?

DEBUG resolve_conda{group=default platform=linux-64}:derive_purl{record="libgettextpo-devel-0.23.1-h5888daf_0.conda"}: pypi_mapping: Cache miss on 'https://conda-mapping.prefix.dev/hash-v0/90f29ec7a7e2d758cb61459e643dcb54933dcf92194be6c29b0a1591fcbb163e' (404 Not Found)

@baszalmstra (Contributor, Author)

I forgot to mention, the latest code also honors the --concurrent-downloads flag.

@baszalmstra (Contributor, Author)

This one says cache miss but it also says 404 - is that the same?

It's a cache miss because the server was contacted. And 404s don't have any cache headers, so they will always be cache misses.
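In other words, whether a response can be served from the HTTP cache here depends on both the status and the presence of caching headers. A deliberately simplified sketch (served_from_cache is hypothetical; real HTTP caching has many more cases, such as 304 revalidation):

```rust
/// Simplified model of the behaviour described above: a response is only
/// reusable from cache when it succeeded AND the server sent cache headers.
fn served_from_cache(status: u16, has_cache_headers: bool) -> bool {
    status == 200 && has_cache_headers
}

fn main() {
    assert!(served_from_cache(200, true));
    assert!(!served_from_cache(200, false));
    // A 404 without cache headers is always a miss, as in the log above.
    assert!(!served_from_cache(404, false));
}
```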

@ruben-arts (Contributor)

This works as I expect, as far as I can test it! Thanks, guys, for the in-depth review of the situation!

@ruben-arts ruben-arts enabled auto-merge (squash) March 18, 2025 15:01
@ruben-arts ruben-arts disabled auto-merge March 18, 2025 15:31
@ruben-arts ruben-arts merged commit 85dfc40 into prefix-dev:main Mar 18, 2025
31 checks passed
@matthewfeickert (Contributor)

Thanks for the detailed work and analysis. It is great to see this level of commitment to excellence in projects like this that touch the entire community. 🙇


Labels: test:extra_slow (Run the extra slow tests)

Successfully merging this pull request may close this issue: conda-pypi-map should be relative to pixi.toml?

5 participants