refactor: pypi mapping into cached client #3318
Conversation
#3322 This should fix the
I asked @mariusvniekerk on Zulip to try this. Maybe @tobiasfischer or @matthewfeickert could also give it a go to see if it helps with the slow PyPI mapping issues that you observed.

@wolfv I was too slow in testing the original PR @wolfv linked me to on Discord, and the artifacts had expired. You'd like testing of the "Updating lock file takes forever" Discord question with the build artifacts from this PR, though?
yes, @matthewfeickert - this one is the better implementation. If you could test again with artifacts from this PR here that would be great. |
I've tested it on a slow mapping example, as reported by @mariusvniekerk. After profiling with cargo flamegraph I can't find any samples for
@baszalmstra can we close this PR then? #3079
Yep! |
nichmor left a comment:

awesome work! Works as expected.
Ah okay great. I'll test this now. (Sorry, at CERN this week and very behind on everything.)
Hm. I may be doing something dumb in how I'm testing, but this build is slower than the current pixi release. Using the following manifest from the "Updating lock file takes forever" Discord question on a Linux x86 machine with an NVIDIA GPU.

pixi.toml:

```toml
[workspace]
authors = ["Matthew Feickert <matthew.feickert@cern.ch>"]
channels = ["conda-forge"]
name = "example"
platforms = ["linux-64"]
version = "0.1.0"

[tasks]

[system-requirements]
cuda = "12"

[dependencies]
python = "3.11.*"
pip = "<25.1"
numpy = "<2"
pytorch = ">=2.4"
torchvision = "*"
tqdm = "*"
matplotlib = "*"
scipy = "*"
Pillow = ">=7.1"
git = "*"
natsort = "*"
compilers = "*"
cmake = "*"
xformers = "*"
pytorch-lightning = "*"
pytorch-metric-learning = "*"
timm = "*"
h5py = "*"
scikit-learn = "*"
ftfy = "*"
cython = "*"
protobuf = "*"
termcolor = ">=1.1"
werkzeug = "*"
yacs = ">=0.1.6"
pycocotools = ">=2.0.2"
hydra-core = ">=1.1.0rc1"
grpcio = "*"
fvcore = "*"
cloudpickle = "*"
click = "*"
black = "==21.4b2"
absl-py = "*"
markdown = "*"
omegaconf = ">=2.1.0rc1"
pathspec = "*"
platformdirs = "*"
portalocker = "*"
opencv = "*"
tensorboard-data-server = "*"
tensorboard = "*"
setuptools = "*"
typing_extensions = "4.11.*"
regex = ">=2024.11.6,<2025"
gdown = ">=5.2.0,<6"
pydot = "*"
dataclasses = "*"
future = "*"
tabulate = "*"
pyparsing = "==3.0.9"
lvis = ">=0.5.3,<0.6"

[target.linux-64.dependencies]
cuda-version = "12.*"
pytorch-gpu = "*"
cuda-cudart-dev = "*"
cuda-crt = "*"
cudnn = "*"
libcusparse-dev = "*"
cuda-driver-dev = "*"
cuda-nvcc = "*"
cuda-nvrtc-dev = "*"
cuda-nvtx-dev = "*"
cuda-nvml-dev = "*"
cuda-profiler-api = "*"
cusparselt = "*"
libcublas-dev = "*"
libcudss-dev = "*"
libcufile-dev = "*"
libcufft-dev = "*"
libcurand-dev = "*"
libcusolver-dev = "*"

[pypi-dependencies]
diffdist = ">=0.1,<0.2"
iopath = ">=0.1.7,<0.1.9"
```

This PR build:

```console
$ uname -sm
Linux x86_64
$ command -v pixi
/tmp/check/pixi-bin/pixi
$ pixi --version
pixi 0.42.1
$ time pixi lock
...
real 0m30.242s
user 0m22.262s
sys 0m9.682s
$ time pixi update
✔ Lock-file was already up-to-date
real 0m24.757s
user 0m14.616s
sys 0m7.653s
$ rm pixi.lock
$ time pixi lock
...
real 0m25.713s
user 0m15.182s
sys 0m7.865s
```

current public build: (results in a collapsed section)
@matthewfeickert just to confirm, you are seeing this on a "normal" machine with normal (speedy) internet?
I'm on my normal work laptop and I'm currently on CERN wifi, which (while patchy sometimes) gives me a speedtest result of about 200 Mbps down/up. I can log in to some clusters in the US at the University of Chicago that I have access to and see if I get different results there.
And also, could you run without the PyPI dependencies so that we have a baseline?
University of Chicago cluster login nodes (that don't have GPUs attached, as those are on workers):

This PR build:

```console
$ command -v pixi
/home/feickert/workarea/pixi-debug/pixi-bin/pixi
$ time pixi lock
real 1m38.032s
user 1m22.396s
sys 2m17.010s
$ time pixi update
✔ Lock-file was already up-to-date
real 0m18.171s
user 0m10.260s
sys 0m39.578s
$ rm pixi.lock
$ time pixi lock
...
real 0m18.392s
user 0m10.330s
sys 0m40.193s
```

current public build: (results in a collapsed section)
Thanks a lot for the testing @matthewfeickert, the difference is so extreme that it might have to do with the CI build. Let me test the difference and otherwise provide you with a build using the release profile.
Will do, but will be a few hours given meetings. 👍 |
src/lock_file/update.rs (Outdated)

```rust
let mapping_client = self.mapping_client.unwrap_or_else(|| {
    MappingClient::builder(client)
        .with_concurrency_limit(Arc::new(Semaphore::new(100)))
```
After some testing I found that this could be really fast, but also extremely slow "sometimes": within a single hyperfine session of 10 runs, timings could differ between 0.7 seconds and 10 seconds.
Lowering the value to 10 made it much more consistent, between 1 and 2 seconds, but it wouldn't go below 1 second anymore, making the fastest run less fast.
I think this should reuse the configuration setting --concurrent-downloads, or we add --concurrent-io-actions and use that here too.
I cannot reproduce any of these results on my Windows machine. I'll try to reuse the --concurrent-downloads flag though!
@matthewfeickert I've spent some more time on testing it; I'll get back to it with @baszalmstra and @wolfv to figure out opportunities to make it consistently faster.
(Day later, sorry.)

This PR build:

```console
$ command -v pixi
/tmp/check/pixi-bin/pixi
$ time pixi lock
...
real 0m10.638s
user 0m12.271s
sys 0m9.605s
$ time pixi update
✔ Lock-file was already up-to-date
real 0m2.933s
user 0m4.596s
sys 0m6.754s
$ rm pixi.lock
$ time pixi lock
...
real 0m2.956s
user 0m4.429s
sys 0m6.786s
```

current public build: (results in a collapsed section)
hello everyone! We have benchmarked this PR a little more!

The 4 binaries story:
So we took 4 binaries as a baseline to benchmark. All benchmarks were run using the … These are the results when run … These are the results when running … As we can see, …

The 3 binaries story:
Now that we exclude … This is with clean cache: … This is with warm cache: … As we can see, this …

Other ideas:
Some other ideas left to try: …

(The benchmark tables in this comment were in collapsed sections.)
I added some logging statements that signify the efficiency of the mapping. Cache misses are also reported when running with
This one says:

```
DEBUG resolve_conda{group=default platform=linux-64}:derive_purl{record="libgettextpo-devel-0.23.1-h5888daf_0.conda"}: pypi_mapping: Cache miss on 'https://conda-mapping.prefix.dev/hash-v0/90f29ec7a7e2d758cb61459e643dcb54933dcf92194be6c29b0a1591fcbb163e' (404 Not Found)
```
I forgot to mention, the latest code also honors the
It's a cache miss because the server was contacted, and 404s don't have any cache headers, so they will always be cache misses.
This works as I expect, as far as I can test it! Thanks guys for the in-depth review of the situation!
Thanks for the detailed work and analysis. It is great to see this level of commitment to excellence in projects like this that touch the entire community. 🙇 |
Refactors the PyPI mapping into a more client-based approach. The MappingClient has a function that can be called to amend PURLs. It also serves as the entry point for coalescing of requests and in-memory caching.

Also fixes #2615