Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-118761: Speedup pathlib import by deferring shutil #123520

Merged
merged 6 commits into from
Sep 1, 2024

Conversation

danielhollas
Copy link
Contributor

@danielhollas danielhollas commented Aug 30, 2024

~15% of pathlib import time (1.6ms on my machine) is spent on importing shutil. Since this module is only used in one place it makes sense to defer its import. Incidentally, this also makes the code more explicit imho, it took me a while to understand how the current code works.

image

@hugovk hugovk changed the title gh-118761: Speedup pathlib import by deffering shutil gh-118761: Speedup pathlib import by deferring shutil Aug 31, 2024
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On main in a non-debug build (but not a PGO one), I get no improvements at all.

The command to run is ./python -m pyperf timeit 'import pathlib' -l 1 -w 0 -n 1 -p 250. This will ensure that you don't have warmups and don't have cached modules.

  • main: Mean +- std dev: 1.37 ms +- 0.05 ms
  • PR: Mean +- std dev: 1.36 ms +- 0.04 ms

I'm not sure this is worth the change but maybe I'm incorrectly benchmarking the time.

Some idea: shutil imports some additional libs such as fnmatch that are then needed by glob. So when you do from glob import _StringGlobber after import shutil, you'll end up importing fnmatch. So you save the fnmatch import in the glob module by importing shutil. Now I don't know if this specific import is the reason why I don't see an improvement.

Lib/pathlib/_local.py Outdated Show resolved Hide resolved
Lib/pathlib/_local.py Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor

I think this probably shouldn't have a NEWS entry, because the import shutil was introduced only a couple of weeks ago, and there have been no releases from the main branch in that time. A user reading about a 15% performance improvement could be misled.

@danielhollas
Copy link
Contributor Author

The command to run is ./python -m pyperf timeit 'import pathlib' -l 1 -w 0 -n 1 -p 250. This will ensure that you don't have warmups and don't have cached modules.

1.6ms seems suspiciously fast, are you sure the module is not being cached?

I don't think fnmatch is the issue here, I think in the icicle graph I posted above it is already counted as part of glob. Most of the import time of shutil comes from importing compression modules, lzma etc. Measured with python -Ximporttime -c 'import pathlib'. I think I used the PGO build, not sure what is the recommended build for benchmarking.

@hugovk
Copy link
Member

hugovk commented Aug 31, 2024

I think I used the PGO build, not sure what is the recommended build for benchmarking.

Ideally use PGO+LTO.

On macOS with PGO+LTO, running ./python.exe -X importtime -c "import pathlib", the pathlib import takes 7 ms with main and 6 ms with this PR.

You can see the big shutil block is missing underneath pathlib._local afterwards in the tuna importtime visualisations (I ran ./python.exe -c "import pathlib" && ./python.exe -X importtime -c "import pathlib" 2> import.log && tuna import.log):

Before:

image

After:

image

And with hyperfine, this branch is about 1.5 ms faster than main:

hyperfine --warmup 32 \
--prepare "git checkout speedup-pathlib"  './python.exe -c "import pathlib"' \
--prepare "git checkout main"             './python.exe -c "import    pathlib"'
Benchmark 1: ./python.exe -c "import pathlib"
  Time (mean ± σ):      14.8 ms ±   0.6 ms    [User: 11.9 ms, System: 2.4 ms]
  Range (min … max):    14.1 ms …  18.2 ms    96 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Benchmark 2: ./python.exe -c "import    pathlib"
  Time (mean ± σ):      16.4 ms ±   0.8 ms    [User: 13.1 ms, System: 2.7 ms]
  Range (min … max):    15.6 ms …  23.2 ms    91 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Summary
  ./python.exe -c "import pathlib" ran
    1.11 ± 0.07 times faster than ./python.exe -c "import    pathlib"

Lib/pathlib/_local.py Outdated Show resolved Hide resolved
@danielhollas
Copy link
Contributor Author

Thank you all for the reviews, I believe I addressed all the comments.

@picnixz
Copy link
Member

picnixz commented Sep 1, 2024

1.6ms seems suspiciously fast, are you sure the module is not being cached?

With a caching it's in the realm of nanoseconds on my laptop. But my specs are a bit... biaised sometimes:

OS: openSUSE Leap 15.5 x86_64
Host: ROG Strix G814JZ_G814JZ 1.0
Kernel: 5.14.21-150500.55.73-default
CPU: 13th Gen Intel i9-13980HX (32) @ 5.400GHz
GPU: NVIDIA GeForce RTX 4080 Max-Q / Mobile
GPU: Intel Raptor Lake-S UHD Graphics
Memory: 4124MiB / 31698MiB

EDIT: It's a bit weird that pyperf reports timings in the realm of 1.6 ms but hyperfine does not:

Benchmark 1: ./python -c "import pathlib"
  Time (mean ± σ):      11.5 ms ±   2.1 ms    [User: 9.7 ms, System: 1.9 ms]
  Range (min … max):    10.2 ms …  20.8 ms    157 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Benchmark 2: ./python -c "import    pathlib"
  Time (mean ± σ):      11.9 ms ±   1.3 ms    [User: 10.5 ms, System: 1.4 ms]
  Range (min … max):    11.0 ms …  21.7 ms    149 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Summary
  './python -c "import pathlib"' ran
    1.03 ± 0.22 times faster than './python -c "import    pathlib"'

I'm approving the PR though it does not really change on my system much. But since it may affect other users, I think it's worth the change.

@barneygale
Copy link
Contributor

Thanks for this!

@barneygale barneygale merged commit 2304774 into python:main Sep 1, 2024
34 checks passed
@danielhollas danielhollas deleted the speedup-pathlib branch September 1, 2024 14:46
@danielhollas
Copy link
Contributor Author

EDIT: It's a bit weird that pyperf reports timings in the realm of 1.6 ms but hyperfine does not:

hyperfine also measures the overhead of python startup.

Another possible explanation why you don't see a difference, do you have the compression modules available? You can try python -c "import lzma; import zlib; import bz2

@picnixz
Copy link
Member

picnixz commented Sep 1, 2024

I do have them available :( it's an interesting observation though but it's my problem I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants