
Add mutating stat! function for non-allocating filesystem stat #46410

Merged: 3 commits, Jun 19, 2023

Conversation

@dalum (Contributor) commented Aug 19, 2022

The current implementation of Base.Filesystem.stat always allocates a temporary 224-byte buffer to hold the output of jl_stat on each call. When calling stat on a large number of files, this adds a small performance overhead and may trigger GC. This PR adds a mutating Base.Filesystem.stat! function that accepts a pre-allocated buffer, which can be reused across consecutive calls to stat!, along with a function, Base.Filesystem.get_stat_buf, for obtaining such a reusable buffer.
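For reference, a minimal usage sketch of the API described here (assuming `get_stat_buf` and `stat!` exactly as merged in this PR; later Julia versions may have changed these internals):

```julia
# Sketch: reuse one pre-allocated buffer across many stat calls.
# Assumes a Julia version containing this PR's Base.Filesystem.stat!/get_stat_buf.
buf = Base.Filesystem.get_stat_buf()   # buffer sized via jl_sizeof_stat

# Fills `buf` in place and returns a StatStruct; no per-call allocation.
st = Base.Filesystem.stat!(buf, pwd())
println(st.mtime)                      # StatStruct fields such as mtime and size are available
```

Subsequent calls to `stat!` with the same `buf` overwrite it, so the returned information must be consumed (or copied) before the next call.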

On my machine, this results in the following improvements:

With stat:

julia> @benchmark Base.stat($f)
BenchmarkTools.Trial: 10000 samples with 90 evaluations.
 Range (min … max):  712.311 ns …   8.879 μs  ┊ GC (min … max): 0.00% … 88.35%
 Time  (median):     814.472 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   840.062 ns ± 163.038 ns  ┊ GC (mean ± σ):  0.35% ±  1.78%

  ▂       ▂▃▃▅█▆▅▅▄▂▆▄▃▃▅▅▄▃▂▃▂▁         ▁▂▁   ▁                ▂
  █▇▇▆▅▇▇██████████████████████████▇▇██▇▇███▇▆▇██▇▇▅▇▅▅▄▅▄▅▄▄▁▄ █
  712 ns        Histogram: log(frequency) by time       1.13 μs <

 Memory estimate: 224 bytes, allocs estimate: 1.

With Base.Filesystem.stat!:

julia> @benchmark Base.Filesystem.stat!($(Base.Filesystem.get_stat_buf()), $f)
BenchmarkTools.Trial: 10000 samples with 126 evaluations.
 Range (min … max):  653.833 ns …  1.725 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     664.071 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   689.712 ns ± 65.549 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▃▁        ▂▃  ▃   ▃   ▃   ▁▁              ▂               ▂
  ██████▇▇▇▆██████▅██▆▆█▆▆▆█▇▅▆██▆▆▄▅▇▅▅▅▄█▅▄▄▆█▃▁▃▄██▅▃▁▃▃▅▄▄ █
  654 ns        Histogram: log(frequency) by time       968 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@KristofferC (Member) commented Aug 22, 2022

Out of curiosity, what is the performance difference if you run it on something more realistic? Calling stat on the same thing over and over is a bit artificial; calling it on a large set of files (which likely is the typical use case) might stress the filesystem to the point where the allocation cost is negligible.

@dalum (Contributor, author) commented Aug 23, 2022

That's actually the use case where I noticed the allocations. Here's a real-world scenario where I loop over and check 46 files.

With stat!:

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  32.362 μs … 98.715 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     32.676 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   34.489 μs ±  3.823 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▁    ▁  ▂▃      ▂▁  ▂▁  ▂▄   ▃▂              ▃    ▁▂      ▂
  ███▆▇▄▅██▅██▅▁██▁▄██▃▅██▃▃██▇▄▄██▄▅▅█▇▄▁▄▇█▄▁▁███▁▃▁██▇▅▃▃▅ █
  32.4 μs      Histogram: log(frequency) by time      46.6 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

With stat:

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  34.414 μs … 983.463 μs  ┊ GC (min … max): 0.00% … 95.44%
 Time  (median):     36.779 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   39.736 μs ±  13.714 μs  ┊ GC (mean ± σ):  0.44% ±  1.35%

  ▇█▄▂        ▄▄ ▁▂  ▂▁ ▃▃▁▂▃▂▁▃▄▃▃▄▄▂▂▃▂▁ ▁    ▁   ▁▁   ▁     ▂
  ████▇▅██▅▇▆▅██▇██▄▇████████████████████████▇▇███▇████▇██▇▆▆▆ █
  34.4 μs       Histogram: log(frequency) by time      53.3 μs <

 Memory estimate: 10.06 KiB, allocs estimate: 46.

Pretty much the same ballpark in terms of benefits. It's also worth noting that for real-time applications (not a huge concern for my present problem), the worst-case time improves by an order of magnitude, since the GC trigger is avoided.
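The benchmarked loop can be sketched roughly like this (a hedged sketch, not the exact in-house code; the function name and the use of `readdir` as a stand-in for the 46 files are illustrative):

```julia
# Sketch: find the newest mtime among many files while reusing one stat buffer,
# so the loop performs zero allocations per stat call.
function newest_mtime(paths)
    buf = Base.Filesystem.get_stat_buf()   # allocated once, up front
    newest = 0.0
    for p in paths
        st = Base.Filesystem.stat!(buf, p) # overwrites `buf`; no GC pressure
        newest = max(newest, st.mtime)
    end
    return newest
end
```

Calling plain `stat` in the loop body would instead allocate a fresh 224-byte buffer on every iteration, which is what the "46 allocs" figure above reflects.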

@vtjnash (Member) commented Sep 2, 2022

> real-time applications

They aren't going to be calling stat.

@dalum (Contributor, author) commented Sep 2, 2022

> real-time applications
>
> They aren't going to be calling stat

In that case, disregard that remark 🤷‍♀️

FWIW, I was tracking down performance issues in some in-house code, and calls to mtime and isfile (which use stat under the hood) were by far the biggest culprits and the only part of the code that caused allocations. These small performance gains were obviously not enough on their own, so I compromised and found other ways to avoid calling stat so often. But it felt non-Julian to have only an allocating version of a function that can trivially be made non-allocating, which is the reason for this PR.
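For illustration, non-allocating analogues of mtime and isfile can be sketched on top of the stat! added here (the helper names `mtime!` and `isfile!` are hypothetical, not part of this PR or of Base):

```julia
# Hypothetical helpers: non-allocating counterparts of mtime/isfile,
# built on Base.Filesystem.stat! as merged in this PR.
buf = Base.Filesystem.get_stat_buf()

mtime!(buf, path)  = Base.Filesystem.stat!(buf, path).mtime
isfile!(buf, path) = isfile(Base.Filesystem.stat!(buf, path))  # Base defines isfile(::StatStruct)
```

With a single long-lived `buf`, a polling loop over mtime!/isfile! runs allocation-free, which is exactly the scenario described above.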

@brenhinkeller added the performance label ("Must go faster") on Nov 17, 2022
@dalum (Contributor, author) commented Jun 15, 2023

Bumpity-bump 😊

From the diff under review (docstring truncated above):

```julia
Return a buffer of bytes of the right size for [`stat!`](@ref).
"""
get_stat_buf() = zeros(UInt8, Int(ccall(:jl_sizeof_stat, Int32, ())))
```
Review comment (Member):
is this size not consistent?

Reply (Contributor, author):

It was adopted from the allocating implementation. I don't know whether it can safely be replaced by a hard-coded constant, if that's what you mean.

@oscardssmith added the filesystem label ("Underlying file system and functions that use it") on Jun 15, 2023
@oscardssmith (Member):

Looks good to me

@oscardssmith added the "forget me not" label ("PRs that one wants to make sure aren't forgotten") on Jun 15, 2023
@oscardssmith merged commit 427b123 into JuliaLang:master on Jun 19, 2023
@oscardssmith removed the "forget me not" label on Jun 19, 2023
@giordano (Contributor):

This had no tests at all. Also, not added to the docs, if it was meant to be public API.

@oscardssmith (Member):

Well, it's not exported and has no docs, so it's not public API? But yeah, I was probably hasty to merge.

@giordano (Contributor):

> well it's not exported and has no docs, so it's not public API?

That's why I said "if it was meant to be public API".

@rfourquet (Member) commented Jun 21, 2023

I believe Base usually avoids defining non-public functionality that isn't used by Base itself, because it's then not very useful. It looks to me like stat! should either be added to the docs (and possibly marked experimental if there are doubts) or deleted.

@dalum (Contributor, author) commented Jun 26, 2023

The point of the PR was to add Base.stat! as a non-allocating public API alternative to stat. I just forgot to add it to the docs, but I can open another PR for that 😊 I'm unsure whether it should be exported, since stat! is a rather short identifier. I'm personally not a fan of populating the default namespace with few-letter identifiers, so I'd probably lean towards "no" myself.

@oscardssmith (Member):

I think this is worth exporting. stat! is pretty short, but we already export stat so stat! seems pretty unambiguous.

@vtjnash (Member) commented Jun 28, 2023

I feel like we should revert that. I don't really see the point in having a non-allocating stat!. I would actually probably rather have expected stat! to call chmod / chgrp / utime / etc. (which would be rather odd though)

@dalum (Contributor, author) commented Jun 28, 2023

> I don't really see the point in having a non-allocating stat!.

I'm not going to die on a hill defending this, but I would like to understand your argument. The point is to be able to avoid allocations in a function that can easily be made to re-use a block of memory, instead of forcing the user to allocate a new one at every invocation, at no extra maintenance cost. It provides a way for the user to opt in to a marginal performance improvement, if they so wish.

@StefanKarpinski (Member):

@vtjnash, can you elaborate on why you don't feel like this is worth having? The allocation is too small to matter? The compiler is likely to be able to eliminate it? What's the thinking?

@vtjnash (Member) commented Jul 20, 2023

It is too insignificant, and easy for the compiler to eliminate if we cared.
