
Show mean time in @btime #258

Open -- mcabbott wants to merge 7 commits into master

Conversation

@mcabbott commented Sep 18, 2021

There is room to show a bit more information with @btime, and I think the next number to show after the minimum is the mean. So this PR does this:

julia> @btime sum(log, $(rand(1000)));
  min 3.693 μs, mean 3.721 μs (0 allocations)

julia> @btime sum(log, rand(1000));
  min 4.429 μs, mean 5.242 μs (1 allocation, 7.94 KiB. GC mean 282 ns, 5.38%)  # initial version
  min 4.506 μs, mean 5.478 μs (1 allocation, 7.94 KiB)                         # minimal version

Before:

julia> @btime sum(log, $(rand(1000)));
  5.285 μs (0 allocations: 0 bytes)

Notice that it also doesn't bother writing "0 bytes" when there are no allocations, in the name of reducing clutter. I think that's safe, but if anyone has a counter-example, speak up.

I'm less sure this is a good idea, but it also prints the mean GC time, which I hope is the most useful GC number to show. (Often the fastest run triggers no GC at all.) I think it's less confusing to show the time than just the percentage -- and I hope it is clear that this is the mean of the GC times, expressed as a percentage of the mean time shown, not the mean of the per-run GC percentages.
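
To spell out the distinction, here is the quantity in terms of the fields of a Trial (a sketch, not the PR's actual printing code):

    using BenchmarkTools, Statistics

    t = run(@benchmarkable sum(log, rand(1000)))  # a Trial; `times` and `gctimes` are per-sample, in ns

    mean(t.gctimes) / mean(t.times)   # mean GC time as a fraction of the mean time: what this prints
    mean(t.gctimes ./ t.times)        # mean of the per-sample GC fractions: what it does not print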

Xref #217 for ongoing discussion about @benchmark's histograms.

Edit: for comparison, here's Base.@time's minimal and maximal display:

julia> let x = rand(1000)
         @time sum(log, x)
       end;
  0.000012 seconds
 
julia> f(n) = prod(Any[sum(randn(1000,1000)) for _ in 1:n]);

julia> @time f(10)
  0.089296 seconds (990 allocations: 76.352 MiB, 70.85% gc time, 1.80% compilation time)
1.357876781727917e26

This also omits allocations when they are zero, and prints GC time and compilation time only when nonzero. It's longer than the longest @btime above. I suppose this is also an argument for showing GC time as a percentage, which could save some space.

Edit': perhaps this wants a screenshot. A variant with bold times & only GC percentage:

[Screenshot 2021-10-02: @btime output variant with bold times and only the GC percentage]

Notice that the minimum times are in the wrong order, compared to the mean. And also that the GC percentages greatly understate the average cost of allocations.

@MasonProtter (Contributor) commented Sep 18, 2021

One thing I really like about this format: most users don't realize that @btime reports a minimum. Most assume it's an average or median time.

@mcabbott (Author) commented Sep 18, 2021

It could also say min 1 allocation, 7.94 KiB since that's what it is, but maybe it's usually stable enough for this not to matter.

With everything now, the line is usually about 82 characters. Do we care about getting it under 80? It's not super-confusing if line wrapped. (The zero-allocation case is more like 45 characters, perfect for slack...)

src/execution.jl Outdated
Comment on lines 574 to 587
$BenchmarkTools.prettymemory($BenchmarkTools.memory($trialmin)), ")")
$print(" min ", $BenchmarkTools.prettytime($BenchmarkTools.time($trialmin)),
@mcabbott (Author):

Is there a reason to interpolate $BenchmarkTools.prettymemory rather than just $prettymemory? I see I left some and later dropped others.
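
To illustrate: both forms splice a value into the returned expression, so neither depends on names in the caller's scope (a sketch, run outside the macro):

    using BenchmarkTools

    ex1 = :( $(BenchmarkTools).prettymemory(1024) )   # module spliced in; function looked up at run time
    ex2 = :( $(BenchmarkTools.prettymemory)(1024) )   # function object spliced in directly

    eval(ex1) == eval(ex2)  # true -- both format the bytes the same way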

Comment on lines -561 to +581
- trialmin, trialallocs = gensym(), gensym()
+ trialmin, trialmean, trialallocs = gensym(), gensym(), gensym()
@mcabbott (Author):

Is there a reason not to write @gensym trialmin trialallocs etc. here?
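
For reference, Base.@gensym expands to essentially this (roughly, via @macroexpand):

    julia> @macroexpand @gensym trialmin trialmean trialallocs
    quote
        trialmin = gensym("trialmin")
        trialmean = gensym("trialmean")
        trialallocs = gensym("trialallocs")
        nothing
    end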

@KristofferC (Contributor) commented:

First they came for @benchmark...

I think there is great value in having @btime show only the number that is most often useful, namely the minimum. For more data, there is @benchmark.

@mcabbott (Author) commented:

This is a fair point. And there's something to be said for matching @time (although that now adds compilation time, too). I'd also like to cut down @benchmark a bit, to make the step up smaller than it is now.

Can I persuade anyone to try out this branch in real life for a bit, to see if they like or hate it in actual use?

@KristofferC (Contributor) commented:

The issue is that, for the cases where you would use @btime over @benchmark, the mean is a worse statistic than the minimum. If you want to show extra information, perhaps the mode could be shown, but the more you measure, the more the mode will tend towards the minimum anyway.

@mcabbott (Author) commented Oct 3, 2021

Isn't the mean what's relevant for throughput? I guess this PR takes the position that it's the most useful second number. For example, with the allocating sum(x .* y) above, the GC time seems to be an underestimate of the penalty: 16%, while almost 60% of the mean time is absent from the minimum. This seems relevant when micro-benchmarking a function you hope to use in a loop.
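
For a Trial t, that fraction is simply (a sketch):

    using Statistics

    (mean(t.times) - minimum(t.times)) / mean(t.times)  # share of the mean time that the minimum misses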

The second number @time prints is the GC percentage, from just one run of course. I would argue that showing the percentage of the mean without showing the mean is weird (and usually GC does not run at all on the minimum run). And the mean time is more useful anyway -- it's easier to have all numbers referring to the total time. If we want to keep the line shorter, deleting the GC figure would help.

The mode seems tricky, on what are essentially real numbers. Should they be binned first, or can we trust the clock to do that well enough?

julia> b2 = @benchmark sum(log.(rand(10^4)));
    
julia> length(b2.times)
10000

julia> diff(sort(unique(b2.times)))'
1×766 adjoint(::Vector{Float64}) with eltype Float64:
 41.0  84.0  41.0  1.0  41.0  42.0  41.0  1.0  41.0  1.0  41.0  41.0  1.0  …  174917.0  63750.0  17416.0  3417.0  709.0  499.0  42834.0  300417.0

vs another machine

julia> diff(sort(unique(b2.times)))'
1×4452 adjoint(::Vector{Float64}) with eltype Float64:
 268.0  108.0  15.0  14.0  4.0  59.0  17.0  7.0  4.0  3.0  28.0  17.0  …  3656.0  68996.0  33322.0  4102.0  6553.0  18376.0  2.22808e6  562661.0
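
Binning first might look something like this sketch, using b2 from above (the 10 ns bin width is an arbitrary choice):

    using StatsBase  # for `mode`

    binned = round.(b2.times ./ 10) .* 10   # snap each sample onto a 10 ns grid
    mode(binned)                            # the most common binned time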

This and the median might be contenders for the best one number. But, as you say, these will often approach the minimum... which @btime already prints, and which is the easiest to explain, so this PR doesn't propose to change it.

@stevengj (Contributor) commented Oct 20, 2021

Isn't the mean what's relevant for throughput?

The problem is that it's noisy and unreliable; if something else happens to run on your system that temporarily interrupts the julia process, you can get a huge spike in time that will throw off the mean completely.

Basically, what source of timing variance are you interested in averaging? In many cases the variance is due to things outside of Julia, which makes it of questionable utility in benchmarking the performance of Julia code per se. I agree that there are some cases where timing variance arises from Julia itself -- for example, due to garbage collection, or perhaps your algorithm itself is stochastic. In the latter cases, however, you need to pay much closer attention to statistics, and probably @benchmark is more useful.

@mcabbott (Author) commented:

Nice, somehow I never looked at the paper.

I guess garbage collection is the major thing I had in mind, since it's very common to have different implementations with different allocations. The reported GC percentage is quite a large underestimate of how much time this is going to cost. GC is a rare event, so something like mean(5th-to-95th percentile) will miss it.

Agreed, this measure will be noisier than the min, but the claim here is only that it's a sufficiently useful second number to be worth printing. If it's huge, that's a warning to investigate more closely.
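
To be concrete, the kind of trimmed mean I have in mind is something like this sketch (`trimmed_mean` is just an illustrative name):

    using Statistics

    # mean of the samples between the 5th and 95th percentiles, which drops rare GC spikes
    function trimmed_mean(ts)
        lo, hi = quantile(ts, 0.05), quantile(ts, 0.95)
        return mean(t for t in ts if lo <= t <= hi)
    end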

Is there another number which might better serve this purpose?

@KristofferC (Contributor) commented:

Ref #37 (comment). @btime is perfect for most use cases where you would want to use BenchmarkTools, and I am kindly asking not to change it. Feel free to add a @bmean or modify @benchmark or whatever, just leave @btime alone, please.

@stevengj (Contributor) commented Oct 20, 2021

I guess garbage collection is the major thing I had in mind

A problem with benchmarking this is that it is strongly context-dependent -- how often GC occurs depends not just on the function you are benchmarking, but on whether it is called in a context that also allocates. (GC times aren't additive!) This is especially problematic here because @btime and @benchmark are usually employed for micro-benchmarking small pieces of a larger computation.

That's not to say that benchmarking GC is unimportant! But, in combination with the difficulty of eliminating external noise, my sense is that the mean will mislead more than it informs in the @btime output. If you need more detailed statistics, @benchmark is what you want.

@mcabbott (Author) commented Oct 20, 2021

more detailed statistics, @benchmark is what you want

But which bit -- what would you take from that as a better (if still imperfect) measure of the cost of allocations? It shows min/median/mean/std/max. And the histogram stops at the 99th percentile, which IIRC typically omits GC.
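
How rarely GC fires within a Trial t is easy to check (a sketch):

    count(!iszero, t.gctimes) / length(t.gctimes)  # fraction of samples in which GC ran at all

If that fraction is below 1%, a cutoff at the 99th percentile can hide GC entirely.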

Expanding the above example:
[Screenshot 2021-10-20: expanded @benchmark output for the example above]
And if I include these functions as part of some larger long-running loop, then it seems the mean doesn't do badly as a prediction of how long the overall calculation will take, unlike the minimum:

julia> for N in [10,100,10^4,10^5]
         @show N
         x=randn(N); y=randn(N);
         @btime dot($x, $y)
         @btime sum($x .* $y)
       end;  # shown above

julia> pairs1(A,B) = [dot(x,y) for x in eachcol(A), y in eachcol(B)];

julia> pairs2(A,B) = [sum(x .* y) for x in eachcol(A), y in eachcol(B)];

julia> @btime pairs1(A,B) setup=(N=10^4; A=rand(N,1000); B=rand(N,1000));
  min 4.842 s, mean 4.842 s (2 allocations, 7.63 MiB)

julia> @btime pairs2(A,B) setup=(N=10^4; A=rand(N,1000); B=rand(N,1000));
  min 11.087 s, mean 11.087 s (2000002 allocations, 74.56 GiB. GC mean 23.56%)

@KristofferC (Contributor) commented:

And if I include these functions as part of some larger long-running loop, then it seems the mean doesn't do badly as a prediction of how long the overall calculation will take, unlike the minimum:

It is because you are not running a representative benchmark. You are doing the equivalent of

 @btime sin.(rand(10^4:10^5))

and then saying that the minimum is a bad statistic for giving the mean value when running that code multiple times.

The mean is bad. Every bit of noise on your computer, every latency introduced by the OS scheduler, and so on contributes to the mean. Don't take it from me, take it from this dude: https://youtu.be/vrfYLlR8X8k?t=1487 (the whole talk is worth watching).

Again, make a @bmean or something, leave @btime alone.

@chriselrod commented:

I want this.

I'm perfectly fine with naming it @bmean or something else to preserve @btime.

@KristofferC (Contributor) commented:

Yes, call it something else.

@chriselrod commented:

Yes, call it something else.

Are you happy to rename it? Seems easy enough to make everyone happy this way.

@gdalle (Collaborator) commented Sep 18, 2023

@mcabbott we're doing some cleanup before a 2.0 release -- would you want to rename this @bmean and try to get it merged?
