
Allocations and GC being ignored? #26

Open
andyferris opened this issue Nov 2, 2016 · 8 comments

@andyferris

I'm finding different results using BenchmarkTools as opposed to my own loops and @time, particularly regarding allocating operations.

I'm benchmarking Base.Array vs the types in StaticArrays.jl. Some of these arrays are immutable, and don't perform GC allocations, and others are mutable. If you are doing lots of small, e.g., matrix multiplications, then the allocations and GC can dominate the cost, and an immutable array is much faster. I have a non-allocating SArray and a (mutable) allocating MArray.

The typical results I was getting using loops and @time showed that when new copies of Array or MArray were created, a lot of time was spent on allocation and GC (but not for SArray):

=====================================
   Benchmarks for 4×4 matrices
=====================================

Matrix multiplication
---------------------
Array               ->  6.526125 seconds (31.25 M allocations: 3.492 GB, 6.61% gc time)
SArray              ->  0.369290 seconds (5 allocations: 304 bytes)
MArray              ->  1.964021 seconds (15.63 M allocations: 2.095 GB, 12.05% gc time)

Matrix multiplication (mutating)
--------------------------------
Array               ->  4.540372 seconds (6 allocations: 576 bytes)
MArray              ->  0.748238 seconds (6 allocations: 448 bytes)
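The numbers above came from hand-written loops timed with @time. The OP's actual harness isn't shown in this thread, but a minimal sketch of that style of aggregate benchmark might look like this (function and variable names are assumptions, not the original code):

```julia
# Hypothetical reconstruction of a loop + @time harness.
# Each iteration of the non-mutating multiply allocates a fresh result
# for Array/MArray, so @time's aggregate report includes allocation + GC.
using StaticArrays

function bench_mul(m1, m2, N)
    m3 = m1 * m2
    for _ in 1:N
        m3 = m1 * m2  # fresh allocation per iteration for heap-backed types
    end
    return m3
end

a = rand(4, 4)           # Base Array (heap-allocated)
s = SMatrix{4,4}(a)      # immutable static matrix (no heap allocation)

bench_mul(a, a, 1)       # warm up to exclude compilation from the timing
bench_mul(s, s, 1)
@time bench_mul(a, a, 10^6)
@time bench_mul(s, s, 10^6)
```

Because @time reports the total over all N iterations, allocation and GC costs accumulate and become visible, unlike a per-call estimate.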

However, I switched my tests to BenchmarkTools.jl and now the difference between SArray and MArray has disappeared. It appears almost as if the allocation costs have been hidden somehow. Perhaps I'm using the package wrong, but I get:

=====================================================================
 Matrices of size 4×4 and eltype Float64
=====================================================================
SArray:    m3 = m1 * m2 takes 52.00 ns, 144.00 bytes (GC 0.00 ns)
Array:     m3 = m1 * m2 takes 196.00 ns, 240.00 bytes (GC 0.00 ns)
MArray:    m3 = m1 * m2 takes 55.00 ns, 144.00 bytes (GC 0.00 ns)

Array:     A_mul_B!(m3, m1, m2) takes 150.00 ns, 0.00 bytes (GC 0.00 ns)
MArray:    A_mul_B!(m3, m1, m2) takes 20.00 ns, 0.00 bytes (GC 0.00 ns)

The two calls I make are @benchmark *($(copy(m)), $(copy(m))) and @benchmark A_mul_B!($(copy(m)), $(copy(m)), $(copy(m))), where m is some random 4×4 matrix made out of the above types. Is that the right way to use @benchmark?
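For concreteness, the two calls described above can be sketched as follows (assuming BenchmarkTools and StaticArrays are loaded; note that A_mul_B! is the Julia 0.5/0.6-era name, replaced by mul! in 0.7+):

```julia
using BenchmarkTools, StaticArrays

m = rand(4, 4)  # swap in SMatrix{4,4}(rand(4,4)) or MMatrix{4,4}(rand(4,4))

# Interpolating $(copy(m)) evaluates copy(m) once, at benchmark setup time,
# so the timed expression operates on concrete local values rather than
# repeatedly looking up a non-const global.
b1 = @benchmark *($(copy(m)), $(copy(m)))

# In-place multiply: the destination is preallocated, so no allocation
# is expected inside the timed expression.
b2 = @benchmark A_mul_B!($(copy(m)), $(copy(m)), $(copy(m)))
```

Per the BenchmarkTools manual, interpolating setup values this way is the intended usage, so the pattern in the OP looks correct on its face.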

@andyferris
Author

Here I use mean instead of the (default) median:

=====================================================================
 Matrices of size 4×4 and eltype Float64
=====================================================================
SMatrix:    m3 = m1 * m2 takes 87.07 ns, 144.00 bytes (GC 19.19 ns)
Array:      m3 = m1 * m2 takes 258.94 ns, 240.00 bytes (GC 32.41 ns)
MMatrix:    m3 = m1 * m2 takes 87.09 ns, 144.00 bytes (GC 19.11 ns)

Array:      A_mul_B!(m3, m1, m2) takes 153.44 ns, 0.00 bytes (GC 0.00 ns)
MMatrix:    A_mul_B!(m3, m1, m2) takes 20.16 ns, 0.00 bytes (GC 0.00 ns)

Now we see GC (it seems that in many cases it doesn't run, so the median doesn't detect it), but it really doesn't agree with my earlier benchmark, which showed more GC for MArray and none for SArray.
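The median/mean discrepancy above follows from how BenchmarkTools summarizes a trial: GC pauses hit only a minority of samples, so they vanish from the median but pull up the mean. A short sketch of inspecting both estimators on the same trial (variable names are illustrative):

```julia
using BenchmarkTools, Statistics

trial = @benchmark $(rand(4, 4)) * $(rand(4, 4))

med = median(trial)  # robust central estimate; rare GC pauses are invisible here
avg = mean(trial)    # sensitive to outliers, so occasional GC pauses show up

# gctime is reported per estimate; comparing the two exposes hidden GC cost.
println("median: ", time(med), " ns, gc ", gctime(med), " ns")
println("mean:   ", time(avg), " ns, gc ", gctime(avg), " ns")
```

This is why switching the summary statistic from median to mean made GC time reappear in the results above.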

@jrevels
Member

jrevels commented Nov 2, 2016

The BenchmarkTools results seem correct to me; it makes sense that the in-place matrix multiply isn't allocating, and that memory usage is small for small arrays.

Also, it seems that you're directly comparing aggregate measurements against BenchmarkTools' per-execution estimates, which are entirely different quantities (e.g. these numbers appear to compare an estimate of a single matmul with a direct measurement of a large number of matmuls).

I might be missing something though, since I don't have the actual code. If you believe this is unexpected behavior, can you share the actual code used to generate these numbers (preferably self-contained + whatever packages are required)?

@jrevels
Member

jrevels commented Nov 8, 2016

I still don't know about the OP here, but found a case where this issue is real (JuliaLang/julia#19257).

It comes from the nothing return block added in #4. If I remove that, it resolves the issue. I'll open a PR shortly.

@andyferris
Author

andyferris commented Nov 8, 2016

By resolves the issue, do you mean reports more allocations and longer run-times?

EDIT: Yes, I see that in the PR. Looks great.

@jrevels
Member

jrevels commented Nov 8, 2016

I don't know how #28 affects the issue in the OP here because I can't verify the issue in the OP (I don't have the code).

#28 fixes an issue where the function BenchmarkTools was constructing was "overly" amenable to code elision (either by LLVM or by Julia). Even after that PR, however, it's still simple to construct cases where the elision will occur (see that PR for an example). I think in such cases, BenchmarkTools is correct to report no allocation, since running such a function "in the wild" should have the same effect given a similar calling environment.

@andyferris
Author

So #28 doesn't seem to affect my results in any meaningful way.

I'm going to have a deeper look sometime, but I can't right now. The code is at https://github.com/JuliaArrays/StaticArrays.jl/blob/use-size/perf/benchmark3.jl, but that might not be too useful to you (I'll try to make a minimal example when I have more time).

@KristofferC
Contributor

KristofferC commented Nov 8, 2016

Just a note that #28 fixes a problem we had where some functions were being completely elided. I was just going to file an issue, and it's already fixed. What a luxury.

@gdalle gdalle added the bug label Jun 13, 2023
@gdalle
Collaborator

gdalle commented Sep 18, 2023

Is this bug still relevant?
