-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do exact array resize!() in more cases #22038
Conversation
Any thoughts about |
IMO the same changes should be done to |
This makes sense to me. For example, when you create a sparse matrix it is common that you know exactly how many stored elements you will have so you might do a Edit: On the other hand you usually start with empty vectors then, which is already covered by the current code. |
Since the cost of the size not being exact is ~one extra resize versus up to 2x memory waste, taking size hints and resizes literally seems like a good idea. |
Also, in the motivating use case, #21938, we know that the arrays will never grow again. |
Since Line 925 in 298fffb
|
First take on making |
@nanosoldier |
Looks like @nanosoldier doesn't listen to me. |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
On the positive side, |
The necessity last commit basically means that user code that reserves multiple elements with |
Definitely there should be a way for the user to grow an array the old way. |
|
I'm not sure |
Could somebody ask nanosoldier? Thanks! |
@nanosoldier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure resize!() is used that much in the packages in the append context.
The point is that this change gives a O(1)
memory saving and a O(n)
performance regression. It's basically always wrong when a small increment is used.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Now the benchmarks looks ok, let's try to decide on the API changes :) I don't like |
No, it's not documented in the same way as |
What is "the right thing" for Well, I'm out of anti-preemptive arguments for the moment :) With the updated |
Personally I would expect "the right thing" for |
Short of any other function to do it currently I believe I'll still repeat that exact resize will only save |
Isn't the thing that |
My reasoning for 25% is that there are papers which report that 50% growth actually gives optimal performance for exponential growth, and 0% is O(n^2) case that's problematic, so split the difference and make the cutoff halfway between 0% and 50%. The precise cutoff probably doesn't matter. |
Should there be a shrinkwrap exception? I.e. when the requested size is ≤ the current size, we make it exactly that size? I guess that could lead to bad behavior if a vector is being used as a stack or a dequeue. |
The growing factor is #8269 and #16305 and I would rather not tie it up with this one. What I was suggesting as a default heuristic is basically
And we can change the grow factor if there's clear evident that it's better. |
It's an expected pattern to call |
Also #16305 (comment) may be relevant? |
I've reverted the API changes and implemented 125%/200% growth thresholds heuristic discussed above. I think it should be sensitive enough to do exact resize in most of relevant cases. Let's see what nanosoldier thinks of that. Shrinkwrap might be useful for some applications, although it's much more easy to hit memory limits with imprecise growth than with imprecise shrinkage. And definitely it would break much more code patterns than exact growth. |
CI tests look ok, could somebody please ping nanosoldier as it doesn't listen to me? |
@nanosoldier |
it would make the most sense to me if the cutoff here was such that this PR didn't change the minimum amount of growth, but only the amount of space used when that minimum is exceeded |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Why? Why does that make more sense? The point of my proposal was that if you're in the bad O(n^2) then each growth is small – of fixed size, typically. Having any 1+ε multiplicative threshold should suffice to avoid that. It doesn't make sense to me why 2x growth makes sense as the cutoff and no one has made a compelling argument for it. Feeble as my "25% is halfway between 0% and 50%" argument is, at least it is an argument. |
2x is our array growth cutoff elsewhere, and it doesn't make sense for resize to arbitrarily use a different growth ratio. This also resizes to a discontinuous size - ask for 24% larger and the space doubles, but ask for 25% larger and it grows by less? That doesn't seem like a useful or predictable behavior. |
So it's in between a "optimum" and the worst? That actually feels kind of random, mostly the 0% part. And
is actually also why I think the cutoff should not be lower than the grow factor. |
Ok. The continuity argument is a good one – I buy that. Thank you for actually making a case. Let's go with that then, and the appropriate growth factor can be tweaked later (or not at all). |
Now it should be "continuous". I wonder if it helps sparse matrix tests. Nanosoldier, please? |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
I see no relevant regressions in the last nanosoldier run (no real improvements either). |
A NEWS entry would be great. Then I think we're good to go here! Nice work :) |
If the requested size of array buffer is >=200% of its current size, grow exactly to the requested size, otherwise double the current size. This should make resize!() & sizehint!() allocate the exactly requested size in more situations (previously the exact resize was guaranteed for empty arrays only). This heuristic should reduce memory usage when arrays are growing non-incrementally (e.g. temporary buffers allocated by LAPACK wrappers).
Squashed and added NEWS entry. |
Anything else to do before merging? |
Sometimes it's useful to have precise control over memory size allocated for a given array (see e.g. #21938).
One would assume it could be done by
resize!()
, but under the hood it callsjl_array_grow_at_end()
.The latter method is actually tuned for the gradually growing arrays, it would allocate the exactly requested size only if the array is empty,
otherwise it would allocate the next power of 2 that is greater or equal to the requested size.
There are several options:
exact
argument toresize!()
/sizehint!()
resize!()
/sizehint!()
to always grow to the exactly specified sizeresize_exact!()
method (either exported or not)