-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In-place iteration results in too big allocations #120091
Comments
The pointer not being the same does not necessarily mean the allocation hasn't been reused. The allocator can call
Well, this is a non-guaranteed optimization so we can't document the exact behavior. But it probably makes sense to mention that something like this may happen.
That's intended to allow vec-allocation recycling even when the type changes, which can mean the excess factor is infinite. |
On my machine this does result in a memmove. This is the strace output of an execution:
|
This extra capacity also has negative performance impacts: fn test_func() {
let v1 = (0u16..0x4000).map(|i| [i; 32]).collect::<Vec<_>>();
let v2 = v1.into_iter().map(|x| x[0] as u8).collect::<Vec<_>>();
// Prevent the optimizer from removing the allocation
assert_eq!(v2[0], 0);
}
fn main() {
let mut now = std::time::Instant::now();
for _ in 0..10_000 {
test_func();
}
println!("{}", now.elapsed().as_millis());
} This code takes more than 10 times as long on my machine if it was compiled with the beta compiler rather than the stable one. |
It seems that much like capacity growth has a scaling factor there should be some sort of limit on the unused capacity that will be used for optimizations like this. Maybe there should be a general rule for So the manual capacity operations still work, but automatic capacity growth and allocation will not use more than some constant factor of the in-use memory. It seems obvious to me that some rule is required here. If the next version of rust made If we have that rule spelt out then we can decide if this operation should follow the rule, or have some sort of exception. But I think it should probably follow it. (Or maybe have a slightly relaxed threshold.) |
Not necessarily. The capacity has already been allocated, so the cost has been paid. And vecs never shrink by themselves, even if you |
I agree that this is a bit of a weird case, but I would argue that from the user's perspective this is a "new Vec". So it is somewhat surprising that it is holding on to arbitrary amounts of memory from another vector. It would be nice to provide some guarantees.
Please not that in my rule I said "maximum elements since ...". This would allow the current behaviour of |
The cost of an allocation is over it's entire lifetime: It's memory cannot be used by something else. I pretty much agree with @kevincox and was about to post something similar: There should be a note in the docs for .collect<Vec<_>() that states that the resulting capacity can never be more than twice the length. (Or wherever in the docs that would cover any other similar cases) |
"surprising" is a matter of expectations. Different people have different expectations. E.g. some people have expressed surprise when collect doesn't reuse allocations. And as I have mentioned in a previous comment using a factor-based heuristic doesn't help if you're intentionally doing something like
Vec does not guarantee that anywhere for any operation. Quite the opposite:
|
Small note: I think that |
If your size hint is incorrect we could just panic. You're breaking an API contract. It's called "hint" because it's not required to be tight, not because you're allowed to return garbage. |
I don't think anyone is claiming that there is currently a guarantee. Personally I am claiming that there should be a guarantee because users are relying on some sort of reasonable memory usage, and it would be nice to actually promise that. |
Maybe this would best be provided by a dedicated API rather than this weird looking trick? This would make it explicit and obvious to readers. |
Well, there's comments here and there telling people to use just that if they want the recycling behavior... so you could say people also rely on that behavior and have different expectations. |
Looking at the assembly for the following two functions
It seems to me that on beta, the first one (where the element type matches the result type), works completely in place without allocations. So here the resulting capacity being bigger would be expected. The second function also first calculates the result in place, but then seems to do an allocation and copying of the data. So there might be a bug or missed optimization in the size calculation for this new allocation. With rust 1.75, both look like they first allocate a new location for the result. |
That's unexpected. It should be calling into realloc, not memcpy |
This is not really relevant, since it's impossible to reuse an allocation with a different alignment in stable Rust. |
Yeah I think I was testing with jemalloc. |
I agree with others in this thread. The behavior is quite surprising to me, which is generally a bad thing for a language that advertises itself as reliable. Even more so if you plan on building system abstractions with it. I think at the very least there has to be an upper limit to the reuse. Anything above 2x seems like a footgun to me. As the original post author encountered already. Also the property that it will re-use for different alignments also seems quite surprising to me. |
Might be the default implementation of the global allocator for |
Which seems unnecessary if the target type is overaligned? I think this should only be necessary when the target type is under/mis-aligned. Which is never true if the target type is smaller than the source type. |
libc allocators generally don't provide an |
First of all, I'd like to say that the core idea of this optimization is a good one. In general, mapping in place is much less costly than allocating, mapping across, then deallocating the old allocation. Avoiding a new allocation and avoiding the resulting cache traffic is good. With that being said, just because the idea is good in general does not mean that it may never be unhelpful in some cases, and the case demonstrated in the blog post, where over 100x the necessary memory is retained, is definitely unhelpful. The documentation could be improved, but:
So, let's focus on expectations:
You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls I'd suggest that in case of too much extreme capacity -- when the new capacity is, say, 4x+ the new length -- the implementation should call
From a number of users affected point of view, I'd expect that saving memory for the majority of users who'd get bitten by excess capacity is worth it. The tiny fraction of users for whom preserving the entire allocation is absolutely necessary can always ask for a dedicated explicit function, or create a crate in the meantime. |
I think this is reasonable. I might even go as far as "more than 2x", since that means we have more capacity than the unspecialized doubling-growth would have produced. |
This is very poor argument imo. It may be a fact, but it's not something we should bend over backwards to accomodate. If everyone programs based on vibes then we can just delete the docs. API contracts exist for a reason because people's expectations are not universal and thus won't lead to consistent outcomes. If you code to assumptions you build on a foundation of sand. Sometimes people just assume very silly things. Sometimes they assume reasonable things that happen to be false because they conflict with other reasonable things.
The optimization has existed since 2020. It was recently only extended to a few more cases. I agree that there are some issues with the generalization but the fundamental approach has already worked nicely for a while. That the system allocator doesn't support alignment-changing realloc is an issue worth considering though.
That depends on what is excess capacity. If you keep reusing a vec then there is no excess. The capacity will be filled up in the next round. So this is use-dependent.
As I have said several times that basing this on a factor (such as 4x) means that clear-and-recycle-allocation uses become impossible with this approach even though we have told people to do it this way. We can change that of course, but then we're just breaking a different set of users which currently aren't making noises. Just because one set is visible doesn't mean the other set doesn't exist. |
Personally my expectations of For example it is very surprising that calling this function with different types can use hugely different amounts of memory for the return value. Note that the function that creates the vector itself has no actual idea if it will return something with lots of extra capacity. fn test(i: impl IntoIterator<Item=u32>) -> Vec<u32> {
i.into_iter().filter(|_| false).collect()
}
fn main() {
eprintln!("vec: {}", test(vec![1; 1000]).capacity()); // 1000
eprintln!("vec: {}", test([1; 1000]).capacity()); // 0
} Personally my mental model of fn my_collect<T>(i: impl Iterator<Item=T>) -> Vec<T> {
let mut out = Vec::with_capacity(i.size_hint().0);
for e in i {
out.push(e);
}
out
} And I appreciate that it is nice to add allocation reuse here, even if it is user-perceptible. But as others have agreed with holding around "arbitrary" amounts of extra memory is very surprising. My preferred outcome would be:
I think all of these are quite important changes:
I think one potentially missing thing would be how to handle reliable allocation reuse while changing the type. IIUC this can really only be done if the new type is |
Right, we can't know -- but that's why I think this optimization should tread closer to as-if behavior. Limiting to |
I think the "as-if" comparison is a great one! I believe that we should keep the behaviour similar to what is "expected" by the simple model of these combinators. Managing to perform clever optimizations is great when they are all upsides. But when clever optimizations accidentally bite you it really sucks. So I'd rather make sure that the optimizations are improvements in the vast majority of use cases. If we have optimizations that may be helpful or may be hurtful it is probably better to allowing the user to opt-in explicitly, even if generally the benefit is positive. Otherwise seemingly benign changes to the code like replacing a I get that this is definitely a grey area because it is impossible to define what is expected by users and because |
On the topic of explicit APIs, I was thinking that a Many collections today allow hinting at the capacity, and reserving at least that much. That is, if I know I will be further using a Today, this is possible in two steps to ensure the target capacity: let mut v = Vec::with_capacity(x);
v.extend(some_iterator); A minor issue is that it is two steps, which leads to the major issue that allocation reuse is not possible. A
Further, because the target capacity is known in advance, inlining It does still suffer from one disadvantage: there is no guarantee it will reuse the underlying allocation. This is annoying when reuse is necessary for performance reasons. |
…viper Remove alignment-changing in-place collect This removes the alignment-changing in-place collect optimization introduced in rust-lang#110353 Currently stable users can't benefit from the optimization because GlobaAlloc doesn't support alignment-changing realloc and neither do most posix allocators. So in practice it has a negative impact on performance. Explanation from rust-lang#120091 (comment): > > You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`. > > I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is [a call to `Allocator::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/vec/in_place_collect.rs#L299-L303), which per its documentation [allows decreasing the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.Allocator.html). However, in stable Rust, the only available `Allocator` is [`Global`](https://doc.rust-lang.org/1.75.0/alloc/alloc/struct.Global.html), which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` [cannot change the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.GlobalAlloc.html#method.realloc), the implementation of [`<Global as Allocator>::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/alloc.rs#L280-L321) must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same. > > Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.
What would that do if the target type doesn't have a concept of capacity? Would this be based on a new trait like Edit: I guess I'm still trying to work out if the problem is a Vec problem, or is it more general. Can we allow downstream types to make use of the memory, or is that just too far out of scope? |
…viper Remove alignment-changing in-place collect This removes the alignment-changing in-place collect optimization introduced in rust-lang#110353 Currently stable users can't benefit from the optimization because GlobaAlloc doesn't support alignment-changing realloc and neither do most posix allocators. So in practice it has a negative impact on performance. Explanation from rust-lang#120091 (comment): > > You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`. > > I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is [a call to `Allocator::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/vec/in_place_collect.rs#L299-L303), which per its documentation [allows decreasing the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.Allocator.html). However, in stable Rust, the only available `Allocator` is [`Global`](https://doc.rust-lang.org/1.75.0/alloc/alloc/struct.Global.html), which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` [cannot change the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.GlobalAlloc.html#method.realloc), the implementation of [`<Global as Allocator>::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/alloc.rs#L280-L321) must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same. > > Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.
Rollup merge of rust-lang#120116 - the8472:only-same-alignments, r=cuviper Remove alignment-changing in-place collect This removes the alignment-changing in-place collect optimization introduced in rust-lang#110353 Currently stable users can't benefit from the optimization because GlobaAlloc doesn't support alignment-changing realloc and neither do most posix allocators. So in practice it has a negative impact on performance. Explanation from rust-lang#120091 (comment): > > You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`. > > I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is [a call to `Allocator::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/vec/in_place_collect.rs#L299-L303), which per its documentation [allows decreasing the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.Allocator.html). However, in stable Rust, the only available `Allocator` is [`Global`](https://doc.rust-lang.org/1.75.0/alloc/alloc/struct.Global.html), which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` [cannot change the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.GlobalAlloc.html#method.realloc), the implementation of [`<Global as Allocator>::shrink()`](https://github.com/rust-lang/rust/blob/db7125f008cfd72e8951c9a863178956e2cbacc3/library/alloc/src/alloc.rs#L280-L321) must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same. > > Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.
I am not sure if I'm following the whole To me there are two different variants:
=> Two me it seems there are two API missing: One where I can define the (minimum) capacity of the target collection and one where the original collection is kept without any shrinking attempt. |
I think the point is that |
This only works if the if the target collection is actually smaller than capacity. With something like
except the current "issue" is not a universal issue: I was proposing
So this would be yet another use case, potentially requiring yet another API ;) EDIT: fix quoting |
Absolutely. It does feel to me that the problem is there is no generic way for a type to expose its allocated buffer in such a way that a consuming type can take over that allocation. Instead the |
I doubt that we can expose this as an API anytime soon or generically. In practice it only works on a few (although very important) types, what exactly works changes over time, relies on some unstable and very unsafe features and can't be implemented by user crates. So it's not something that really makes sense for This will have to remain in the realm of non-guaranteed optimizations for a while. A |
It was |
document `FromIterator for Vec` allocation behaviors [t-libs discussion](https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting.202024-01-24/near/417686526) about rust-lang#120091 didn't reach a strong consensus, but it was agreed that if we keep the current behavior it should at least be documented even though it is an implementation detail. The language is intentionally non-committal. The previous (non-existent) documentation permits a lot of implementation leeway and we want retain that. In some cases we even must retain it to be able to rip out some code paths that rely on unstable features.
document `FromIterator for Vec` allocation behaviors [t-libs discussion](https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting.202024-01-24/near/417686526) about rust-lang#120091 didn't reach a strong consensus, but it was agreed that if we keep the current behavior it should at least be documented even though it is an implementation detail. The language is intentionally non-committal. The previous (non-existent) documentation permits a lot of implementation leeway and we want retain that. In some cases we even must retain it to be able to rip out some code paths that rely on unstable features.
document `FromIterator for Vec` allocation behaviors [t-libs discussion](https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting.202024-01-24/near/417686526) about rust-lang#120091 didn't reach a strong consensus, but it was agreed that if we keep the current behavior it should at least be documented even though it is an implementation detail. The language is intentionally non-committal. The previous (non-existent) documentation permits a lot of implementation leeway and we want retain that. In some cases we even must retain it to be able to rip out some code paths that rely on unstable features.
Rollup merge of rust-lang#120355 - the8472:doc-vec-fromiter, r=cuviper document `FromIterator for Vec` allocation behaviors [t-libs discussion](https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting.202024-01-24/near/417686526) about rust-lang#120091 didn't reach a strong consensus, but it was agreed that if we keep the current behavior it should at least be documented even though it is an implementation detail. The language is intentionally non-committal. The previous (non-existent) documentation permits a lot of implementation leeway and we want retain that. In some cases we even must retain it to be able to rip out some code paths that rely on unstable features.
This was discussed in a libs meeting but no conclusion on whether the range of possible behaviors should be narrowed was reached. We did agree that it should at least be documented (#120355) and the recent changes warrant a release note (#120004). Additionally #120116 removes the cases in which the attempted optimization would never have been useful on stable and that also caused the regression that lead to the blog post. |
Documentation has been added, the case where the optimization wasn't effective has been removed and there have been no further reports so I'll close this for now. |
(This bug report was inspired by this blog post https://blog.polybdenum.com/2024/01/17/identifying-the-collect-vec-memory-leak-footgun.html)
After #110353 was landed, in-place iteration can reuse allocations in many more places. While this is a good thing, in some cases this can cause overly large capacity for the destination vector.
Additionally, in some this will cause the destination vector to have a very large capacity even for non-shared allocations. In my opinion, this should never happen.
For an example see this code:
On stable this code works as expected, i.e. two different vectors with reasonably lengths and capacities.
On beta however,
v2
will have a capacity of 262144, even though it does not share an allocation withv1
. If you remove theas u8
part, then the allocations will be shared, but the capacity is still overly large.My suggested fix is:
Meta
I am running NixOS with rustup.
The text was updated successfully, but these errors were encountered: