-
Notifications
You must be signed in to change notification settings - Fork 24
Arena allocation using bumpalo (and hashbrown with custom alloc). #115
Conversation
This PR modifies both linear scan and BT allocators (and their shared analyses) to pervasively use `bumpalo` for memory allocation from an arena that lives for the duration of a function's register allocation. This should result in significant performance improvements (I have not yet measured, though), as we have seen malloc/free to be significant parts of CPU-time profiles in the past. Care must be taken to *not* use normal heap-allocated types inside of a `BumpVec`, `BumpMap` or `BumpSet`, because the arena does not invoke `Drop` impls when it is finally discarded (this is what allows freeing the memory to be so fast; we just discard the entire block at once). As such, many uses of `SmallVec` and some other types had to change. Some cases that I had missed were picked up by LeakSanitizer in fuzzing runs, and these errors no longer occur, so I'm relatively satisfied that we've covered them all. This changes the public interface -- the caller must provide the arena, and some of the returned metadata is allocated within it -- so it will require corresponding updates in e.g. Cranelift. That will come in subsequent PRs. Thanks to @bnjbvr for noting in our discussion this morning that memory allocation is still significant in profiles and thus inspiring me to dive into this :-)
Also, note that I had to make a slight tweak to rust-lang/hashbrown to expose one type; I'll create a separate PR over there. We'll need that crate to do a point release that we can refer to before this PR merges. |
Export AllocError as well as Allocator. This PR re-exports the `AllocError` type alongside `Allocator` and others from the `raw` submodule. This seems to be necessary to implement the `Allocator` trait from outside the crate (and I want to use exported types from this crate so that I don't have to depend on the unstable allocator API directly). It allowed me to instantiate hashbrown hashmaps using `bumpalo` over in bytecodealliance/regalloc.rs#115; I couldn't work out another way to do this. If there is a more direct way, please do let me know! (Thanks for this excellent crate and the configurable allocator interface, in any case!)
I just updated to use FxHashMap again, after some discussion with Julian and Ben; Ben's initial benchmark showed this to be slower than expected but we think that having reverted to the default hasher could explain it. Hopefully this one's faster! |
Added a @bnjbvr, |
It might be worth simply using |
Indeed, that's what we had before that change; I was curious whether the pointer indirection / reduced locality of placing data on the heap always (vs. keeping it on-stack when small) might be causing a performance loss. I just ran some tests (after pushing the update) and it seems it doesn't make too much of a difference either way. @bnjbvr, a I suspect that if we put some thought into a hierarchical scheme of some sort -- subarenas within the arena -- maybe we could get some of that back. There's no reason we shouldn't be able to get a lot of benefit from bump-allocating e.g. all of the register mention lists and range/fragment lists; maybe there are smaller temporary objects that we should be freeing earlier? |
Cleaning up old PRs and closing this now. |
This PR modifies both linear scan and BT allocators (and their shared
analyses) to pervasively use
bumpalo
for memory allocation from anarena that lives for the duration of a function's register allocation.
This should result in significant performance improvements (I have not
yet measured, though), as we have seen malloc/free to be significant
parts of CPU-time profiles in the past.
Care must be taken to not use normal heap-allocated types inside of a
BumpVec
,BumpMap
orBumpSet
, because the arena does not invokeDrop
impls when it is finally discarded (this is what allows freeingthe memory to be so fast; we just discard the entire block at once). As
such, many uses of
SmallVec
and some other types had to change. Somecases that I had missed were picked up by LeakSanitizer in fuzzing runs,
and these errors no longer occur, so I'm relatively satisfied that we've
covered them all.
This changes the public interface -- the caller must provide the arena,
and some of the returned metadata is allocated within it -- so it will
require corresponding updates in e.g. Cranelift. That will come in
subsequent PRs.
Thanks to @bnjbvr for noting in our discussion this morning that memory
allocation is still significant in profiles and thus inspiring me to
dive into this :-)