WIP/RFC unbox more immutables #18632
Conversation
Awesome!! Probably a predictable request: is it possible to have the part that's just an optimization (stack allocation) first, without changing the layout of anything? That could probably be merged very quickly. How does stack marking work in the gc?
Yeah, I'm just afraid that it'll make things worse, since every time the immutable goes in/out of local scope it'll have to be unboxed/boxed, so we may end up making more boxes than today. The GC objects on the stack have a pointer to them in the GC frame, and the special treatment in the GC is done by checking whether they are inside the task stack's bounds.
Am I right that this will dramatically improve the performance of …
// VecElement types are unwrapped in LLVM.
addr = strct.V;
else
    assert(0);
That assert should probably go away?
My argument for opt-in is also (local) predictability. The question is, for given types …
Healthy bump for 0.6 timeline; I think we'd all love to see this get in.
I would have liked to see this in 0.6 since I think this is a very valuable optimisation, especially for … It would be awesome to have this early on for the next release cycle!
This is actually independent of stack allocation of unescaped …
Just thought I'd drop by to point out that this PR won't alter …
Yeah, sorry, I mixed the two things up in my head, since this includes a version of d189cb3 with …
This is an optimization and therefore not release blocking. Realistically, this PR is not going to be merged in this form, so we may as well close it and take it off the 1.0 milestone.
Is there a high level issue that tracks progress on this general theme? Would be good to have something open that refers to this optimization.
cc @Keno
Would be nice to keep this open to make it easier to find, since while it won't be merged, we are likely to take many pieces from it.
Why don't you make a "high level issue that tracks progress on this general theme" instead and link to this PR from there along with all the other relevant PRs?
@vtjnash are you ready to close this issue now that you've implemented many of the pieces?
The codegen/gc part of this is basically working.
I'm now wondering about semantics, and I'd like us to discuss the following issues a bit before I clean up the code and we start the review (there are a bunch of duplicate paths in codegen that can be merged together/simplified, and some things are plain wrong and/or inefficient).
This patch allows us to unbox most immutables. By unbox I mean: allocate/store them on the stack, inline them in other objects, and inline them in arrays.
Why most? There are (for now) two problems: cycles and #undef.
Cycles are a fundamental problem: if A has a field of type B and B has a field of type A, we obviously can't inline them into each other. The cycle needs to be broken, and the annoying part is that it should be broken in a predictable way. For now, in this PR, it's done in DFS order, which means that, for example, the layout of B will differ depending on whether we ever instantiated an A before. Not good. The proposal I remember about that (Jameson @ JuliaCon, iirc) was to make types boxed iff they are part of any field cycle.
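As a minimal sketch of the cycle problem (current syntax; the type name is made up), a self-referential immutable is the simplest case, and a mutual A/B cycle behaves the same way:

```julia
# Inlining `next` into Node would require Node to contain a complete copy
# of a Node, which is impossible, so at least one link of the cycle must
# stay a boxed reference.
struct Node
    value::Int
    next::Node                      # recursive field closes the cycle
    Node(v) = new(v)                # leave `next` #undef to end a chain
    Node(v, n::Node) = new(v, n)
end

chain = Node(1, Node(2))            # Node(2) carries an #undef `next`
```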
#undef is annoying because it makes a difference at the julia level between isbits types and other immutables. To minimize breakage I've gone the route of preserving the current behavior. So if A has a pointer field and we make, e.g., an uninitialized array of A, this branch uses the nullity of the field of A as a marker that the corresponding slot in the array is #undef. This only works if the field of a valid instance of A can never be null, i.e., if A.ninitialized >= field_index_of_the_ptr_field. This makes most code (at least all the test suite :-)) work, but I think the following rules are really weird:
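For reference, a sketch of the user-visible behavior being preserved (current syntax; the type name is made up, and the final comment describes what this PR would do rather than what it changes at the language level):

```julia
# `Wrap` is immutable but not isbits (it holds a reference), so the slots
# of an uninitialized array of Wrap start out #undef.
struct Wrap
    s::String                   # reference field; never null in a valid Wrap
end

a = Vector{Wrap}(undef, 3)      # every slot is initially #undef
isassigned(a, 1)                # false
a[1] = Wrap("hello")
isassigned(a, 1)                # true

# With Wrap stored inline in the array, a null `s` pointer in a slot can
# serve as the #undef marker, precisely because every constructed Wrap has
# `s` initialized (the field is covered by ninitialized).
```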
A type T will be inlined into fields/arrays and stack allocated if …

The only difference between a type that is boxed or not is memory layout, but I'd assume that we want that to be easily predictable, since, for example, people routinely interface with C.
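To make the predictability concern concrete, here is a sketch (type names are made up, and the byte counts assume a 64-bit layout; none of this is taken from the PR itself) of how the boxing decision shows up as layout that C-interop code has to match:

```julia
struct Inner
    s::String                   # reference field, so Inner is not isbits
    n::Int
end

struct Outer
    x::Float64
    i::Inner
end

# Boxed `i`  : Outer is a Float64 plus one pointer        -> 16 bytes.
# Inlined `i`: Outer embeds Inner's pointer and Int field -> 24 bytes.
# A mirrored C struct (and any ccall/unsafe_load code) has to match
# whichever layout the compiler picks, hence the desire for a simple,
# predictable rule.
sizeof(Outer), fieldoffset(Outer, 2)
```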
An alternative proposed by Yichao was to make it entirely opt-in and error out if inlining was not possible. I'm worried this will lead to yet another annotation that people will sprinkle everywhere.
For performance, specially crafted tests (like summing the rows of a very skinny matrix using subarrays) show some improvement from avoiding GC allocation. Not super satisfying for now, and casual inspection of the generated asm shows a lot of stack movement. We can work on that, though, probably by improving LLVM's visibility into our rooting mechanism and/or just using statepoints.
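For reference, a sketch of that kind of specially crafted test (current syntax; the function name and sizes are made up). Each iteration builds a SubArray, an immutable holding a reference to its parent array, so avoiding a heap allocation per row is exactly what unboxing buys:

```julia
# Sum the rows of a very skinny matrix through views.
function sumrows(m::AbstractMatrix)
    s = 0.0
    for i in 1:size(m, 1)
        s += sum(view(m, i, :))     # one SubArray per iteration
    end
    return s
end

m = randn(10_000, 4)                # skinny: many rows, few columns
sumrows(m)
```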
(to sweeten the deal I've thrown in improved undef ref errors)