Layout optimization for Result<&T, E>-like types #48741
Comments
This has a trade-off of potentially up to double the space usage, which defeats the purpose of this being an enum.
@nagisa This layout could be chosen only when it doesn’t increase the size, right? (For example when
I guess you're saying this because I didn't account for types other than refs/pointers and slices. But think of it as storing the common variant at the top of the layout without a tag, and the uncommon variant as it is stored currently, with the "tag" being zero: the worst case is then the same size as now.
Closing as duplicate of #46213 (comment).
One realization is that the hot case for Result<T, E> is usually the Ok() case. And in many cases, T is actually some sort of NonNull pointer: a ref, a Box, etc.
The current layout for Result<NonNull<T>, E> is (tag, union { NonNull, E }). This means that either way, the code needs to read the tag, and then read the union.
If instead the layout was (union { tag, NonNull }, E), then the common case becomes one read.
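The cost of the separate tag can be checked with std::mem::size_of. The following is only a sketch: layout_sizes is a hypothetical helper name, and rustc does not promise a particular layout for Result, so the only properties relied on are the documented Option<NonNull<T>> guarantee and the fact that a value able to hold any usize plus any non-null pointer cannot fit in a single word.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical helper: returns the sizes in bytes of
/// (NonNull<u8>, Option<NonNull<u8>>, Result<NonNull<u8>, usize>).
pub fn layout_sizes() -> (usize, usize, usize) {
    (
        size_of::<NonNull<u8>>(),
        // Documented guarantee: None is encoded as the null pointer,
        // so Option adds no tag here.
        size_of::<Option<NonNull<u8>>>(),
        // Err(usize) may hold any bit pattern, so the null niche alone
        // cannot distinguish the variants and extra space is needed.
        size_of::<Result<NonNull<u8>, usize>>(),
    )
}
```

On a typical 64-bit target one would expect these to come out as 8, 8 and 16 bytes, but only the first two being equal is guaranteed by the standard library documentation.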
The generalization could be formulated like this: When the tag is a boolean, and the first variant is a NonNull/NonZero type, the first variant is stored in place of the tag, and the invalid zero value acts as tag for the second variant.
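The rule can be emulated by hand to see what the accessors would look like. This is a minimal sketch, not what the compiler does or would generate: PackedResult, ok, err and to_result are hypothetical names, E is restricted to Copy to keep drop handling out of the picture, and #[repr(C)] is used only to pin the field order.

```rust
use std::mem::MaybeUninit;
use std::ptr::NonNull;

/// Hypothetical hand-written stand-in for (union { tag, NonNull }, E):
/// a null `ptr` plays the role of the Err tag.
#[repr(C)]
pub struct PackedResult<T, E: Copy> {
    ptr: *mut T,         // non-null => Ok(ptr); null => Err
    err: MaybeUninit<E>, // initialized only when `ptr` is null
}

impl<T, E: Copy> PackedResult<T, E> {
    pub fn ok(value: NonNull<T>) -> Self {
        PackedResult { ptr: value.as_ptr(), err: MaybeUninit::uninit() }
    }

    pub fn err(e: E) -> Self {
        PackedResult { ptr: std::ptr::null_mut(), err: MaybeUninit::new(e) }
    }

    /// The Ok path looks only at `ptr`; `err` is read only when `ptr` is null.
    pub fn to_result(&self) -> Result<NonNull<T>, E> {
        match NonNull::new(self.ptr) {
            Some(p) => Ok(p),
            // SAFETY: a null `ptr` is only ever produced by `err()`,
            // which initializes the `err` field.
            None => Err(unsafe { self.err.assume_init() }),
        }
    }
}
```

In this emulation the struct is two words wide for a word-sized E, the same as a tag followed by a one-word union, which matches the point that the worst case need not grow.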
So Ok(value) would be (value, undefined), and Err(e) would be (0, e).
Some code to show the benefits of this optimization:
Compiled as the following with godbolt:
This doesn't really remove branches in the example above, but it removes the need to read memory in the common case (the data is probably in the same cache line, or in the next prefetched one, but that is still fewer instructions to execute). In some cases, I've seen the compiler use cmov instead of a branch, though.
Note the compiler does a poor job with foo_unwrap, for some reason... manually inlining as_result() makes it generate better code.
This could be applied to slices too, where Result<&[T], E> could become (union { tag, slice-ptr }, union { slice-size, E }), in which case this would even make the type smaller than (tag, union { (slice-ptr, slice-size), E }).
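The slice variant of the idea can also be written out by hand. The sketch below is an illustration under assumptions, not compiler behavior: PackedSliceResult and LenOrErr are hypothetical names, E is restricted to Copy so it can live in a union, and #[repr(C)] is there only to fix the field order. The pointer word doubles as the tag, and the second word holds either the length or the error.

```rust
use std::marker::PhantomData;

/// Second word of the hypothetical layout: slice length for Ok, error for Err.
#[repr(C)]
union LenOrErr<E: Copy> {
    len: usize,
    err: E,
}

/// Hypothetical stand-in for (union { tag, slice-ptr }, union { slice-size, E }).
#[repr(C)]
pub struct PackedSliceResult<'a, T, E: Copy> {
    ptr: *const T, // null => Err; otherwise the start of the slice
    second: LenOrErr<E>,
    _marker: PhantomData<&'a [T]>, // ties the raw pointer to the borrow
}

impl<'a, T, E: Copy> PackedSliceResult<'a, T, E> {
    pub fn ok(s: &'a [T]) -> Self {
        // A slice's data pointer is never null, even for an empty slice.
        PackedSliceResult { ptr: s.as_ptr(), second: LenOrErr { len: s.len() }, _marker: PhantomData }
    }

    pub fn err(e: E) -> Self {
        PackedSliceResult { ptr: std::ptr::null(), second: LenOrErr { err: e }, _marker: PhantomData }
    }

    pub fn to_result(&self) -> Result<&'a [T], E> {
        if self.ptr.is_null() {
            // SAFETY: a null `ptr` is only produced by `err()`, which wrote `err`.
            Err(unsafe { self.second.err })
        } else {
            // SAFETY: a non-null `ptr` comes from `ok()`, which stored the
            // pointer and length of a slice borrowed for 'a.
            Ok(unsafe { std::slice::from_raw_parts(self.ptr, self.second.len) })
        }
    }
}
```

With a word-sized E this struct occupies two words, whereas a separate tag in front of a (pointer, length) pair would need three. Comparing against std::mem::size_of::<Result<&[T], E>>() on a given compiler shows what layout rustc actually picks there.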