Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Layout optimization for Result<&T, E>-like types #48741

Closed
glandium opened this issue Mar 4, 2018 · 5 comments
Closed

Layout optimization for Result<&T, E>-like types #48741

glandium opened this issue Mar 4, 2018 · 5 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR.

Comments

@glandium
Copy link
Contributor

glandium commented Mar 4, 2018

One realization is that the hot case for Result<T, E> is usually the Ok() case. And in many cases, T is actually some sort of NonNull pointer: a ref, a Box, etc.

The current layout for Result<NonNull<T>, E> is (tag, union { NonNull, E }). Which means either way, the code needs to read the tag, and then read the union.

If instead, the layout was (union { tag, NonNull }, E), then the common case becomes one read.

The generalization could be formulated like this: When the tag is a boolean, and the first variant is a NonNull/NonZero type, the first variant is stored in place of the tag, and the invalid zero value acts as tag for the second variant.

So Ok(value) would be (value, undefined), and Err(e) would be (0, e).

Some code to show the benefits of this optimization:

#![feature(nonzero)]
extern crate core;
use core::nonzero::NonZero;

pub struct Foo(usize, usize);

impl Foo {
    fn as_result(&self) -> Result<NonZero<usize>, usize> {
        if self.0 > 0 {
            Ok(unsafe { NonZero::new_unchecked(self.0) })
        } else {
            Err(self.1)
        }
    }
}

pub fn foo(f: &Foo) -> Option<NonZero<usize>> {
    f.as_result().ok()
}

pub fn foo_unwrap(f: &Foo) -> usize {
    f.as_result().unwrap().get()
}

pub fn bar(f: &Result<NonZero<usize>, usize>) -> Option<NonZero<usize>> {
    f.ok()
}

pub fn bar_unwrap(f: &Result<NonZero<usize>, usize>) -> usize {
    f.unwrap().get()
}

Compiled as the following with godbolt:

example::foo:
  push rbp
  mov rbp, rsp
  mov rax, qword ptr [rdi]
  pop rbp
  ret

example::foo_unwrap:
  mov rax, qword ptr [rdi]
  test rax, rax
  jne .LBB4_2
  mov rax, qword ptr [rdi + 8]
.LBB4_2:
  je .LBB4_3
  ret
.LBB4_3:
  push rbp
  mov rbp, rsp
  mov rdi, rax
  call core::result::unwrap_failed
  ud2

example::bar:
  push rbp
  mov rbp, rsp
  cmp qword ptr [rdi], 1
  je .LBB5_1
  mov rax, qword ptr [rdi + 8]
  pop rbp
  ret
.LBB5_1:
  xor eax, eax
  pop rbp
  ret

example::bar_unwrap:
  mov rax, qword ptr [rdi + 8]
  cmp qword ptr [rdi], 1
  je .LBB6_1
  ret
.LBB6_1:
  push rbp
  mov rbp, rsp
  mov rdi, rax
  call core::result::unwrap_failed
  ud2

This doesn't really remove branches in the example above, but removes the need to read memory in the common case (although, the data is probably in the same cache-line, or in the next pre-fetched one, but that's still less instructions to execute). In some cases, I've seen the compiler use cmov instead of a branch, though.

Note the compiler does a poor job with foo_unwrap, for some reason... manually inlining as_result() makes it generate better code.

This could be applied to slices too, where Result<&[T], E> could become (union { tag, slice-ptr }, union { slice-size, E}), in which case this would even make the type smaller than (tag, union { (slice-ptr, slice-size), E }).

@nagisa
Copy link
Member

nagisa commented Mar 5, 2018

This has a trade-off of potentially having up to double of space usage, which defeats the purpose of this being an enum.

@SimonSapin
Copy link
Contributor

@nagisa This layout could be chosen only when it doesn’t increase the size, right? (For example when E is pointer-sized.)

@glandium
Copy link
Contributor Author

glandium commented Mar 5, 2018

This has a trade-off of potentially having up to double of space usage

I guess you're saying this because I didn't account for types other than refs/pointers and slices. But if you think about this as making the common variant stored at the top of the layout without a tag, and the uncommon variant like currently, with the "tag" being zero, the worst case is the same size as now.
This wouldn't kick in unless the first word in the common variant is a NonZero.

@TimNN TimNN added the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Mar 6, 2018
@nox
Copy link
Contributor

nox commented Mar 20, 2018

@eddyb
Copy link
Member

eddyb commented Mar 20, 2018

Closing as duplicate of #46213 (comment).

@eddyb eddyb closed this as completed Mar 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR.
Projects
None yet
Development

No branches or pull requests

6 participants