
Remove most #[inline] annotations #119

Merged — 4 commits, Oct 15, 2019
Commits on Oct 9, 2019

  1. Only rerun build script if build script changes

    Avoids unnecessary rebuilds when locally developing the crate.
    alexcrichton committed Oct 9, 2019
    Commit 7a548a5
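    The usual fix for this is a `cargo:rerun-if-changed` directive. A minimal sketch of such a build script (an illustration of the technique, not this crate's actual `build.rs`):

```rust
// build.rs (sketch): emitting `cargo:rerun-if-changed` tells Cargo to
// rerun the build script only when the named file changes, instead of
// rerunning it on every `cargo build`.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}

fn main() {
    println!("{}", rerun_directive("build.rs"));
}
```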
  2. Give modules native name instead of imp

    Helps when debugging and looking at symbols to see what we got.
    alexcrichton committed Oct 9, 2019
    Commit 94ca780
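    As a hedged sketch of the idea (module and function names here are invented, not the crate's actual layout): naming the platform module after its implementation means backtraces and symbol listings show a descriptive path rather than an opaque `imp`:

```rust
// Sketch: the SIMD module keeps its native name, so a symbol shows up
// as e.g. `sse2::group_width` in debuggers instead of `imp::group_width`.
mod sse2 {
    pub fn group_width() -> usize {
        16 // bytes scanned per SIMD probe (illustrative value)
    }
}

// Re-export under a common alias so the rest of the crate is unchanged.
use self::sse2 as group_impl;

fn main() {
    println!("group width: {}", group_impl::group_width());
}
```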
  3. Remove most #[inline] annotations

    This commit goes through and deletes almost all `#[inline]` annotations
    in this crate. It looks like before this commit basically every single
    function is `#[inline]`, but this is generally not necessary for
    performance and can have a severe impact on compile times in both debug
    and release modes, most severely in release mode.
    
    Some `#[inline]` annotations are definitely necessary, however. Most
    functions in this crate are already candidates for inlining because
    they're generic, but the non-generic functions on types like `Group`
    and `BitMask` aren't candidates for inlining without `#[inline]`.
    Additionally, LLVM is by no means perfect, so some `#[inline]`
    annotations may still be necessary to get further speedups.
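    To make the distinction concrete, here is a simplified sketch (the types and methods are invented for illustration, not hashbrown's real internals) of why generic functions inline fine on their own while non-generic helpers want the annotation:

```rust
// Generic functions are instantiated (monomorphized) in each calling
// crate, so the compiler can already inline them without annotations.
fn slot_matches<T: Eq>(slot: &T, key: &T) -> bool {
    slot == key
}

// A non-generic method compiles to a single symbol in this crate;
// without `#[inline]` a downstream crate can't inline it (absent LTO),
// so hot low-level helpers like this one keep the annotation.
#[derive(Copy, Clone)]
struct BitMask(u16);

impl BitMask {
    #[inline]
    fn lowest_set_bit(self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            Some(self.0.trailing_zeros())
        }
    }
}

fn main() {
    assert!(slot_matches(&1u32, &1u32));
    assert_eq!(BitMask(0b1000).lowest_set_bit(), Some(3));
}
```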
    
    The procedure used to generate this commit looked like:
    
    * Remove all `#[inline]` annotations.
    * Run `cargo bench`, comparing against the `master` branch, and add
      `#[inline]` to hot spots as necessary.
    * A [PR] was made against rust-lang/rust to [evaluate the impact][run1]
      on the compiler for more performance data.
    * Using this data, `perf diff` was used locally to determine further hot
      spots and more `#[inline]` annotations were added.
    * A [second round of benchmarking][run2] was done.
    
    The numbers are at the point where I think this should land in the
    crate and get published to move into the standard library. There are
    up to 20% wins in compile time for hashmap-heavy crates (like Cargo)
    and milder wins (up to 10%) for a number of other large crates. The
    regressions are all in the 1-3% range and largely on benchmarks that
    take only a handful of milliseconds anyway, which I'd personally say
    is a worthwhile tradeoff.
    
    For comparison, the benchmarks of this crate before and after this
    commit look like so:
    
       name                         baseline ns/iter  new ns/iter  diff ns/iter   diff %  speedup
       insert_ahash_highbits        7,137             9,044               1,907   26.72%   x 0.79
       insert_ahash_random          7,575             9,789               2,214   29.23%   x 0.77
       insert_ahash_serial          9,833             9,476                -357   -3.63%   x 1.04
       insert_erase_ahash_highbits  15,824            19,164              3,340   21.11%   x 0.83
       insert_erase_ahash_random    16,933            20,353              3,420   20.20%   x 0.83
       insert_erase_ahash_serial    20,857            27,675              6,818   32.69%   x 0.75
       insert_erase_std_highbits    35,117            38,385              3,268    9.31%   x 0.91
       insert_erase_std_random      35,357            37,236              1,879    5.31%   x 0.95
       insert_erase_std_serial      30,617            34,136              3,519   11.49%   x 0.90
       insert_std_highbits          15,675            18,180              2,505   15.98%   x 0.86
       insert_std_random            16,566            17,803              1,237    7.47%   x 0.93
       insert_std_serial            14,612            16,025              1,413    9.67%   x 0.91
       iter_ahash_highbits          1,715             1,640                 -75   -4.37%   x 1.05
       iter_ahash_random            1,721             1,634                 -87   -5.06%   x 1.05
       iter_ahash_serial            1,723             1,636                 -87   -5.05%   x 1.05
       iter_std_highbits            1,715             1,634                 -81   -4.72%   x 1.05
       iter_std_random              1,715             1,637                 -78   -4.55%   x 1.05
       iter_std_serial              1,722             1,637                 -85   -4.94%   x 1.05
       lookup_ahash_highbits        4,565             5,809               1,244   27.25%   x 0.79
       lookup_ahash_random          4,632             4,047                -585  -12.63%   x 1.14
       lookup_ahash_serial          4,612             4,906                 294    6.37%   x 0.94
       lookup_fail_ahash_highbits   4,206             3,976                -230   -5.47%   x 1.06
       lookup_fail_ahash_random     4,327             4,211                -116   -2.68%   x 1.03
       lookup_fail_ahash_serial     8,999             4,386              -4,613  -51.26%   x 2.05
       lookup_fail_std_highbits     13,284            13,342                 58    0.44%   x 1.00
       lookup_fail_std_random       13,172            13,614                442    3.36%   x 0.97
       lookup_fail_std_serial       11,240            11,539                299    2.66%   x 0.97
       lookup_std_highbits          13,075            13,333                258    1.97%   x 0.98
       lookup_std_random            13,257            13,193                -64   -0.48%   x 1.00
       lookup_std_serial            10,782            10,917                135    1.25%   x 0.99
    
    My read of these numbers is that the microbenchmarks are somewhat
    noisy but neither consistently regress nor improve, as expected. In
    general I would be surprised if any significant performance
    regression can be attributed to this commit, and `#[inline]` can
    always be selectively added back where it proves necessary, without
    blanketing every function in the crate.
    
    [PR]: rust-lang/rust#64846
    [run1]: rust-lang/rust#64846 (comment)
    [run2]: rust-lang/rust#64846 (comment)
    alexcrichton committed Oct 9, 2019
    Commit 6d20b6b
  4. Commit 4e9e27d