Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Avoid ptr::copy if unnecessary in vec::Drain #50575

Merged
merged 1 commit into from
May 11, 2018

Conversation

alexcrichton
Copy link
Member

This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the expand_statements
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

  • The function was "fast" on glibc 2.24 and slow on glibc 2.23
  • The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of expand_statements was Drop for Drain in the call to
splice where we insert new statements into the original vector. This should
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
memmove where the src/dst are equal becomes much faster. This program
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing memmove if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to Drop for Drain to avoid the call to ptr::copy if the region being
copied doesn't actually need to be copied. For normal usage of just Drain
itself this check isn't really necessary, but because Splice internally
contains Drain this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where memmove looks to not have this optimization.

Note that the way splice was called in expand_statements would cause a
quadratic number of elements to be copied via memmove which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes #50496

This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
@rust-highfive
Copy link
Collaborator

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 9, 2018
@kennytm
Copy link
Member

kennytm commented May 9, 2018

@bors try

@Mark-Simulacrum could we do a perf check?

@bors
Copy link
Contributor

bors commented May 9, 2018

⌛ Trying commit 254b601 with merge 1aac55d...

bors added a commit that referenced this pull request May 9, 2018
std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes #50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
@kennytm kennytm added S-waiting-on-perf Status: Waiting on a perf run to be completed. beta-nominated Nominated for backporting to the compiler in the beta channel. labels May 9, 2018
@kennytm
Copy link
Member

kennytm commented May 9, 2018

Also beta-nominating this for 1.27.

@Mark-Simulacrum
Copy link
Member

Perf has been queued.

@sfackler
Copy link
Member

sfackler commented May 9, 2018

This seems like a reasonable thing to do even if it doesn't fix the perf regression. r=me

@bors
Copy link
Contributor

bors commented May 9, 2018

☀️ Test successful - status-travis
State: approved= try=True

@alexcrichton
Copy link
Member Author

FWIW locally on my computer (glibc 2.23)

$ time rustc +stable main.rs
rustc +stable main.rs  4.94s user 0.18s system 100% cpu 5.080 total
$ time rustc +beta main.rs
rustc +beta main.rs  5.00s user 0.09s system 98% cpu 5.185 total
$ time rustc +nightly main.rs
^C
rustc +nightly main.rs  206.66s user 0.14s system 99% cpu 3:27.45 total
$ time rustc +1aac55d18084910bbfa1d25733a5393860616b8b main.rs
rustc +1aac55d18084910bbfa1d25733a5393860616b8b main.rs  4.06s user 0.05s system 100% cpu 4.092 total

@alexcrichton
Copy link
Member Author

future perf results link

@alexcrichton
Copy link
Member Author

While the link I pasted above doesn't work yet this is the results comparing to the previous commit on master so either this PR or the rollup before it caused the speed boost, but I'm gonna optimistically say it was this PR :)

@bors: r=sfackler

@bors
Copy link
Contributor

bors commented May 9, 2018

📌 Commit 254b601 has been approved by sfackler

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 9, 2018
@kennytm kennytm removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 10, 2018
@alexcrichton
Copy link
Member Author

@bors: rollup

@alexcrichton alexcrichton added the beta-accepted Accepted for backporting to the compiler in the beta channel. label May 10, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 10, 2018
…fackler

std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 10, 2018
…fackler

std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

This commit is spawned out of a performance regression investigation in rust-lang#50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:

* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc

Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.

The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.

It turns out that at least one of the optimizations in glibc 2.24 was that
`memmove` where the src/dst are equal becomes much faster. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting
how glibc 2.24 is optimizing `memmove` if the src/dst are equal.

And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.

Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove` which is likely why the
tuple-stress benchmark showed such a severe regression.

Closes rust-lang#50496

[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
bors added a commit that referenced this pull request May 10, 2018
Rollup of 18 pull requests

Successful merges:

 - #49423 (Extend tests for RFC1598 (GAT))
 - #50010 (Give SliceIndex impls a test suite of girth befitting the implementation (and fix a UTF8 boundary check))
 - #50447 (Fix update-references for tests within subdirectories.)
 - #50514 (Pull in a wasm fix from LLVM upstream)
 - #50524 (Make DepGraph::previous_work_products immutable)
 - #50532 (Don't use Lock for heavily accessed CrateMetadata::cnum_map.)
 - #50538 ( Make CrateNum allocation more thread-safe. )
 - #50564 (Inline `Span` methods.)
 - #50565 (Use SmallVec for DepNodeIndex within dep_graph.)
 - #50569 (Allow for specifying a linker plugin for cross-language LTO)
 - #50572 (Clarify in the docs that `mul_add` is not always faster.)
 - #50574 (add fn `into_inner(self) -> (Idx, Idx)` to RangeInclusive (#49022))
 - #50575 (std: Avoid `ptr::copy` if unnecessary in `vec::Drain`)
 - #50588 (Move "See also" disambiguation links for primitive types to top)
 - #50590 (Fix tuple struct field spans)
 - #50591 (Restore RawVec::reserve* documentation)
 - #50598 (Remove unnecessary mutable borrow and resizing in DepGraph::serialize)
 - #50606 (Retry when downloading the Docker cache.)

Failed merges:

 - #50161 (added missing implementation hint)
 - #50558 (Remove all reference to DepGraph::work_products)
@bors bors merged commit 254b601 into rust-lang:master May 11, 2018
bors added a commit that referenced this pull request May 11, 2018
[beta] Process backports

* #50575: std: Avoid `ptr::copy` if unnecessary in `vec::Drain`

r? @alexcrichton
@pietroalbini pietroalbini removed the beta-nominated Nominated for backporting to the compiler in the beta channel. label May 11, 2018
@alexcrichton alexcrichton deleted the faster-drain-drop branch June 29, 2018 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beta-accepted Accepted for backporting to the compiler in the beta channel. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Severe regression in html5ever build time
8 participants