-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a faster allocation size check in slice::from_raw_parts #103287
Conversation
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
Thanks! Looks good. This is only used in debug builds (of the stdlib) so no need for a perf run or restricting rollup (but no need to force it either, since maybe it could impact compile times for other unknown reasons). @bors r+ |
Use a faster allocation size check in slice::from_raw_parts I've been perusing through the codegen changes that result from turning on the standard library debug assertions. The previous check in here uses saturating arithmetic, which in my experience sometimes makes LLVM just fail to optimize things around the saturating operation. Here is a demo of the codegen difference: https://godbolt.org/z/WMEqrjajW Before: ```asm example::len_check_old: mov rax, rdi mov ecx, 3 mul rcx setno cl test rax, rax setns al and al, cl ret example::len_check_old: mov rax, rdi mov ecx, 8 mul rcx setno cl test rax, rax setns al and al, cl ret ``` After: ```asm example::len_check_new: movabs rax, 3074457345618258603 cmp rdi, rax setb al ret example::len_check_new: shr rdi, 60 sete al ret ``` Running rustc-perf locally, this looks like up to a 4.5% improvement when `debug-assertions-std = true`. Thanks `@LegionMammal978` (I think that's you?) for turning my idea into a much cleaner implementation. r? `@thomcc`
Use a faster allocation size check in slice::from_raw_parts I've been perusing through the codegen changes that result from turning on the standard library debug assertions. The previous check in here uses saturating arithmetic, which in my experience sometimes makes LLVM just fail to optimize things around the saturating operation. Here is a demo of the codegen difference: https://godbolt.org/z/WMEqrjajW Before: ```asm example::len_check_old: mov rax, rdi mov ecx, 3 mul rcx setno cl test rax, rax setns al and al, cl ret example::len_check_old: mov rax, rdi mov ecx, 8 mul rcx setno cl test rax, rax setns al and al, cl ret ``` After: ```asm example::len_check_new: movabs rax, 3074457345618258603 cmp rdi, rax setb al ret example::len_check_new: shr rdi, 60 sete al ret ``` Running rustc-perf locally, this looks like up to a 4.5% improvement when `debug-assertions-std = true`. Thanks ``@LegionMammal978`` (I think that's you?) for turning my idea into a much cleaner implementation. r? ``@thomcc``
Rollup of 10 pull requests Successful merges: - rust-lang#102951 (suggest type annotation for local statement initialed by ref expression) - rust-lang#103209 (Diagnostic derives: allow specifying multiple alternative suggestions) - rust-lang#103287 (Use a faster allocation size check in slice::from_raw_parts) - rust-lang#103416 (Name the `impl Trait` in region bound suggestions) - rust-lang#103430 (Workaround unstable stmt_expr_attributes for method receiver expressions) - rust-lang#103444 (Remove extra type error after missing semicolon error) - rust-lang#103520 (rustc_middle: Rearrange resolver outputs structures slightly) - rust-lang#103533 (Use &self instead of &mut self for cast methods) - rust-lang#103536 (Remove `rustc_driver::set_sigpipe_handler()`) - rust-lang#103542 (Pinning tests for some `macro_rules!` errors discussed in the lang meeting) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
600: Pull changes from upstream `master` r=kirtchev-adacore a=pietroalbini * rust-lang/rust#103605 * rust-lang/rust#103604 * rust-lang/rust#103598 * rust-lang/rust#103596 * rust-lang/rust#103580 * rust-lang/rust#103579 * rust-lang/rust#103567 * rust-lang/rust#103558 * rust-lang/rust#103549 * rust-lang/rust#103537 * rust-lang/rust#103526 * rust-lang/rust#103432 * rust-lang/rust#103571 * rust-lang/rust#103492 * rust-lang/rust#103572 * rust-lang/rust#103554 * rust-lang/rust#103546 * rust-lang/rust#103543 * rust-lang/rust#103428 * rust-lang/rust#102706 * rust-lang/rust#95710 * rust-lang/rust#103284 * rust-lang/rust#103562 * rust-lang/rust#103542 * rust-lang/rust#103536 * rust-lang/rust#103533 * rust-lang/rust#103520 * rust-lang/rust#103444 * rust-lang/rust#103430 * rust-lang/rust#103416 * rust-lang/rust#103287 * rust-lang/rust#103209 * rust-lang/rust#102951 * rust-lang/rust#103279 * rust-lang/rust#103158 Co-authored-by: Lukas Wirth <[email protected]> Co-authored-by: Pietro Albini <[email protected]> Co-authored-by: DropDemBits <[email protected]> Co-authored-by: bors <[email protected]> Co-authored-by: Pietro Albini <[email protected]>
I've been perusing through the codegen changes that result from turning on the standard library debug assertions. The previous check in here uses saturating arithmetic, which in my experience sometimes makes LLVM just fail to optimize things around the saturating operation.
Here is a demo of the codegen difference: https://godbolt.org/z/WMEqrjajW
Before:
After:
Running rustc-perf locally, this looks like up to a 4.5% improvement when
debug-assertions-std = true
.Thanks @LegionMammal978 (I think that's you?) for turning my idea into a much cleaner implementation.
r? @thomcc