Manually implement PartialEq for Option<T> and specialize non-nullable types #103556

Merged on Nov 26, 2022 (3 commits)

Contributor

@clubby789 clubby789 commented Oct 26, 2022

This PR manually implements PartialEq and StructuralPartialEq for Option, which seems to produce slightly better codegen than the automatically derived implementation.

It also allows specializing on the core::num::NonZero* and core::ptr::NonNull types, taking advantage of the niche optimization by transmuting the Option<T> to T to be compared directly, which can be done in just two instructions.

A comparison of the original, new and specialized code generation is available here.
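
To make the niche idea concrete, here is a minimal sketch in stable Rust. It is not the code from this PR: the names `option_eq_generic`, `option_nonzero_eq` and `option_has_same_size` are hypothetical, and the specialized function avoids `transmute` by mapping `None` to 0, the one value a `NonZeroU32` can never hold.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hand-written equality for any `Option<T>` (hypothetical helper name).
/// This spells out the comparison a derived `PartialEq` performs.
pub fn option_eq_generic<T: PartialEq>(l: &Option<T>, r: &Option<T>) -> bool {
    match (l, r) {
        (Some(l), Some(r)) => l == r,
        (None, None) => true,
        _ => false,
    }
}

/// Equality for `Option<NonZeroU32>` that compares one plain integer per side.
/// `None` is mapped to 0, which no `NonZeroU32` can hold, so distinct options
/// always map to distinct integers.
pub fn option_nonzero_eq(l: Option<NonZeroU32>, r: Option<NonZeroU32>) -> bool {
    l.map_or(0, NonZeroU32::get) == r.map_or(0, NonZeroU32::get)
}

/// The niche optimization: `None` is stored in the forbidden zero value,
/// so the `Option` wrapper adds no extra space.
pub fn option_has_same_size() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}
```

Pasting both functions into a compiler explorer with optimizations enabled is one way to compare the generated code for the generic and the integer-based versions.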

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 26, 2022
@rustbot
Collaborator

rustbot commented Oct 26, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 26, 2022
@compiler-errors
Member

perf run was requested @bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 26, 2022
@bors
Contributor

bors commented Oct 26, 2022

⌛ Trying commit d0a58800564093f4b6db3ae0b5f76547c7564f4b with merge 05058ddce07865ec73eedb7c8d2cddd50d96d959...

Member

@thomcc thomcc left a comment


This is a pretty significant codegen win, although I'm surprised we don't get this already. Needs a fix for some UB, tho (fixing the UB in godbolt still produces the codegen win).

library/core/src/option.rs (outdated, resolved)

@compiler-errors
Member

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Contributor

bors commented Oct 26, 2022

⌛ Trying commit a8140e59b80fae0c3b27e3a1d6e0b176dd5fb757 with merge 8032e517bc4d1e2309051ba99ef9c8beaff83a82...


@clubby789
Contributor Author

Made the NonNull test support ptr or i8* since CI was producing different results to my local build.
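
For context, rustc codegen tests are Rust files whose `// CHECK` comments are matched against the emitted LLVM IR by FileCheck, and FileCheck accepts regular expressions inside `{{...}}`. A check that tolerates both pointer spellings might look roughly like the sketch below. The function names and the exact check lines are assumptions for illustration, not the test added in this PR.

```rust
use std::num::NonZeroU8;
use std::ptr::NonNull;

// In a real codegen test these functions would also be exported under an
// unmangled name so that the CHECK-LABEL lines can find them.

// CHECK-LABEL: @non_zero_eq
pub fn non_zero_eq(l: Option<NonZeroU8>, r: Option<NonZeroU8>) -> bool {
    // Illustrative expectation: a single integer comparison, then a return.
    // CHECK: icmp eq i8
    // CHECK-NEXT: ret i1
    l == r
}

// CHECK-LABEL: @non_null_eq
pub fn non_null_eq(l: Option<NonNull<u8>>, r: Option<NonNull<u8>>) -> bool {
    // `{{ptr|i8\*}}` matches either `ptr` (opaque pointers, newer LLVM)
    // or `i8*` (typed pointers, older LLVM).
    // CHECK: icmp eq {{ptr|i8\*}}
    // CHECK-NEXT: ret i1
    l == r
}
```

The regular expression lets one test file pass whichever pointer spelling the LLVM version in use prints.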

@compiler-errors
Member

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Contributor

bors commented Oct 26, 2022

⌛ Trying commit 5ed28fed2a0b927a267da146d51ec99bce8bc92f with merge a041a05c3184bb8c38b8940422e8951e99b6d3f1...

@Noratrieb
Member

rustc internally uses rustc_scalar_valid attributes in its index macro. Would it make sense to apply this specialization for rustc indices as well? I don't think options are compared very often but it could be a win nevertheless.
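
The `rustc_layout_scalar_valid_range_*` attributes are internal to the compiler, so they cannot appear in a stable-Rust example. As a rough stand-in, the sketch below builds an index newtype on top of `NonZeroU32` to get a similar niche. The `Idx` type and the helper functions are hypothetical and are not how rustc's index macro is written.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical index type: stores `index + 1` so that 0 stays free
/// to represent `None` in an `Option<Idx>`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Idx(NonZeroU32);

impl Idx {
    /// Returns `None` only for `u32::MAX`, which has no `+ 1` slot.
    pub fn new(index: u32) -> Option<Idx> {
        index.checked_add(1).and_then(NonZeroU32::new).map(Idx)
    }

    /// Recovers the original index by undoing the `+ 1` offset.
    pub fn index(self) -> u32 {
        self.0.get() - 1
    }
}

/// Like `Option<NonZeroU32>`, `Option<Idx>` fits in four bytes.
pub fn option_idx_is_four_bytes() -> bool {
    size_of::<Option<Idx>>() == 4
}

/// Compares two optional indices as raw integers, with `None` mapped to
/// the reserved value 0.
pub fn option_idx_eq(l: Option<Idx>, r: Option<Idx>) -> bool {
    l.map_or(0, |i| i.0.get()) == r.map_or(0, |i| i.0.get())
}
```

Any type with a reserved bit pattern could in principle use the same single-integer comparison. Whether that pays off for rustc's indices is the open question raised here.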

@bors
Contributor

bors commented Oct 26, 2022

☀️ Try build successful - checks-actions
Build commit: a041a05c3184bb8c38b8940422e8951e99b6d3f1

@rust-timer
Collaborator

Queued a041a05c3184bb8c38b8940422e8951e99b6d3f1 with parent 6365e5a, future comparison URL.

@scottmcm
Member

scottmcm commented Oct 26, 2022

I went to try to see if there's anything we could do to make LLVM understand this, and realized that right now we're shooting ourselves in the foot: https://rust.godbolt.org/z/Ye5xr8P8x

What's PartialEq for NonZero doing right now? Well, apparently it's derived, and whatever's going on with the derive, it has no range information:

pub fn demo_std(x: &NonZeroU32, y: &NonZeroU32) -> bool {
    x == y
}
define noundef zeroext i1 @_ZN7example8demo_std17hcce6db1e74f1c1d4E(ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %0, ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %1) unnamed_addr #0 {
  %_9 = load i32, ptr %0, align 4
  %_10 = load i32, ptr %1, align 4
  %2 = icmp eq i32 %_9, %_10
  ret i1 %2
}

Whereas if you write the obvious implementation yourself

pub fn demo_obvious(x: &NonZeroU32, y: &NonZeroU32) -> bool {
    x.get() == y.get()
}

Then the loads get the !range metadata saying that it's nonzero

define noundef zeroext i1 @_ZN7example12demo_obvious17haee70b6eb73f133dE(ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %x, ptr noalias nocapture noundef readonly align 4 dereferenceable(4) %y) unnamed_addr #0 {
  %self = load i32, ptr %x, align 4, !range !2, !noundef !3
  %self1 = load i32, ptr %y, align 4, !range !2, !noundef !3
  %0 = icmp eq i32 %self, %self1
  ret i1 %0
}

!2 = !{i32 1, i32 0}

It's possible that LLVM still might not be able to optimize this even with that for other reasons (#49572 (comment)), but I think we should at least find out whether giving LLVM the obvious information would be enough to let it make this transform -- it would be great if we could solve this in the NonZero code or in the rustc_layout_scalar_valid_range_start code and thus not need to specialize every use.

EDIT: Oh, if nikic already looked then there's probably no easy fix.
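
As a reading aid for `!2 = !{i32 1, i32 0}`: LLVM `!range` metadata describes a half-open interval `[lo, hi)` that is allowed to wrap around, so `[1, 0)` means every `i32` value except 0. The helper below is a hypothetical sketch of that membership rule, not anything taken from LLVM or rustc, and it assumes `lo != hi`.

```rust
/// Does `v` lie in the wrapping half-open range `[lo, hi)`?
/// Assumes `lo != hi`.
pub fn in_wrapping_range(v: u32, lo: u32, hi: u32) -> bool {
    // Shift everything so the range starts at 0. Wrapping arithmetic
    // makes the same formula work whether or not the range wraps.
    v.wrapping_sub(lo) < hi.wrapping_sub(lo)
}
```

With `lo = 1` and `hi = 0` the function rejects only `v = 0`, which is exactly the nonzero guarantee that the loads in `demo_obvious` carry and the loads in `demo_std` lack.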

@scottmcm scottmcm mentioned this pull request Oct 26, 2022
@lukas-code
Member

cc #49892

@clubby789 clubby789 force-pushed the specialize-option-partial-eq branch from b6b33c2 to a1b650c Compare October 27, 2022 12:46
@clubby789
Contributor Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 29, 2022
@bors
Contributor

bors commented Oct 31, 2022

☔ The latest upstream changes (presumably #103797) made this pull request unmergeable. Please resolve the merge conflicts.

@clubby789 clubby789 force-pushed the specialize-option-partial-eq branch from a1b650c to 20f2d8b Compare October 31, 2022 16:44
@scottmcm
Member

Thanks! It's great that this worked out without adding any unsafe!

@bors r+

@scottmcm
Member

Weird, let's try that again

@bors r+

@bors
Contributor

bors commented Nov 26, 2022

📌 Commit b9a95d8 has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 26, 2022
@bors
Contributor

bors commented Nov 26, 2022

⌛ Testing commit b9a95d8 with merge 8841bee...

@bors
Contributor

bors commented Nov 26, 2022

☀️ Test successful - checks-actions
Approved by: scottmcm
Pushing 8841bee to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 26, 2022
@bors bors merged commit 8841bee into rust-lang:master Nov 26, 2022
@rustbot rustbot added this to the 1.67.0 milestone Nov 26, 2022
@rust-timer
Collaborator

Finished benchmarking commit (8841bee): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.1% | [2.1%, 2.1%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.4%, -0.2%] | 2 |
| Improvements ✅ (secondary) | -0.3% | [-0.4%, -0.3%] | 2 |
| All ❌✅ (primary) | 0.5% | [-0.4%, 2.1%] | 3 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.0% | [3.0%, 3.0%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -5.1% | [-5.1%, -5.1%] | 1 |
| All ❌✅ (primary) | 3.0% | [3.0%, 3.0%] | 1 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [1.8%, 1.8%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.8% | [1.8%, 1.8%] | 1 |

@rustbot rustbot added the perf-regression Performance regression. label Nov 26, 2022
@nnethercote
Contributor

Perf changes are few, tiny, and not a concern.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Nov 27, 2022
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
…eq, r=scottmcm

Manually implement PartialEq for Option<T> and specialize non-nullable types

This PR manually implements `PartialEq` and `StructuralPartialEq` for `Option`, which seems to produce slightly better codegen than the automatically derived implementation.

It also allows specializing on the `core::num::NonZero*` and `core::ptr::NonNull` types, taking advantage of the niche optimization by transmuting the `Option<T>` to `T` to be compared directly, which can be done in just two instructions.

A comparison of the original, new and specialized code generation is available [here](https://godbolt.org/z/dE4jxdYsa).
@clubby789 clubby789 deleted the specialize-option-partial-eq branch February 11, 2023 14:43
Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-libs Relevant to the library team, which will review and decide on the PR/issue.