-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Substs::super_fold_with
.
#37108
Conversation
// and instead reuse the existing substs. | ||
// - In that case, when the substs length is one, we also avoid | ||
// creating a new Vec, which avoids a heap allocation. | ||
let params; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
params
is used only in 338 line block. Can you move into the block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It's used on lines 336, 338, 349.
// - In that case, when the substs length is one, we also avoid | ||
// creating a new Vec, which avoids a heap allocation. | ||
let params; | ||
if self.params.len() == 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rewrite it to match self.params.len()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer having it this way. The 1
case is most frequent, then >1
, and 0
is a distant last. With a match the ordering is less clear (i.e. not guaranteed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a comment to that effect, if it helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnethercote Okay. I got it.
return &self; | ||
} | ||
} else { | ||
return &self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super minor, but the &
in &self
is redundant. The Self
type is &'tcx Substs<'tcx>
and the self
identifier is of type &Self
. Deref coercion means that using &self
, self
or *self
here is going to be equivalent.
So this is an interesting take. I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a |
Oh and this doesn't handle |
My original version of the patch just checked if Then I added the skip-Vec-creation-in-the-length-1 case and that improved things a bit more. |
I haven't seen it in any of the profiles I've looked at. Can you describe in more detail how you think it might have an impact? |
☔ The latest upstream changes (presumably #37100) made this pull request unmergeable. Please resolve the merge conflicts. |
7209d3f
to
6310135
Compare
This speeds up several rustc-benchmarks by 1--4%.
6310135
to
1e4241a
Compare
I have changed the code to just do the "skip mk_substs if the fold was a no-op" optimisation. That should avoid any conflicts with @Mark-Simulacrum's changes to
r? @eddyb |
Note that it will be at least a few days, more likely a week until I can resume work on the |
let params: Vec<_> = self.iter().map(|k| k.fold_with(folder)).collect(); | ||
|
||
// If folding doesn't change the substs, it's faster to avoid calling | ||
// `mk_substs` and instead reuse the existing substs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe open an issue about doing this in general, or at least further investigation? That is, there's a bunch of places where we can short-circuit the interner, maybe some are worth doing.
@bors r+ |
📌 Commit 1e4241a has been approved by |
⌛ Testing commit 1e4241a with merge eeb6379... |
@bors: retry force clean
|
Just kidding I'm doing this only to unstuck @bors/homu/buildbot.
@bors r+ |
📌 Commit ab5dcff has been approved by |
… r=eddyb Optimize `Substs::super_fold_with`. This speeds up some of the rustc-benchmarks by up to ~4%. ``` futures-rs-test 4.467s vs 4.387s --> 1.018x faster (variance: 1.001x, 1.006x) helloworld 0.242s vs 0.246s --> 0.980x faster (variance: 1.007x, 1.013x) html5ever-2016- 7.664s vs 7.630s --> 1.004x faster (variance: 1.008x, 1.006x) hyper.0.5.0 5.218s vs 5.133s --> 1.016x faster (variance: 1.013x, 1.008x) inflate-0.1.0 5.040s vs 5.103s --> 0.988x faster (variance: 1.005x, 1.008x) issue-32062-equ 0.361s vs 0.345s --> 1.047x faster (variance: 1.008x, 1.019x) issue-32278-big 1.874s vs 1.850s --> 1.013x faster (variance: 1.020x, 1.018x) jld-day15-parse 1.569s vs 1.508s --> 1.040x faster (variance: 1.009x, 1.003x) piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x) regex.0.1.30 2.568s vs 2.555s --> 1.005x faster (variance: 1.018x, 1.044x) rust-encoding-0 2.139s vs 2.135s --> 1.001x faster (variance: 1.012x, 1.005x) syntex-0.42.2 33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x) syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x) ``` r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.
Avoid more unnecessary mk_ty calls in Ty::super_fold_with. This speeds up several rustc-benchmarks by 1--5%. This PR is the lovechild of #37108 and #37705. ``` futures-rs-test 4.059s vs 4.011s --> 1.012x faster (variance: 1.016x, 1.026x) helloworld 0.236s vs 0.239s --> 0.986x faster (variance: 1.051x, 1.014x) html5ever-2016- 3.831s vs 3.824s --> 1.002x faster (variance: 1.020x, 1.019x) hyper.0.5.0 4.928s vs 4.936s --> 0.998x faster (variance: 1.003x, 1.012x) inflate-0.1.0 4.135s vs 4.104s --> 1.007x faster (variance: 1.026x, 1.028x) issue-32062-equ 0.309s vs 0.303s --> 1.017x faster (variance: 1.019x, 1.084x) issue-32278-big 1.818s vs 1.797s --> 1.011x faster (variance: 1.011x, 1.008x) jld-day15-parse 1.304s vs 1.271s --> 1.026x faster (variance: 1.018x, 1.012x) piston-image-0. 10.938s vs 10.921s --> 1.002x faster (variance: 1.025x, 1.016x) reddit-stress 2.327s vs 2.208s --> 1.054x faster (variance: 1.016x, 1.006x) regex-0.1.80 8.796s vs 8.727s --> 1.008x faster (variance: 1.012x, 1.019x) regex.0.1.30 2.294s vs 2.249s --> 1.020x faster (variance: 1.013x, 1.026x) rust-encoding-0 1.914s vs 1.886s --> 1.015x faster (variance: 1.027x, 1.026x) ```
This speeds up some of the rustc-benchmarks by up to ~4%.
r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.