Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rollup of 7 pull requests #88881

Merged
merged 39 commits into from
Sep 12, 2021
Merged

Rollup of 7 pull requests #88881

merged 39 commits into from
Sep 12, 2021

Conversation

Manishearth
Copy link
Member

Successful merges:

Failed merges:

r? @ghost
@rustbot modify labels: rollup

Create a similar rollup

Mark-Simulacrum and others added 30 commits September 6, 2021 13:17
This expands the API to be more flexible, allowing for more visitation patterns
on graphs. This will be useful to avoid extra datasets (and allocations) in
cases where the expanded DFS API is sufficient.

This also fixes a bug with the previous DFS constructor, which left the start
node not marked as visited (even though it was immediately returned).
They can be obtained by accessing the `TyCtxt` where they are needed.
Local variables can never be exported.
The order of the `where` bounds on auto trait impls changed because
rustdoc currently sorts auto trait `where` bounds based on the `Debug`
output for the bound. Now that the bounds have an actual `Res`, they are
being unintentionally sorted by their `DefId` rather than their path.
So, I had to update a test for the change in ordering of the rendered
bounds.
If the path is for a trait, it is always true that `trait_did ==
Some(did)`, so instead, `external_path()` now takes an `is_trait`
boolean.
…estebank

 Detect stricter constraints on gats where clauses in impls vs trait

I might try to see if I can do a bit more to improve these diagnostics, but any initial feedback is appreciated. I can also do any additional work in a followup PR.

r? `@estebank`
rustc: Remove local variable IDs from `Export`s

Local variables can never be exported.
…, r=pietroalbini

Remove extra unshallow from cherry-pick checker

This is already done by https://github.com/rust-lang/rust/blob/13db8440bbbe42870bc828d4ec3e965b38670277/src/ci/init_repo.sh#L32-L36 on the beta channel, and git throws an error if you attempt to unshallow an already non-shallow repository.

r? ```@pietroalbini```
generic_const_exprs: use thir for abstract consts instead of mir

Changes `AbstractConst` building to use `thir` instead of `mir` so that there's less chance of consts unifying when they shouldn't because lowering to mir dropped information (see `abstract-consts-as-cast-5.rs` test)

r? `@lcnr`
…h726

Rework DepthFirstSearch API

This expands the API to be more flexible, allowing for more visitation patterns
on graphs. This will be useful to avoid extra datasets (and allocations) in
cases where the expanded DFS API is sufficient.

This also fixes a bug with the previous DFS constructor, which left the start
node not marked as visited (even though it was immediately returned).

Commit written by ```@nikomatsakis``` originally, cherry picked from several commits in work on never type stabilization, but stands alone.
rustdoc: Cleanup `clean` part 1

Split out from rust-lang#88379.

These commits are completely independent of each other, and each is a fairly
small change (the last few are new commits; they are not from rust-lang#88379):

- Remove unnecessary `Cache.*_did` fields
- rustdoc: Get symbol for `TyParam` directly
- Create a valid `Res` in `external_path()`
- Remove unused `hir_id` parameter from `resolve_type`
- Fix redundant arguments in `external_path()`
- Remove unnecessary `is_trait` argument
- rustdoc: Cleanup a pattern match in `external_generic_args()`

r? ``@jyn514``
@rustbot rustbot added the rollup A PR which is a rollup label Sep 12, 2021
@Manishearth
Copy link
Member Author

@bors r+ p=1

@bors
Copy link
Contributor

bors commented Sep 12, 2021

📌 Commit 146aee6 has been approved by Manishearth

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Sep 12, 2021
@bors
Copy link
Contributor

bors commented Sep 12, 2021

⌛ Testing commit 146aee6 with merge c7dbe7a...

@bors
Copy link
Contributor

bors commented Sep 12, 2021

☀️ Test successful - checks-actions
Approved by: Manishearth
Pushing c7dbe7a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 12, 2021
@bors bors merged commit c7dbe7a into rust-lang:master Sep 12, 2021
@rustbot rustbot added this to the 1.57.0 milestone Sep 12, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c7dbe7a): comparison url.

Summary: This change led to large relevant regressions 😿 in compiler performance.

  • Large regression in instruction counts (up to 2.1% on full builds of inflate)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

@Manishearth
Copy link
Member Author

It feels like #88709 is the only likely culprit but it's also feature gated. Unsure what's going on here

@Manishearth
Copy link
Member Author

could also be #88677 or #88711 but everyone seems to find all of these candidates unlikely

@Mark-Simulacrum
Copy link
Member

Focusing in on the top regression, inflate check full, cachegrind shows the regression centering on process_obligations:

$ ./target/release/collector diff_local cachegrind +9ef27bf7dc50a8b51435579b4f2e86f7ee3f7a94 +c7dbe7a830100c70d59994fd940bf75bb6e39b39 --include inflate --builds Check --runs Full
$ less results/cgann-9ef27bf7dc50a8b51435579b4f2e86f7ee3f7a94-c7dbe7a830100c70d59994fd940bf75bb6e39b39-inflate-Check-Full
72,907,279  ???:rustc_data_structures::obligation_forest::ObligationForest<O>::process_obligations
23,931,248  ???:ena::unify::UnificationTable<S>::uninlined_get_root_key

From there, we want to check if the function is called more often or each call became more expensive. For this, callgrind is useful. Rerun the diff_local with callgrind:

$ ./target/release/collector diff_local callgrind +9ef27bf7dc50a8b51435579b4f2e86f7ee3f7a94 +c7dbe7a830100c70d59994fd940bf75bb6e39b39 --include inflate --builds Check --runs Full

Then, the default view doesn't show call counts, but you can pass --tree=caller to callgrind_annotate to get this:

$ callgrind_annotate --tree=caller results/clgout-9ef27bf7dc50a8b51435579b4f2e86f7ee3f7a94-inflate-Check-Full
2,960,080,594 (64.18%)  < ???:_ZN21rustc_trait_selection6traits7fulfill18FulfillmentContext6select17h80d1450895e486aeE.llvm.13260143977094445196 (34,057x)
2,220,443,052 (48.14%)  *  ???:rustc_data_structures::obligation_forest::ObligationForest<O>::process_obligations

  404,736,181 ( 8.78%)  < ???:rustc_data_structures::obligation_forest::ObligationForest<O>::process_obligations (8,826x)
  201,820,150 ( 4.38%)  *  ???:rustc_data_structures::obligation_forest::ObligationForest<O>::compress

$ callgrind_annotate --tree=caller results/clgout-c7dbe7a830100c70d59994fd940bf75bb6e39b39-inflate-Check-Full | less
3,056,791,553 (64.92%)  < ???:_ZN21rustc_trait_selection6traits7fulfill18FulfillmentContext6select17h80d1450895e486aeE.llvm.9518623118464054305 (34,057x)
2,293,350,331 (48.71%)  *  ???:rustc_data_structures::obligation_forest::ObligationForest<O>::process_obligations

  404,659,331 ( 8.59%)  < ???:rustc_data_structures::obligation_forest::ObligationForest<O>::process_obligations (8,826x)
  201,740,042 ( 4.28%)  *  ???:rustc_data_structures::obligation_forest::ObligationForest<O>::compress

We look at the numbers in (...) after the callers. In this case, they're the same, so each function got a little more expensive -- with thousands of calls, even a slight difference in instruction counts can blow up to a big change. No direct changes were made in the source code though.

There is an average of 2,840 more instructions per call (inclusive of functions called by process obligations), but it's not really clear what the cause is. A cursory inspection suggests that the new code has a larger stack allocation (up by 32 bytes), so presumably there's more load/spills... but the reason for the larger stack is unclear. It seems plausible that this is due to shifts in optimization choices made through PGO that in this case were "bad". Inflate is one of the benchmarks run in CI for PGO, but it's still possible for that to produce negative results. Reading the assembly to figure out what is causing the loads and spills without debuginfo doesn't seem worthwhile to invest into right now (and reproducing the build with debuginfo is probably hard).

My guess is that shifts elsewhere in the program slightly shifted the profile for this function or something along those lines and that caused this shift. It seems quite possible that no particular PR is actually fully responsible for this change.

@Mark-Simulacrum
Copy link
Member

FWIW while looking into this I also filed #89495 which may be of some help, but it's not a mitigation for this regression, just a sideline optimization.

@Mark-Simulacrum Mark-Simulacrum added the perf-regression-triaged The performance regression has been triaged. label Oct 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. rollup A PR which is a rollup S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.