Remove query function arrays#153114
Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Remove query function arrays
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (f8df332): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -4.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 3.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 492.161s -> 478.631s (-2.75%) |
|
Here's an explanation. Currently, the pattern is...
A simplified code representation: After this PR, the pattern is...
In code form: Much nicer. |
|
Perf effects are neutral for icounts, as I'd expect. Bootstrap numbers are interesting.
This PR does eliminate 4 x 320 = 1,280 small functions in the compiler, and also eliminates 4 x 320-element arrays containing pointers to those functions. So I can imagine it could reduce bootstrap times. @Kobzol, do you know how reliable the bootstrap measurements are? I feel like I've seen large variances in the libLLVM.so size lately. |
|
Bootstrap numbers have been quite noisy in the past week for some reason, yeah :( So hard to say. |
|
Most of the ~13s reduction is in |
|
|
|
For my local builds |
c9101b5 to
0e1d4e7
Compare
| ) { | ||
| let _prof_timer = tcx.sess.prof.generic_activity("self_profile_alloc_query_strings"); | ||
|
|
||
| let mut string_cache = QueryKeyStringCache::new(); |
There was a problem hiding this comment.
It seems like the main drawback is that some pieces of code that previously lived outside of macros now live in a macro. Those pieces are mostly tiny and trivial though.
There was a problem hiding this comment.
I thought about absolutely minimizing this by moving all the code outside the $( ...$name... )* repetition into a separate function outside the macro. But in each case that extra code is so small (at most 6 lines) that I figured it wasn't worth it. (Except for gather_active_jobs, which has the 019e247 precursor.)
| let mut string_cache = QueryKeyStringCache::new(); | ||
|
|
||
| $( | ||
| $crate::profiling_support::alloc_self_profile_query_strings_for_query_cache( |
There was a problem hiding this comment.
Another potential drawback is that all these hundreds of function calls can now potentially be inlined and bloat rustc_query_impl, but if the benchmarks don't show anything, then it's not an issue in practice.
There was a problem hiding this comment.
It would be very strange for these very large functions to be inlined. We could add inline(never) but I don't think it's necessary. The benchmarks show, if anything, the compiler's generated code getting smaller.
|
r=me with nits addressed. |
|
Reminder, once the PR becomes ready for a review, use |
And also `query_key_hash_verify` for each query. This is done by generating `query_key_hash_verify_all` and having it do things more directly.
Currently `gather_active_jobs` and `gather_active_jobs_inner` do some of the work each. This commit changes things so that `gather_active_jobs` is just a thin wrapper around `gather_active_jobs_inner`. This paves the way for removing `gather_active_jobs` in the next commit.
And also `gather_active_jobs` for each query. This is done by generating `collect_active_jobs_from_all_queries` and having it do things more directly.
And also `alloc_self_profile_query_strings` for each query. This is done by generating the top-level `alloc_self_profile_query_strings` and having it do things more directly.
And also `encode_query_results` for each cacheable query. This is done by generating `encode_all_query_results` and having it do things more directly.
0e1d4e7 to
90abede
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
@bors r=petrochenkov I will leave this as rollup=never because of the effects on bootstrap measurements. |
This comment has been minimized.
This comment has been minimized.
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing c2c6f74 (parent) -> 765fd2d (this PR) Test differencesShow 4 test diffs4 doctest diffs were found. These are ignored, as they are noisy. Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard 765fd2d8c77a570e7069d9f30bb6d3d8fe437f9e --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (765fd2d): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 3.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 0.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 479.752s -> 477.793s (-0.41%) |
|
Post-merge perf result shows
I think both of these measurements are somewhere close to the truth. |
View all comments
define_queries!produces four arrays of function pointers, which other functions iterate over. These aren't actually necessary.r? @petrochenkov