Remove query function arrays #153114

Merged
rust-bors[bot] merged 5 commits into rust-lang:main from nnethercote:rm-query-arrays
Mar 1, 2026
Conversation

@nnethercote
Contributor

@nnethercote nnethercote commented Feb 26, 2026

define_queries! produces four arrays of function pointers, which other functions iterate over. These aren't actually necessary.

r? @petrochenkov

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 26, 2026
@nnethercote
Contributor Author

@bors try @rust-timer queue

rust-bors bot pushed a commit that referenced this pull request Feb 26, 2026
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 26, 2026
@rust-bors
Contributor

rust-bors bot commented Feb 26, 2026

☀️ Try build successful (CI)
Build commit: f8df332 (f8df332a248f146a91b1023c5961dff7147fc3f3, parent: 1ed488274bec5bf5cfe6bf7a1cc089abcc4ebd68)

@rust-timer
Collaborator

Finished benchmarking commit (f8df332): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -4.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.1% | [-4.1%, -4.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -4.1% | [-4.1%, -4.1%] | 1 |

Cycles

Results (primary 3.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.7% | [3.0%, 4.4%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.7% | [3.0%, 4.4%] | 2 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 492.161s -> 478.631s (-2.75%)
Artifact size: 395.78 MiB -> 397.45 MiB (0.42%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 26, 2026
@nnethercote
Contributor Author

Here's an explanation. Currently, the pattern is...

  • We have a hand-written function, v_inner, that does something query-related.
  • We generate one function per query (each one in its own module) that simply calls v_inner with some query-specific data (e.g. vtable).
  • We generate an array, V, with elements that point to all of these generated functions.
  • We have a hand-written top-level function, v_all, that iterates over the array V and calls its elements, one by one.

A simplified code representation:

// Hand-written
fn v_inner(s: &str) { ... }

// Generated by the macro
mod q1 { pub fn v() { super::v_inner("q1"); } }
mod q2 { pub fn v() { super::v_inner("q2"); } }
mod q3 { pub fn v() { super::v_inner("q3"); } }

// Generated by the macro: an array of pointers to the generated functions
const V: &[fn()] = &[q1::v, q2::v, q3::v];

// Hand-written
fn v_all() {
    for v in V.iter() {
        v();
    }
}

After this PR, the pattern is...

  • We have a hand-written function, v, that does something query-related.
  • We generate a top-level function, v_all, that calls v for each query.

In code form:

// Hand-written
fn v(s: &str) { ... }
                                                       
// Generated by the macro
fn v_all() {
    v("q1");
    v("q2");
    v("q3");
}

Much nicer.
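
The "after" pattern can be sketched as a small runnable program. This is only an illustration under stated assumptions: `define_v_all!` is a hypothetical stand-in for the real `define_queries!` macro, and the `log` parameter exists only so the calls can be observed. Neither is taken from the compiler source.

```rust
// Hand-written helper. The body is a stand-in that just records each call.
fn v(name: &str, log: &mut Vec<String>) {
    log.push(format!("visited {name}"));
}

// Hypothetical macro standing in for `define_queries!`: it expands to a single
// `v_all` function that contains one direct call to `v` per query name, so no
// per-query wrapper functions and no array of function pointers are needed.
macro_rules! define_v_all {
    ($($name:ident),* $(,)?) => {
        fn v_all(log: &mut Vec<String>) {
            $( v(stringify!($name), log); )*
        }
    };
}

define_v_all!(q1, q2, q3);

fn main() {
    let mut log = Vec::new();
    v_all(&mut log);
    // One entry per query, in declaration order.
    assert_eq!(log, ["visited q1", "visited q2", "visited q3"]);
    println!("{log:?}");
}
```

The repetition `$( ... )*` does the work that the array iteration did before: the list of queries is unrolled at macro-expansion time instead of being walked at run time.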

@nnethercote
Contributor Author

Perf effects are neutral for icounts, as I'd expect.

Bootstrap numbers are interesting.

  • A 13.5s (-2.75%) time reduction(!)
  • A 1.67MB artifact size increase, but libLLVM.so (which shouldn't be affected) has a 2.01MB increase while librustc_driver.so (which would be affected) has a 360KB decrease.

This PR does eliminate 4 x 320 = 1,280 small functions in the compiler, and also eliminates 4 x 320-element arrays containing pointers to those functions. So I can imagine it could reduce bootstrap times. @Kobzol, do you know how reliable the bootstrap measurements are? I feel like I've seen large variances in the libLLVM.so size lately.

@nnethercote nnethercote marked this pull request as ready for review February 26, 2026 06:20
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 26, 2026
@Kobzol
Member

Kobzol commented Feb 26, 2026

Bootstrap numbers have been quite noisy in the past week for some reason, yeah :( So hard to say.

@panstromek
Contributor

panstromek commented Feb 26, 2026

Most of the ~13s reduction is in rustc_query_impl, which spiked from 30s to 40s in #153066 (base of these perf results), so that looks like noise for the most part, except maybe those 3.5 additional seconds?

@nnethercote
Contributor Author

rustc_query_impl is the crate affected by the change, so maybe at least some of the reduction is real.

@nnethercote
Contributor Author

nnethercote commented Feb 26, 2026

For my local builds librustc_driver.so drops from 699,074,376 bytes to 698,699,856 bytes, a 374,520 byte reduction. This is pretty close to the librustc_driver.so reduction of 359.56 KiB seen on CI (although the before and after sizes are much larger in the local build). I don't want to conclude too much from this measurement, but it is supporting evidence that the compiler's code size has shrunk by some non-trivial amount.
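
The arithmetic behind the local measurement can be checked with a few lines of Rust. The byte counts are the ones quoted above. The helper names `saved_bytes` and `to_kib` are made up for this sketch.

```rust
// Difference between two file sizes, in bytes.
fn saved_bytes(before: u64, after: u64) -> u64 {
    before - after
}

// Convert a byte count to KiB (1 KiB = 1024 bytes).
fn to_kib(bytes: u64) -> f64 {
    bytes as f64 / 1024.0
}

fn main() {
    // Sizes of librustc_driver.so reported for the local build.
    let before: u64 = 699_074_376;
    let after: u64 = 698_699_856;

    let saved = saved_bytes(before, after);
    assert_eq!(saved, 374_520);

    // Roughly 365.74 KiB, which is close to the 359.56 KiB seen on CI.
    println!("{saved} bytes = {:.2} KiB", to_kib(saved));
}
```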

) {
let _prof_timer = tcx.sess.prof.generic_activity("self_profile_alloc_query_strings");

let mut string_cache = QueryKeyStringCache::new();
Contributor

It seems like the main drawback is that some pieces of code that previously lived outside of macros now live in a macro. Those pieces are mostly tiny and trivial though.

Contributor Author

I thought about absolutely minimizing this by moving all the code outside the $( ...$name... )* repetition into a separate function outside the macro. But in each case that extra code is so small (at most 6 lines) that I figured it wasn't worth it. (Except for gather_active_jobs, which has the 019e247 precursor.)
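
For comparison, the alternative that was considered and not adopted (keeping everything except the repetition outside the macro) might look roughly like the sketch below. All names here (`setup`, `per_query`, `run_all`, `define_all!`) are hypothetical and exist only to illustrate the trade-off. They do not come from the PR.

```rust
// Hypothetical stand-in for the few lines of setup that precede the
// per-query calls (for example, creating a cache).
fn setup() -> Vec<String> {
    Vec::new()
}

// Hypothetical per-query work.
fn per_query(name: &str, out: &mut Vec<String>) {
    out.push(name.to_string());
}

// Hand-written wrapper: all non-repeated code lives here, outside the macro.
// The macro-generated function passes in a closure holding the repetition.
fn run_all(each_query: impl FnOnce(&mut Vec<String>)) -> Vec<String> {
    let mut out = setup();
    each_query(&mut out);
    out
}

macro_rules! define_all {
    ($($name:ident),*) => {
        // Only the repetition itself is generated.
        fn all() -> Vec<String> {
            run_all(|out| {
                $( per_query(stringify!($name), out); )*
            })
        }
    };
}

define_all!(q1, q2, q3);

fn main() {
    assert_eq!(all(), ["q1", "q2", "q3"]);
}
```

The extra closure and wrapper add indirection to save only a handful of lines inside the macro, which is consistent with the judgment above that the split was not worth it.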

let mut string_cache = QueryKeyStringCache::new();

$(
$crate::profiling_support::alloc_self_profile_query_strings_for_query_cache(
Contributor

Another potential drawback is that all these hundreds of function calls can now potentially be inlined and bloat rustc_query_impl, but if the benchmarks don't show anything, then it's not an issue in practice.

Contributor Author

It would be very strange for these very large functions to be inlined. We could add inline(never) but I don't think it's necessary. The benchmarks show, if anything, the compiler's generated code getting smaller.
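
If inlining ever did become a concern, the attribute mentioned above would be applied as in this minimal sketch. `big_query_fn` and `call_all` are hypothetical names, and the tiny body stands in for a much larger function. Note that `#[inline(never)]` is a hint to the compiler rather than a hard guarantee.

```rust
// Hypothetical stand-in for one of the large hand-written functions.
// `#[inline(never)]` asks the compiler not to inline it at its call sites.
#[inline(never)]
fn big_query_fn(name: &str) -> usize {
    name.len()
}

// Stand-in for the macro-generated function with one call per query.
fn call_all() -> usize {
    big_query_fn("q1") + big_query_fn("q2") + big_query_fn("q3")
}

fn main() {
    assert_eq!(call_all(), 6);
}
```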

@petrochenkov
Contributor

r=me with nits addressed.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 26, 2026
@rustbot
Collaborator

rustbot commented Feb 26, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

And also `query_key_hash_verify` for each query. This is done by
generating `query_key_hash_verify_all` and having it do things more
directly.

Currently `gather_active_jobs` and `gather_active_jobs_inner` do some of
the work each. This commit changes things so that `gather_active_jobs`
is just a thin wrapper around `gather_active_jobs_inner`. This paves the
way for removing `gather_active_jobs` in the next commit.

And also `gather_active_jobs` for each query. This is done by generating
`collect_active_jobs_from_all_queries` and having it do things more
directly.

And also `alloc_self_profile_query_strings` for each query. This is done
by generating the top-level `alloc_self_profile_query_strings` and
having it do things more directly.

And also `encode_query_results` for each cacheable query. This is done
by generating `encode_all_query_results` and having it do things more
directly.

@rustbot
Collaborator

rustbot commented Feb 26, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@nnethercote
Contributor Author

@bors r=petrochenkov

I will leave this as rollup=never because of the effects on bootstrap measurements.

@rust-bors
Contributor

rust-bors bot commented Feb 26, 2026

📌 Commit 90abede has been approved by petrochenkov

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 26, 2026
@rust-bors rust-bors bot added merged-by-bors This PR was explicitly merged by bors. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 1, 2026
@rust-bors
Contributor

rust-bors bot commented Mar 1, 2026

☀️ Test successful - CI
Approved by: petrochenkov
Duration: 3h 8m 26s
Pushing 765fd2d to main...

@rust-bors rust-bors bot merged commit 765fd2d into rust-lang:main Mar 1, 2026
12 checks passed
@rustbot rustbot added this to the 1.96.0 milestone Mar 1, 2026
@github-actions
Contributor

github-actions bot commented Mar 1, 2026

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing c2c6f74 (parent) -> 765fd2d (this PR)

Test differences

4 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 765fd2d8c77a570e7069d9f30bb6d3d8fe437f9e --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-llvm-mingw: 1h 30m -> 1h 48m (+20.8%)
  2. pr-check-1: 33m 22s -> 27m 32s (-17.4%)
  3. dist-aarch64-apple: 1h 52m -> 2h 12m (+17.4%)
  4. x86_64-gnu-debug: 2h 3m -> 1h 44m (-15.6%)
  5. aarch64-apple: 3h 24m -> 2h 52m (-15.3%)
  6. i686-gnu-2: 1h 44m -> 1h 29m (-14.4%)
  7. dist-aarch64-msvc: 1h 41m -> 1h 56m (+14.0%)
  8. i686-gnu-1: 2h 18m -> 2h (-13.0%)
  9. x86_64-gnu: 2h 19m -> 2h 2m (-12.2%)
  10. x86_64-rust-for-linux: 51m 30s -> 45m 34s (-11.5%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (765fd2d): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 3.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.5% | [3.5%, 3.5%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.5% | [3.5%, 3.5%] | 1 |

Cycles

Results (secondary 0.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 7.0% | [6.8%, 7.2%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -13.2% | [-13.2%, -13.2%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 479.752s -> 477.793s (-0.41%)
Artifact size: 397.58 MiB -> 397.19 MiB (-0.10%)

@nnethercote nnethercote deleted the rm-query-arrays branch March 1, 2026 09:20
@nnethercote
Contributor Author

Post-merge perf result shows

  • a 389KiB size reduction for librustc_driver.so
  • a 2 second bootstrap reduction, of which 1.5 seconds are in rustc_query_impl

I think both of these measurements are somewhere close to the truth.

Labels

A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) merged-by-bors This PR was explicitly merged by bors. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

6 participants