[WIP] Test performance of running MIR inliner on inline(always) function calls when mir-opt-level=1 #110560

vlad20012 · 2023-04-19T20:55:09Z

It seems #105278 is stalled, so I'd like to perform several performance tests with different MIR inliner setups.

Let's start with just reading a callee MIR body without actually inlining it (in both incremental and non-incremental configurations).

rustbot · 2023-04-19T20:55:17Z

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-04-19T20:55:20Z

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

WaffleLapkin · 2023-04-19T21:37:40Z

@bors try @rust-timer queue

bors · 2023-04-19T21:37:50Z

⌛ Trying commit 796cafe with merge b47b7746514197f42ef70d9744fcbbea0256a508...

bors · 2023-04-19T23:19:28Z

☀️ Try build successful - checks-actions
Build commit: b47b7746514197f42ef70d9744fcbbea0256a508 (b47b7746514197f42ef70d9744fcbbea0256a508)

rust-timer · 2023-04-20T00:34:37Z

Finished benchmarking commit (b47b7746514197f42ef70d9744fcbbea0256a508): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.9%	[0.2%, 45.1%]	58
Regressions ❌ (secondary)	2.0%	[0.4%, 5.6%]	9
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	3.9%	[0.2%, 45.1%]	58

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	5.4%	[0.7%, 12.8%]	12
Regressions ❌ (secondary)	4.4%	[2.5%, 5.8%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.9%	[-3.9%, -3.9%]	1
All ❌✅ (primary)	5.4%	[0.7%, 12.8%]	12

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	8.1%	[1.0%, 57.8%]	27
Regressions ❌ (secondary)	5.2%	[3.1%, 6.8%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	8.1%	[1.0%, 57.8%]	27

vlad20012 · 2023-04-20T11:24:54Z

⬆️ In that experiment, I enabled the Inlining pass for debug builds but aborted every inlining attempt (in debug builds) right after the check_mir_body call, so this actually measures the infrastructure cost of inlining (i.e. computing the call graph, reading MIR bodies, etc.) without doing any inlining. Note that this experiment doesn't distinguish between #[inline(always)] functions, #[inline] functions, and even functions without an #[inline] attribute.
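To make concrete what "infrastructure cost without inlining" means, here is a minimal toy model. It is not rustc code: `Body`, `Inliner`, `check_body` and `bodies_read` are hypothetical names. Each call site fetches its callee's body once (memoized, as a query result would be), runs a cheap validity check standing in for check_mir_body, and then gives up, so only the lookup and checking work is paid.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a callee's MIR body.
#[derive(Clone, Debug)]
struct Body {
    statements: usize,
}

/// Toy inliner that only counts the "infrastructure" work it does.
struct Inliner<'a> {
    /// Pretend store of available bodies (plays the role of a MIR query).
    store: &'a HashMap<&'a str, Body>,
    /// Memoized lookups, like a query cache.
    cache: HashMap<&'a str, Body>,
    /// How many distinct bodies were actually fetched.
    bodies_read: usize,
}

impl<'a> Inliner<'a> {
    fn new(store: &'a HashMap<&'a str, Body>) -> Self {
        Inliner { store, cache: HashMap::new(), bodies_read: 0 }
    }

    /// Hypothetical analogue of check_mir_body: a cheap validity check.
    fn check_body(body: &Body) -> bool {
        body.statements > 0
    }

    fn try_inlining(&mut self, callee: &'a str) -> Result<(), &'static str> {
        // Fetch the callee body, paying the lookup cost once per callee.
        if !self.cache.contains_key(callee) {
            let body = self.store.get(callee).ok_or("no MIR available")?;
            self.bodies_read += 1;
            self.cache.insert(callee, body.clone());
        }
        let body = &self.cache[callee];
        if !Self::check_body(body) {
            return Err("body rejected");
        }
        // The experiment: abort right after the check, inlining nothing.
        Err("aborted after check")
    }
}

fn main() {
    let mut store = HashMap::new();
    store.insert("helper", Body { statements: 3 });
    let mut inliner = Inliner::new(&store);
    // Three call sites: "helper" twice, plus a callee with no body.
    for callee in ["helper", "helper", "extern_fn"] {
        let _ = inliner.try_inlining(callee);
    }
    println!("bodies read: {}", inliner.bodies_read); // bodies read: 1
}
```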

debug incr-patched - up to 45% regression, 7.71% mean
debug full/incr-full - up to 5% regression, 2% mean
debug incr-unchanged - up to 3% regression, 1.5% mean

Not so bad for such a stressful experiment!

Let's now repeat the experiment, but reject inline candidates early if they lack an #[inline(always)] attribute.
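The point of rejecting early is that the attribute check is cheap and happens before any callee body is fetched. A minimal sketch, assuming a hypothetical `InlineAttr` enum for the attribute states discussed in this thread (this is not rustc's actual type or code):

```rust
/// Hypothetical model of a function's inlining attribute.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InlineAttr {
    None,   // no attribute
    Hint,   // #[inline]
    Always, // #[inline(always)]
    Never,  // #[inline(never)]
}

/// Early filter: in debug builds, only #[inline(always)] callees survive;
/// everything else is rejected before its body would be read.
/// Returns the rejection reason, or None if the candidate is kept.
fn early_reject(attr: InlineAttr, debug_build: bool) -> Option<&'static str> {
    match attr {
        InlineAttr::Never => Some("#[inline(never)]"),
        InlineAttr::Always => None,
        _ if debug_build => Some("not #[inline(always)] in a debug build"),
        _ => None,
    }
}

fn main() {
    let attrs = [InlineAttr::None, InlineAttr::Hint, InlineAttr::Always, InlineAttr::Never];
    let kept: Vec<InlineAttr> = attrs
        .iter()
        .copied()
        .filter(|&a| early_reject(a, true).is_none())
        .collect();
    println!("{:?}", kept); // [Always]
}
```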

oli-obk · 2023-04-20T11:26:57Z

@bors try @rust-timer queue

bors · 2023-04-20T11:27:07Z

⌛ Trying commit e65665f with merge f283ebe5c544c33161c94d8b60b2423af93b8148...

bors · 2023-04-20T13:08:21Z

☀️ Try build successful - checks-actions
Build commit: f283ebe5c544c33161c94d8b60b2423af93b8148 (f283ebe5c544c33161c94d8b60b2423af93b8148)

rust-timer · 2023-04-20T15:38:26Z

Finished benchmarking commit (f283ebe5c544c33161c94d8b60b2423af93b8148): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.1%	[0.5%, 4.4%]	15
Regressions ❌ (secondary)	1.5%	[0.3%, 2.6%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.5%	[-2.5%, -2.5%]	1
All ❌✅ (primary)	1.1%	[0.5%, 4.4%]	15

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.9%	[1.9%, 1.9%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.6%	[-2.3%, -0.4%]	5
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.0%	[-2.3%, 1.9%]	6

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.3%	[1.1%, 5.1%]	8
Regressions ❌ (secondary)	3.5%	[3.4%, 3.5%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.3%	[1.1%, 5.1%]	8

vlad20012 · 2023-04-20T17:25:21Z

⬆️ In that experiment, I enabled the Inlining pass for debug builds but considered only functions marked #[inline(always)] as inlining candidates, and aborted every inlining attempt (in debug builds) right after the check_mir_body call, so this again measures the infrastructure cost of inlining (i.e. computing the call graph, reading MIR bodies, etc.) without doing any inlining.
Note that it differs from the previous experiment in that it considers only #[inline(always)] functions (while the previous experiment considered all functions).

debug incr-patched - up to 4.4% regression, 0.5% mean
debug full/incr-full - up to 2% regression, 0.4% mean
debug incr-unchanged - up to 1% regression, 0.3% mean

Wow, much better! These numbers are very inspiring.

Let's now restrict inlining candidates to non-local functions only (i.e. functions from other crates). If I understand correctly, in this case we will not call the mir_callgraph_reachable query and will therefore skip the call graph computation. This should improve the numbers a bit more. Note that we still don't perform the inlining itself!
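The reasoning above can be illustrated with a toy model, assuming (as the comment does) that the call-graph reachability check is only consulted for local callees. Inlining a local callee that can call back into the caller would recurse, so the inliner must walk the crate's call graph first; for a callee from another crate this sketch skips the walk entirely. `CallGraph`, `reachable` and `may_inline` are hypothetical names, and `reachable` is only a rough analogue of mir_callgraph_reachable, not its implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical local call graph: function name -> names it calls.
type CallGraph = HashMap<&'static str, Vec<&'static str>>;

/// Can `target` be reached from `start` by following call edges? (iterative DFS)
fn reachable(graph: &CallGraph, start: &'static str, target: &'static str) -> bool {
    let mut stack = vec![start];
    let mut seen = HashSet::new();
    while let Some(f) = stack.pop() {
        if f == target {
            return true;
        }
        if seen.insert(f) {
            if let Some(callees) = graph.get(f) {
                stack.extend(callees.iter().copied());
            }
        }
    }
    false
}

/// Only local callees pay for the call-graph walk.
fn may_inline(graph: &CallGraph, caller: &'static str, callee: &'static str, callee_is_local: bool) -> bool {
    if callee_is_local {
        // A callee that can call back into the caller must not be inlined.
        !reachable(graph, callee, caller)
    } else {
        // Non-local callee: no walk over the local call graph at all.
        true
    }
}

fn main() {
    let mut graph: CallGraph = HashMap::new();
    graph.insert("a", vec!["b"]);
    graph.insert("b", vec!["a"]); // a and b are mutually recursive
    graph.insert("c", vec!["b"]);
    println!("{}", may_inline(&graph, "a", "b", true)); // false
    println!("{}", may_inline(&graph, "c", "b", true)); // true
    println!("{}", may_inline(&graph, "a", "dep_fn", false)); // true
}
```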

bors · 2023-04-20T17:27:34Z

⌛ Trying commit cd69787 with merge e5fd7c91e36b8c7f682f5c0fe631164c5d15e628...

bors · 2023-04-20T19:08:23Z

☀️ Try build successful - checks-actions
Build commit: e5fd7c91e36b8c7f682f5c0fe631164c5d15e628 (e5fd7c91e36b8c7f682f5c0fe631164c5d15e628)

rust-timer · 2023-04-20T20:46:07Z

Finished benchmarking commit (e5fd7c91e36b8c7f682f5c0fe631164c5d15e628): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.4%, 1.2%]	9
Regressions ❌ (secondary)	1.6%	[0.4%, 2.7%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.7%	[0.4%, 1.2%]	9

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.7%	[-3.4%, -2.0%]	2
All ❌✅ (primary)	-	-	0

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.3%	[1.0%, 1.6%]	3
Regressions ❌ (secondary)	2.9%	[2.9%, 2.9%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.3%	[1.0%, 1.6%]	3

vlad20012 · 2023-04-21T09:10:46Z

⬆️ In that experiment, I enabled the Inlining pass for debug builds but considered only non-local functions marked #[inline(always)] as inlining candidates, and aborted every inlining attempt (in debug builds) right after the check_mir_body call, so this again measures the infrastructure cost of inlining (i.e. reading MIR bodies, etc.) without doing any inlining.
Note that it differs from the previous experiment in that it considers only non-local functions (i.e. functions from other crates).

debug incr-patched - up to 0.5% regression, 0.1% mean
debug full/incr-full - up to 3% regression, 0.4% mean
debug incr-unchanged - up to 0.5% regression, 0.1% mean

It looks like there are very few regressions now! The most promising part is that there is now almost no regression in the incr-patched/incr-unchanged scenarios, so it may be possible to enable inlining even in the incremental configuration.

Let's do some real inlining now! In the next experiment, I'm enabling inlining for debug builds (mir-opt-level=1) with all the previous restrictions.

WaffleLapkin · 2023-04-21T13:25:20Z

@bors try @rust-timer queue

WaffleLapkin · 2023-04-21T13:27:47Z

@bors try

bors · 2023-04-21T13:27:57Z

⌛ Trying commit 40cf2cb with merge 5142ecd1025428756f8bad85db21c70248982db7...

bors · 2023-04-21T15:10:41Z

☀️ Try build successful - checks-actions
Build commit: 5142ecd1025428756f8bad85db21c70248982db7 (5142ecd1025428756f8bad85db21c70248982db7)

rust-timer · 2023-04-21T17:41:28Z

Finished benchmarking commit (5142ecd1025428756f8bad85db21c70248982db7): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.2%, 3.0%]	21
Regressions ❌ (secondary)	1.2%	[0.4%, 2.6%]	8
Improvements ✅ (primary)	-0.7%	[-1.0%, -0.3%]	5
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.6%	[-1.0%, 3.0%]	26

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.4%	[0.1%, 2.4%]	10
Regressions ❌ (secondary)	2.4%	[2.4%, 2.4%]	1
Improvements ✅ (primary)	-2.1%	[-2.1%, -2.1%]	1
Improvements ✅ (secondary)	-2.9%	[-2.9%, -2.9%]	1
All ❌✅ (primary)	1.0%	[-2.1%, 2.4%]	11

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.8%	[1.1%, 2.6%]	8
Regressions ❌ (secondary)	2.6%	[2.6%, 2.6%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-6.9%	[-7.1%, -6.7%]	3
All ❌✅ (primary)	1.8%	[1.1%, 2.6%]	8

vlad20012 · 2023-05-20T19:33:22Z

⬆️ In that experiment, I enabled the Inlining pass for debug builds (mir-opt-level=1), but considered only non-local functions (i.e. functions from other crates) marked #[inline(always)] as inlining candidates.
Note that it differs from the previous experiment in that it actually performs the inlining.

debug full/incr-full: 1.3% regression in serde and hyper. The most regressed query is optimized_mir. Also, there is a performance win: -1% in webrender! The most affected query is LLVM_module_codegen_emit_obj.

debug incr-unchanged: 3% regression in hyper. The most regressed queries are generate_crate_metadata, optimized_mir and encode_query_results_for.

debug incr-patched: 2.6% regression in hyper. The most regressed queries are generate_crate_metadata, LLVM_module_codegen_emit_obj, encode_query_results_for, mir_for_ctfe and optimized_mir.

These results show that the case is not hopeless: enabling inlining of #[inline(always)] functions is achievable, perhaps with more restrictions. For now, though, I want to try enabling inlining with similar restrictions in incremental optimized builds. I'll try that in a separate PR.

Try to read #[inline(always)] MIR bodies but don't actually inline

796cafe

rustbot assigned oli-obk Apr 19, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 19, 2023