[compiler-v2][optimization] Create and retain temps for each arg #15514

vineethk · 2024-12-05T20:24:19Z

Description

This PR builds on the optimization made in #15445, where we optimized `Assign` stackless-bytecode instructions. In this PR, when generating stackless bytecode, we create a temporary for each call argument (even if the argument is a simple variable) and, during dead-store elimination, retain those assignments if they represent cross-block def-uses. In this way, the file-format bytecode generator knows when to load the cross-block definitions (this fixes #15339, essentially by implementing the eager-loads optimization in a different way). The flush-writes optimization needed additional logic to handle this pipeline.
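The generator-side change and the dead-store-elimination rule can be sketched on a hypothetical, much-simplified stackless IR. The `Instr` type, `lower_call`, and `eliminate_self_assigns` below are illustrative names, not the actual code in `bytecode_generator.rs` or `dead_store_elimination.rs`. The `temps_for_all` flag stands in for the `OPTIMIZE_WAITING_FOR_COMPARE_TESTS` experiment: when it is off, simple-variable arguments are passed directly, as before. The elimination rule reflects our reading of the description: a self-assignment `x = x` is normally dead, but it is kept when `x` is defined in a different basic block, so that the file-format generator can still see where the cross-block value is loaded.

```rust
use std::collections::HashMap;

/// Hypothetical stackless-bytecode subset; locals and temps are plain indices.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    /// dst := src
    Assign { dst: usize, src: usize },
    /// callee(args...)
    Call { callee: String, args: Vec<usize> },
}

/// Lowers `callee(args...)`. Each argument is `(local, is_simple)`, where
/// `is_simple` means a plain variable rather than the temp that already holds
/// a nested expression's result. With `temps_for_all`, simple variables are
/// also copied into a fresh temp, numbered starting at `*next_temp`.
fn lower_call(
    callee: &str,
    args: &[(usize, bool)],
    temps_for_all: bool,
    next_temp: &mut usize,
) -> Vec<Instr> {
    let mut code = Vec::new();
    let mut operands = Vec::new();
    for &(local, is_simple) in args {
        if is_simple && temps_for_all {
            let t = *next_temp;
            *next_temp += 1;
            code.push(Instr::Assign { dst: t, src: local });
            operands.push(t);
        } else {
            operands.push(local);
        }
    }
    code.push(Instr::Call { callee: callee.to_string(), args: operands });
    code
}

/// Removes self-assignments `x = x` unless they mark a cross-block def-use.
/// `code` pairs each instruction with its basic-block id; `def_block` maps a
/// local to the block of its definition (missing entries, e.g. parameters,
/// count as block 0). All other instructions are kept unchanged.
fn eliminate_self_assigns(
    code: &[(usize, Instr)],
    def_block: &HashMap<usize, usize>,
) -> Vec<(usize, Instr)> {
    code.iter()
        .filter(|(block, instr)| match instr {
            Instr::Assign { dst, src } if dst == src => {
                def_block.get(src).copied().unwrap_or(0) != *block
            }
            _ => true,
        })
        .cloned()
        .collect()
}
```

For `bar(x, one())`, with `x` in local 0 and the result of `one()` already in temp 7, turning the flag on yields `t10 = x; bar(t10, t7)` (with `*next_temp == 10`) instead of `bar(x, t7)`.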

With the optimization in this PR, on the aptos-framework and all its dependencies (std libs):

compiler v2 produces 3.6% fewer instructions than compiler v1.
together with [compiler-v2] Optimize stackless-bytecode assign instructions #15445, v2 produces 4.7% fewer instructions than without these optimizations.

Note that all changes introduced here should be hidden behind the `OPTIMIZE_WAITING_FOR_COMPARE_TESTS` experiment (which is turned off by default), so landing this PR should not affect generated code.

Safety arguments:

assigning a new temp for each arg in a function call should be safe (we already do this for non-trivial arguments)
not removing certain dead code of the form x = x should be safe
the flush-writes optimization is purely a hinting mechanism that advises whether a definition loaded onto the stack should be flushed to a local right away: if it hints wrongly, we get extra flushes to locals, but execution semantics are not affected
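To make the third safety argument concrete, here is a toy model of the Move stack machine with only `CopyLoc`, `MoveLoc`, `StLoc`, `Add`, and `Mul`. It is our own simplification, not `move-binary-format` or the Move VM. A wrongly hinted flush amounts to an extra `StLoc`/`MoveLoc` pair through a spare local, and the sketch checks that inserting such a pair after any instruction leaves the result unchanged. It only models flushes whose value is reloaded immediately, and it ignores references and abort codes.

```rust
/// Toy stack-machine opcodes mirroring the file-format instructions in this PR
/// (not the real `move-binary-format` types).
#[derive(Clone, Copy, Debug)]
enum Op {
    CopyLoc(usize),
    MoveLoc(usize),
    StLoc(usize),
    Add,
    Mul,
}

/// Executes `code` on the given locals and returns the final operand stack.
/// Arithmetic overflow panics, loosely mirroring Move's abort on overflow.
fn run(code: &[Op], mut locals: Vec<Option<u64>>) -> Vec<u64> {
    let mut stack: Vec<u64> = Vec::new();
    for op in code {
        match *op {
            Op::CopyLoc(i) => stack.push(locals[i].expect("copy of unavailable local")),
            Op::MoveLoc(i) => stack.push(locals[i].take().expect("move of unavailable local")),
            Op::StLoc(i) => locals[i] = Some(stack.pop().expect("stack underflow")),
            Op::Add | Op::Mul => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                let r = if matches!(*op, Op::Add) { a.checked_add(b) } else { a.checked_mul(b) };
                stack.push(r.expect("arithmetic overflow"));
            }
        }
    }
    stack
}

/// Simulates a (possibly unnecessary) flush: right after instruction `after`,
/// store the top of the stack into local `tmp` and immediately load it back.
fn insert_flush(code: &[Op], after: usize, tmp: usize) -> Vec<Op> {
    let mut out = code.to_vec();
    out.insert(after + 1, Op::StLoc(tmp));
    out.insert(after + 2, Op::MoveLoc(tmp));
    out
}
```

For a program computing `(a + b) * a`, `run` returns the same single value with or without a flush inserted at any position. The only cost of the bad hint is two extra instructions.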

A couple of examples of this optimization in action:

This code:

    fun test1(x: u64) {
        bar(x, one(), one(), one(), one(), one());
    }

produces before this PR:

	0: Call one(): u64
	1: Call one(): u64
	2: Call one(): u64
	3: Call one(): u64
	4: Call one(): u64
	5: StLoc[1](loc0: u64)
	6: StLoc[2](loc1: u64)
	7: StLoc[3](loc2: u64)
	8: StLoc[4](loc3: u64)
	9: StLoc[5](loc4: u64)
	10: MoveLoc[0](Arg0: u64)
	11: MoveLoc[5](loc4: u64)
	12: MoveLoc[4](loc3: u64)
	13: MoveLoc[3](loc2: u64)
	14: MoveLoc[2](loc1: u64)
	15: MoveLoc[1](loc0: u64)
	16: Call bar(u64, u64, u64, u64, u64, u64)
	17: Ret

after this PR:

	0: MoveLoc[0](Arg0: u64)
	1: Call one(): u64
	2: Call one(): u64
	3: Call one(): u64
	4: Call one(): u64
	5: Call one(): u64
	6: Call bar(u64, u64, u64, u64, u64, u64)
	7: Ret
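One way to check that the two listings above pass the same arguments to `bar` is to replay them on a toy stack machine. This sketch is an illustration, not part of the PR. It models only `Call one()`, `StLoc`, and `MoveLoc`, and it hypothetically makes `one()` return 1, 2, 3, ... on successive calls so that argument order becomes observable. Both sequences leave `x` followed by the results of the first through fifth calls on the stack, which `Call bar` then consumes.

```rust
#[derive(Clone, Copy, Debug)]
enum Op {
    CallOne,
    StLoc(usize),
    MoveLoc(usize),
}

// Instructions 0..=15 of the "before" listing (Call bar and Ret omitted).
const BEFORE: [Op; 16] = [
    Op::CallOne, Op::CallOne, Op::CallOne, Op::CallOne, Op::CallOne,
    Op::StLoc(1), Op::StLoc(2), Op::StLoc(3), Op::StLoc(4), Op::StLoc(5),
    Op::MoveLoc(0), Op::MoveLoc(5), Op::MoveLoc(4), Op::MoveLoc(3), Op::MoveLoc(2), Op::MoveLoc(1),
];

// Instructions 0..=5 of the "after" listing (Call bar and Ret omitted).
const AFTER: [Op; 6] = [
    Op::MoveLoc(0), Op::CallOne, Op::CallOne, Op::CallOne, Op::CallOne, Op::CallOne,
];

/// Returns the stack just before `Call bar`; bottom of the stack = first argument.
fn args_passed_to_bar(code: &[Op], x: u64) -> Vec<u64> {
    let mut locals: Vec<Option<u64>> = vec![None; 6]; // Arg0 plus loc0..loc4
    locals[0] = Some(x);
    let mut calls: u64 = 0;
    let mut stack = Vec::new();
    for op in code {
        match *op {
            Op::CallOne => {
                // Hypothetical: successive calls return 1, 2, 3, ...
                calls += 1;
                stack.push(calls);
            }
            Op::StLoc(i) => locals[i] = Some(stack.pop().expect("stack underflow")),
            Op::MoveLoc(i) => stack.push(locals[i].take().expect("move of unavailable local")),
        }
    }
    stack
}
```

The old code parks all five results in locals and reloads them in reverse, while the new code simply pushes `x` first. That is why the listing shrinks from 18 to 8 instructions.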

This code:

    fun test2(x: u64, y: u64): u64 {
        x + (y * x)
    }

produces before this PR:

	0: MoveLoc[1](Arg1: u64)
	1: CopyLoc[0](Arg0: u64)
	2: Mul
	3: StLoc[1](Arg1: u64)
	4: MoveLoc[0](Arg0: u64)
	5: MoveLoc[1](Arg1: u64)
	6: Add
	7: Ret

after this PR:

	0: CopyLoc[0](Arg0: u64)
	1: MoveLoc[1](Arg1: u64)
	2: MoveLoc[0](Arg0: u64)
	3: Mul
	4: Add
	5: Ret
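As a sanity check on the second example, the sketch below replays both listings on a toy stack machine and checks that they compute `x + (y * x)` over a range of inputs. The machine is our own simplified model, not `move-binary-format` or the Move VM. The old code in effect performs a flush: it stores `y * x` into `Arg1` with `StLoc[1]` and reloads it with `MoveLoc[1]`. The new code keeps the product on the stack, which is how the listing drops from 8 to 6 instructions.

```rust
#[derive(Clone, Copy, Debug)]
enum Op {
    CopyLoc(usize),
    MoveLoc(usize),
    StLoc(usize),
    Add,
    Mul,
}

// Listing before this PR, without the final Ret.
const BEFORE: [Op; 7] = [
    Op::MoveLoc(1), Op::CopyLoc(0), Op::Mul, Op::StLoc(1),
    Op::MoveLoc(0), Op::MoveLoc(1), Op::Add,
];
// Listing after this PR, without the final Ret.
const AFTER: [Op; 5] = [Op::CopyLoc(0), Op::MoveLoc(1), Op::MoveLoc(0), Op::Mul, Op::Add];

/// Runs the body of `test2` with Arg0 = x and Arg1 = y and returns the single
/// value left on the stack at `Ret`. Overflow panics (Move would abort).
fn eval(code: &[Op], x: u64, y: u64) -> u64 {
    let mut locals = [Some(x), Some(y)];
    let mut stack: Vec<u64> = Vec::new();
    for op in code {
        match *op {
            Op::CopyLoc(i) => stack.push(locals[i].expect("copy of unavailable local")),
            Op::MoveLoc(i) => stack.push(locals[i].take().expect("move of unavailable local")),
            Op::StLoc(i) => locals[i] = Some(stack.pop().expect("stack underflow")),
            Op::Add | Op::Mul => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                let r = if matches!(*op, Op::Add) { a.checked_add(b) } else { a.checked_mul(b) };
                stack.push(r.expect("arithmetic overflow"));
            }
        }
    }
    assert_eq!(stack.len(), 1, "exactly one return value expected");
    stack[0]
}
```

Exhaustively comparing `eval(&BEFORE, x, y)` and `eval(&AFTER, x, y)` on small inputs is a cheap, if partial, check that the reordering preserves semantics.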

How Has This Been Tested?

New tests added (including the ones described above)
We show improvements on [compiler-v2] Test case reduced from move-stdlib showing opportunity for optimization #15338 and on many of the tests added in [compiler-v2] Test cases reduced from the framework showcasing need for stack optimizations #14800
Added file-format generation for many tests that did not have it, and checked that we create fewer or the same number of instructions in all those cases

I've gone through all the test output changes:

In all existing tests, we produce fewer or the same number of instructions, with the exception of minor regressions in two tests (move-compiler-v2/tests/flush-writes/{def_use_03.on.exp, out_of_order_use_03.on.exp}). Filing an issue to follow up on these regressions.
Some error messages have become more precise, some have become more abstract (due to the introduction of the additional temporaries).

Key Areas to Review

Safety argument, correctness
Code retains existing behavior when experiment is off

Type of Change

Performance improvement

Which Components or Systems Does This Change Impact?

Move Compiler

trunk-io · 2024-12-05T20:24:23Z

⏱️ 1h 39m total CI duration on this PR

Slowest 15 Jobs	Cumulative Duration	Recent Runs
forge-compat-test / forge	13m	🟩
rust-move-tests	13m	🟩
rust-move-tests	13m	🟩
rust-move-tests	13m	🟩
rust-move-tests	12m	🟩
rust-move-tests	12m	🟩
rust-cargo-deny	9m	🟩 🟩 🟩 🟩 🟩 (+1 more)
check-dynamic-deps	6m	🟩 🟩 🟩 🟩 🟩 (+1 more)
check	3m	🟩
general-lints	2m	🟩 🟩 🟩 🟩 🟩 (+1 more)
semgrep/ci	2m	🟩 🟩 🟩 🟩 🟩 (+1 more)
file_change_determinator	1m	🟩 🟩 🟩 🟩 🟩 (+1 more)
permission-check	18s	🟩 🟩 🟩 🟩 🟩 (+1 more)
permission-check	14s	🟩 🟩 🟩 🟩 🟩 (+1 more)
check-branch-prefix	1s	🟩

vineethk · 2024-12-05T20:24:39Z

This stack of pull requests is managed by Graphite.

third_party/move/move-compiler-v2/src/bytecode_generator.rs

third_party/move/move-compiler-v2/src/pipeline/dead_store_elimination.rs

third_party/move/move-compiler-v2/src/pipeline/flush_writes_processor.rs

github-actions · 2024-12-13T22:21:09Z

✅ Forge suite `realistic_env_max_load` success on `311694401230605775aa0dc163e53621f2670cc9`

two traffics test: inner traffic : committed: 14616.33 txn/s, latency: 2719.18 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3300 ms), latency samples: 5557440
two traffics test : committed: 100.10 txn/s, latency: 1371.94 ms, (p50: 1300 ms, p70: 1400, p90: 1500 ms, p99: 2500 ms), latency samples: 1740
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.605, avg: 1.561", "ConsensusProposalToOrdered: max: 0.324, avg: 0.296", "ConsensusOrderedToCommit: max: 0.318, avg: 0.309", "ConsensusProposalToCommit: max: 0.612, avg: 0.605"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.62s no progress at version 35432 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.56s no progress at version 2229840 (avg 0.56s) [limit 16].
Test Ok

github-actions · 2024-12-13T22:23:05Z

✅ Forge suite `framework_upgrade` success on `3c6e693a27339e73520f41030dce8fc9cd504967` ==> `311694401230605775aa0dc163e53621f2670cc9`

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 311694401230605775aa0dc163e53621f2670cc9 (PR)
Upgrade the nodes to version: 311694401230605775aa0dc163e53621f2670cc9
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1681.59 txn/s, submitted: 1683.33 txn/s, failed submission: 1.73 txn/s, expired: 1.73 txn/s, latency: 1990.15 ms, (p50: 2000 ms, p70: 2100, p90: 2700 ms, p99: 3800 ms), latency samples: 135740
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1423.02 txn/s, submitted: 1426.82 txn/s, failed submission: 3.80 txn/s, expired: 3.80 txn/s, latency: 2080.53 ms, (p50: 2100 ms, p70: 2200, p90: 3000 ms, p99: 4100 ms), latency samples: 127200
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 311694401230605775aa0dc163e53621f2670cc9 passed
Upgrade the remaining nodes to version: 311694401230605775aa0dc163e53621f2670cc9
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1428.33 txn/s, submitted: 1433.21 txn/s, failed submission: 4.66 txn/s, expired: 4.87 txn/s, latency: 2106.93 ms, (p50: 2100 ms, p70: 2300, p90: 3000 ms, p99: 4300 ms), latency samples: 128661
Test Ok

github-actions · 2024-12-13T22:50:46Z

✅ Forge suite `compat` success on `3c6e693a27339e73520f41030dce8fc9cd504967` ==> `311694401230605775aa0dc163e53621f2670cc9`

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 311694401230605775aa0dc163e53621f2670cc9 (PR)
1. Check liveness of validators at old version: 3c6e693a27339e73520f41030dce8fc9cd504967
compatibility::simple-validator-upgrade::liveness-check : committed: 16414.22 txn/s, latency: 2080.76 ms, (p50: 2100 ms, p70: 2200, p90: 2400 ms, p99: 2800 ms), latency samples: 530400
2. Upgrading first Validator to new version: 311694401230605775aa0dc163e53621f2670cc9
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6391.73 txn/s, latency: 4408.13 ms, (p50: 4900 ms, p70: 5100, p90: 6200 ms, p99: 6300 ms), latency samples: 120780
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6524.84 txn/s, latency: 4960.76 ms, (p50: 5300 ms, p70: 5400, p90: 5500 ms, p99: 5600 ms), latency samples: 222940
3. Upgrading rest of first batch to new version: 311694401230605775aa0dc163e53621f2670cc9
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6634.80 txn/s, latency: 4303.67 ms, (p50: 4800 ms, p70: 5100, p90: 5600 ms, p99: 5700 ms), latency samples: 126600
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6930.88 txn/s, latency: 4800.15 ms, (p50: 5100 ms, p70: 5200, p90: 5400 ms, p99: 5800 ms), latency samples: 233260
4. upgrading second batch to new version: 311694401230605775aa0dc163e53621f2670cc9
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11082.84 txn/s, latency: 2512.21 ms, (p50: 2800 ms, p70: 2900, p90: 3000 ms, p99: 3100 ms), latency samples: 194660
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11389.07 txn/s, latency: 2809.47 ms, (p50: 2900 ms, p70: 3000, p90: 3100 ms, p99: 3200 ms), latency samples: 372260
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 311694401230605775aa0dc163e53621f2670cc9 passed
Test Ok

vineethk force-pushed the vk/temps-for-each-args branch 4 times, most recently from f1475b9 to 55e3631, December 6, 2024 12:27

vineethk changed the title from ~~[compiler-v2] Create temps for each arg~~ to [compiler-v2][optimization] Create and retain temps for each arg, Dec 6, 2024

vineethk force-pushed the vk/temps-for-each-args branch from 55e3631 to 92aaf1d, December 6, 2024 13:42

vineethk marked this pull request as ready for review December 6, 2024 14:01

vineethk requested review from rahxephon89, fEst1ck, wrwg and brmataptos December 6, 2024 14:01

brmataptos reviewed Dec 7, 2024

third_party/move/move-compiler-v2/src/bytecode_generator.rs

third_party/move/move-compiler-v2/src/pipeline/dead_store_elimination.rs

third_party/move/move-compiler-v2/src/pipeline/flush_writes_processor.rs

Creating temps for each arg.

49843f8

vineethk force-pushed the vk/temps-for-each-args branch from 92aaf1d to 49843f8, December 13, 2024 12:58

vineethk mentioned this pull request Dec 13, 2024

[compiler-v2] Enable recent stack-optimizations by default #15595

Merged

rahxephon89 approved these changes Dec 13, 2024

fEst1ck approved these changes Dec 13, 2024

Merge branch 'main' into vk/temps-for-each-args

3116944

vineethk enabled auto-merge (squash) December 13, 2024 21:53

vineethk merged commit da63178 into main Dec 13, 2024
80 of 88 checks passed

vineethk deleted the vk/temps-for-each-args branch December 13, 2024 22:50

vineethk added a commit that referenced this pull request Dec 16, 2024

Creating temps for each arg. (#15514)

a45e298

vineethk mentioned this pull request Dec 16, 2024

Cherry pick certain optimization PRs #15612

Merged

vineethk added a commit that referenced this pull request Dec 16, 2024

Cherry pick certain optimization PRs (#15612)

3402522

* Creating temps for each arg. (#15514) * [compiler-v2] Enable recent stack-optimizations by default (#15595)

georgemitenkov pushed a commit that referenced this pull request Jan 6, 2025

Creating temps for each arg. (#15514)

8a98242
