Conversation
Looks like the actual differences are minimal, if any, and go in both directions. For most jobs they are within the margin of error.
The only one that stands out is the OpenZeppelin external test, where it's a 12% reduction, though this could still be just CI variance.
Force-pushed 0a00253 to 14699de
```
// ----
// constructor(), 20 wei ->
// gas irOptimized: 252642
// gas irOptimized: 260928
```
The change does affect some gas costs, but very minimally. The only test where the difference is somewhat significant is this one (and the similar gas_and_value_brace_syntax.sol), where it's ~3%.
Have you looked at the diff of the optimized Yul? Not sure how well it'd compare, though...
It's also generally a problem that the code size metric used for this is merely a rather coarse estimate in itself...
Not yet. I'll check this.
Given the very few gas changes in this PR - could it be that the first pass retains the same code size, whilst actually optimizing the code - i.e. altering it, and then only reduces it during the second iteration?
This is the relevant part of the diff:

```diff
 revert(0, 0x24)
 }
 mstore(_2, _1)
+ _4 := 0
 }
 let memPos := mload(_2)
 mstore(memPos, 0x01)
 return(memPos, 32)
 }
 case 0x96dfcbea {
- if callvalue() { revert(0, 0) }
- if slt(add(calldatasize(), not(3)), 0) { revert(0, 0) }
- let value_1 := and(sload(0), sub(shl(160, 1), 1))
- let _5 := mload(_2)
- mstore(_5, shl(228, 0x0f963393))
- let _6 := call(gas(), value_1, 0, _5, _3, _5, 32)
- if iszero(_6)
+ if callvalue() { revert(_4, _4) }
+ if slt(add(calldatasize(), not(3)), _4) { revert(_4, _4) }
+ let value_1 := and(sload(_4), sub(shl(160, 1), 1))
+ let _6 := mload(_2)
+ mstore(_6, shl(228, 0x0f963393))
+ let _7 := call(gas(), value_1, _4, _6, _3, _6, 32)
+ if iszero(_7)
 {
 let pos_1 := mload(_2)
- returndatacopy(pos_1, 0, returndatasize())
+ returndatacopy(pos_1, _4, returndatasize())
 revert(pos_1, returndatasize())
 }
- let expr := 0
- if _6
+ let expr := _4
+ if _7
 {
- let _7 := 32
- if gt(32, returndatasize()) { _7 := returndatasize() }
- finalize_allocation(_5, _7)
- if slt(sub(add(_5, _7), _5), 32) { revert(0, 0) }
- let value_2 := mload(_5)
- if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(0, 0) }
+ let _8 := 32
+ if gt(32, returndatasize()) { _8 := returndatasize() }
+ finalize_allocation(_6, _8)
+ if slt(sub(add(_6, _8), _6), 32) { revert(_4, _4) }
+ let value_2 := mload(_6)
+ if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(_4, _4) }
 expr := value_2
 }
 let var_myBal := selfbalance()
```

I.e. it seems that a bunch of 0 literals used to be materialized and now remain as variables.
Which is odd - I wanted to suggest maybe adding another T to the sequence, but it's right there in the cleanup: fDnTOcmu.
Force-pushed 14699de to a2cabcb
```
// f()
// f()
// f()
// f_72()
// f_73()
// sstore(0, 1)
// }
// function f()
// {
// let a := 1
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_72()
// {
// let a := 2
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_73()
// {
// let a := 3
// let b := 10
// let a := calldataload(0)
// let _1 := iszero(a)
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// mstore(a, 0)
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
```
Looks like the changes show up not only as gas differences. Two tests also got updated output.
In this case it looks like a clear downgrade. The optimizer is definitely leaving some possible optimizations on the table. Still, not sure how common this is in practice. I'm going to play with this when tweaking the optimizer sequence.
I mean, the optimizer is mainly being rather stupid here - it specializes all three calls and then trusts that since the arguments are unused, the result will collapse to identical functions that can be deduplicated... the simpler choice would have been to not specialize in the first place.
For cases like this it's not worth preserving optimizations that work out by chance. Even if someone thought it funny to include it as a test case :-).
Force-pushed 5a544bd to 5ea8fa2
Force-pushed 5ea8fa2 to 55cb7a7
Force-pushed 55cb7a7 to 481c3be
Gas diff stats

External tests:

| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | 0% | | |
| colony | 0% | | |
| elementfi | +0.21% ❌ | | |
| ens | -0.5% ✅ | -0% | 0% |
| euler | +0.01% ❌ | | |
| gnosis | | | |
| gp2 | 0% | | |
| perpetual-pools | +0% | +0% | -0.02% ✅ |
| pool-together | -0.14% ✅ | | |
| uniswap | -0.01% ✅ | | |
| yield_liquidator | +0.01% ❌ | +0.01% ❌ | +0% |
| zeppelin | -0.01% ✅ | -0.04% ✅ | -0.06% ✅ |
Compilation time
| Project | Before | After | Diff |
|---|---|---|---|
| brink | 5 s | 5 s | 0 s |
| colony | 105 s | 117 s | 12 s |
| elementfi | 172 s | 157 s | -15 s |
| ens | 40 s | 39 s | -1 s |
| euler | 52 s | 60 s | 8 s |
| gnosis | |||
| gp2 | 42 s | 46 s | 4 s |
| perpetual-pools | 81 s | 77 s | -4 s |
| pool-together | 56 s | 46 s | -10 s |
| uniswap | 82 s | 73 s | -9 s |
| yield_liquidator | 26 s | 38 s | 12 s |
| zeppelin | 261 s | 245 s | -16 s |
Quick and dirty non-CI benchmark

Since CI timing is almost useless here due to a huge variance, I tried to benchmark it manually using a few projects that can be compiled right away via foundry.

Script:

```bash
function benchmark_foundry_project {
    local subdir="$1"
    local url="$2"
    local solc_path="$3"

    git clone "$url" "$subdir" --depth=1
    pushd "$subdir"
    forge install
    /usr/bin/time --output ../timing.json --append --format "{\"$subdir\": {\"real\": %e, \"user\": %U, \"sys\": %S}}" \
        forge build --use "$solc_path" --optimize --via-ir --evm-version cancun --offline --no-cache
    popd
    rm -rf "./${subdir}"
}

rm timing.json

solc_binary=/tmp/solc-develop
benchmark_foundry_project uniswap-before https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-before https://github.com/ProjectOpenSea/seaport-core "$solc_binary"

solc_binary=/tmp/solc-fix-superfluous-iterations-in-optimizer-sequence
benchmark_foundry_project uniswap-after https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-after https://github.com/ProjectOpenSea/seaport-core "$solc_binary"

cat timing.json | jq --slurp add
```

I wanted to include OpenZeppelin as well but it sadly fails with "Stack Too Deep" for both binaries (despite the memory guard being present). Same with a 0.8.25 release binary.

Results

Looks like this has a lot of variance too, but there's still a clear and reproducible difference in favor of the PR, on the order of 10%.
Force-pushed 481c3be to 6223915
ekpyron left a comment:

Well, if you say this is a 10% improvement in compilation time, let's just merge this for now.
Force-pushed 6223915 to a4b5fb0
Yeah. It's not a large sample, but it was reproducible when I ran it several times (see the table), so I think it's legit. BTW, I had to resolve a changelog conflict - please reapprove.
Force-pushed a4b5fb0 to 2a40919
…topping after the first cycle
Force-pushed 2a40919 to c9e2c20
The optimizer will repeat each bracketed segment of the optimizer step sequence at least twice, because the cost is only measured after the first iteration. This results in unnecessary repetitions in some cases.
Hard to tell how much it actually affects the optimization time in practice. From my observations, a single iteration is often insufficient, so many contracts would not be affected. On the other hand, for the lucky ones where one iteration is enough, it could cut optimization time almost in half. I need to see CI timings to say more.
Looks like I unwittingly introduced this regression back in #12102. Originally this loop looked like this and I moved the check to the end without updating the initialization: https://github.com/ethereum/solidity/blob/f35694f655e81d503b3cbc1aee324ea4cc848a3c/libyul/optimiser/Suite.cpp#L84-L92