Conversation
Looks like the actual differences are minimal, if any, and go in both directions. For most jobs they are within the margin of error.
The only one that stands out is the OpenZeppelin external test, where it's a 12% reduction, though this could still be just CI variance.
Force-pushed 0a00253 to 14699de
```
// ----
// constructor(), 20 wei ->
// gas irOptimized: 252642
// gas irOptimized: 260928
```
The change does affect some gas costs, but very minimally. The only test where the difference is somewhat significant is this one (and the similar gas_and_value_brace_syntax.sol), where it's ~3%.
Have you looked at the diff of the optimized Yul? Not sure how well it'd compare, though...
It's also generally a problem that the code size metric used for this is merely a rather coarse estimate in itself...
Not yet. I'll check this.
Given the very few gas changes in this PR - could it be that the first pass retains the same code size, whilst actually optimizing the code - i.e. altering it, and then only reduces it during the second iteration?
This is the relevant part of the diff:

```diff
 revert(0, 0x24)
 }
 mstore(_2, _1)
+ _4 := 0
 }
 let memPos := mload(_2)
 mstore(memPos, 0x01)
 return(memPos, 32)
 }
 case 0x96dfcbea {
- if callvalue() { revert(0, 0) }
- if slt(add(calldatasize(), not(3)), 0) { revert(0, 0) }
- let value_1 := and(sload(0), sub(shl(160, 1), 1))
- let _5 := mload(_2)
- mstore(_5, shl(228, 0x0f963393))
- let _6 := call(gas(), value_1, 0, _5, _3, _5, 32)
- if iszero(_6)
+ if callvalue() { revert(_4, _4) }
+ if slt(add(calldatasize(), not(3)), _4) { revert(_4, _4) }
+ let value_1 := and(sload(_4), sub(shl(160, 1), 1))
+ let _6 := mload(_2)
+ mstore(_6, shl(228, 0x0f963393))
+ let _7 := call(gas(), value_1, _4, _6, _3, _6, 32)
+ if iszero(_7)
 {
 let pos_1 := mload(_2)
- returndatacopy(pos_1, 0, returndatasize())
+ returndatacopy(pos_1, _4, returndatasize())
 revert(pos_1, returndatasize())
 }
- let expr := 0
- if _6
+ let expr := _4
+ if _7
 {
- let _7 := 32
- if gt(32, returndatasize()) { _7 := returndatasize() }
- finalize_allocation(_5, _7)
- if slt(sub(add(_5, _7), _5), 32) { revert(0, 0) }
- let value_2 := mload(_5)
- if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(0, 0) }
+ let _8 := 32
+ if gt(32, returndatasize()) { _8 := returndatasize() }
+ finalize_allocation(_6, _8)
+ if slt(sub(add(_6, _8), _6), 32) { revert(_4, _4) }
+ let value_2 := mload(_6)
+ if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(_4, _4) }
 expr := value_2
 }
 let var_myBal := selfbalance()
```

I.e. it seems that a bunch of 0 literals used to be materialized and now remain as variables.
Which is odd - I wanted to suggest maybe adding another T to the sequence, but it's right there in the cleanup: fDnTOcmu.
Force-pushed 14699de to a2cabcb
```
// f()
// f()
// f()
// f_72()
// f_73()
// sstore(0, 1)
// }
// function f()
// {
// let a := 1
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_72()
// {
// let a := 2
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_73()
// {
// let a := 3
// let b := 10
// let a := calldataload(0)
// let _1 := iszero(a)
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// mstore(a, 0)
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
```
Looks like the changes show up not only as gas differences. Two tests also got updated output.
In this case it looks like a clear downgrade. The optimizer is definitely leaving some possible optimizations on the table. Still, not sure how common this is in practice. I'm going to play with this when tweaking the optimizer sequence.
I mean, the optimizer is mainly being rather stupid here - it specializes all three calls and then trusts that since the arguments are unused, the result will collapse to identical functions that can be deduplicated... the simpler choice would have been to not specialize in the first place.
For cases like this it's not worth preserving optimizations that work out by chance. Even if someone thought it funny to include it as a test case :-).
Force-pushed 5a544bd to 5ea8fa2
Force-pushed 5ea8fa2 to 55cb7a7
Force-pushed 55cb7a7 to 481c3be
Gas diff stats

External tests:

| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | 0% | | |
| colony | 0% | | |
| elementfi | +0.21% ❌ | | |
| ens | -0.5% ✅ | -0% | 0% |
| euler | +0.01% ❌ | | |
| gnosis | | | |
| gp2 | 0% | | |
| perpetual-pools | +0% | +0% | -0.02% ✅ |
| pool-together | -0.14% ✅ | | |
| uniswap | -0.01% ✅ | | |
| yield_liquidator | +0.01% ❌ | +0.01% ❌ | +0% |
| zeppelin | -0.01% ✅ | -0.04% ✅ | -0.06% ✅ |
Compilation time
| Project | Before | After | Diff |
|---|---|---|---|
| brink | 5 s | 5 s | 0 s |
| colony | 105 s | 117 s | 12 s |
| elementfi | 172 s | 157 s | -15 s |
| ens | 40 s | 39 s | -1 s |
| euler | 52 s | 60 s | 8 s |
| gnosis | |||
| gp2 | 42 s | 46 s | 4 s |
| perpetual-pools | 81 s | 77 s | -4 s |
| pool-together | 56 s | 46 s | -10 s |
| uniswap | 82 s | 73 s | -9 s |
| yield_liquidator | 26 s | 38 s | 12 s |
| zeppelin | 261 s | 245 s | -16 s |
Quick and dirty non-CI benchmark

Since CI timing is almost useless here due to a huge variance, I tried to benchmark it manually using a few projects that can be compiled right away via foundry.

Script:

```bash
function benchmark_foundry_project {
    local subdir="$1"
    local url="$2"
    local solc_path="$3"

    git clone "$url" "$subdir" --depth=1
    pushd "$subdir"
    forge install
    /usr/bin/time --output ../timing.json --append --format "{\"$subdir\": {\"real\": %e, \"user\": %U, \"sys\": %S}}" \
        forge build --use "$solc_path" --optimize --via-ir --evm-version cancun --offline --no-cache
    popd
    rm -rf "./${subdir}"
}

rm timing.json

solc_binary=/tmp/solc-develop
benchmark_foundry_project uniswap-before https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-before https://github.com/ProjectOpenSea/seaport-core "$solc_binary"

solc_binary=/tmp/solc-fix-superfluous-iterations-in-optimizer-sequence
benchmark_foundry_project uniswap-after https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-after https://github.com/ProjectOpenSea/seaport-core "$solc_binary"

cat timing.json | jq --slurp add
```

I wanted to include OpenZeppelin as well but it sadly fails with "Stack Too Deep" for both binaries (despite the memory guard being present). Same with a 0.8.25 release binary.

Results

Looks like this has a lot of variance too, but there's still a clear and reproducible difference in favor of the PR, on the order of 10%.
Force-pushed 481c3be to 6223915
ekpyron left a comment:

Well, if you say this is a 10% improvement in compilation time, let's just merge this for now.
Force-pushed 6223915 to a4b5fb0
Yeah. It's not a large sample, but it was reproducible when I ran it several times (see the table), so I think it's legit. BTW, I had to resolve a changelog conflict - please reapprove.
Force-pushed a4b5fb0 to 2a40919
…topping after the first cycle
Force-pushed 2a40919 to c9e2c20
The optimizer will repeat each bracketed segment of the optimizer step sequence at least twice, because the cost is only measured after the first iteration. This results in unnecessary repetitions in some cases.
Hard to tell how much it actually affects the optimization time in practice. From my observations, a single iteration is often insufficient, so many contracts would not be affected. On the other hand, for the lucky ones where one iteration is enough, it could cut optimization time almost in half. I need to see CI timings to say more.
Looks like I unwittingly introduced this regression back in #12102. Originally this loop looked like this and I moved the check to the end without updating the initialization: https://github.com/ethereum/solidity/blob/f35694f655e81d503b3cbc1aee324ea4cc848a3c/libyul/optimiser/Suite.cpp#L84-L92