feat: skeleton for EVM v2 with long[] stack and --Xevm-v2 toggle#10105
feat: skeleton for EVM v2 with long[] stack and --Xevm-v2 toggle#10105siladu merged 9 commits intobesu-eth:mainfrom
Conversation
Introduces the structural scaffolding for a new EVM execution path (disabled by default) that uses long[] as its stack operand type instead of Tuweni Bytes, eliminating per-operation object allocation on the hot path. Changes: - Add `enableEvmV2` field to EvmConfiguration (default false); existing 3-arg convenience constructor shields 80+ call sites from changes - Wire `--Xevm-go-fast` hidden CLI flag in EvmOptions and evmtool - Add `enableEvmV2` to TxValues, propagated to each MessageFrame - Add `long[] stackV2` dual-stack to MessageFrame with push/pop/get/set long[] methods; v1 OperandStack is untouched - Add private `runToHaltV2()` in EVM with a v2 dispatch switch; runToHalt() branches to it when the flag is enabled - Add AddOperationV2 as a stub example of the v2 operation pattern (pure long[] carry arithmetic, zero object allocation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
| */ | ||
| public void pushStackLongs(final long v0, final long v1, final long v2, final long v3) { | ||
| final int nextTop = stackV2Top + 1; | ||
| if (nextTop >= txValues.maxStackSize()) { |
There was a problem hiding this comment.
We might want to be careful that txValues.maxStackSize() does not prevent inlining this method. As txValues.maxStackSize() is constant, we might can use a private variable to store the value.
There was a problem hiding this comment.
Alternatively we could make this method unsafe and the caller needs to check that enough space is available. Something like this:
messageFrame.hasCapacity(1);
messageFrame.pushStackLongsUnsafe(...);
Then we could be certain that both methods will be inlined
There was a problem hiding this comment.
vibe coded nonsense -> now more like StackMath
| public static final String OPTIMIZED_OP_CODES = "--Xevm-optimized-opcodes"; | ||
|
|
||
| /** The constant EVM_GO_FAST. */ | ||
| public static final String EVM_GO_FAST = "--Xevm-go-fast"; |
There was a problem hiding this comment.
--Xevm-v2 seems more clear about the intent of the flag and is matching the internal variable naming
There was a problem hiding this comment.
Just a little joke. I have made --Xevm-v2 the main flag and left this as an alias
| private final OperandStack stack; | ||
| // EVM v2 stack: 4 longs per 256-bit word (index 0 = most significant, index 3 = least | ||
| // significant) | ||
| private final long[] stackV2; |
There was a problem hiding this comment.
I think a separate MessageFram might make more sense. By reusing the existing one we increase every MeesageFrame even for v1 by 32kb, which seems not necessary.
There was a problem hiding this comment.
We discussed and decided not to do separate MessageFrame at least initially.
The class is large and the changes we are adding are small.
MessageFrameV2 will have a larger blast radius for things like tests. It would be nice to share test infra as much as possible, and ideally just set a flag to test v2.
However, we will explore this approach as well, just not for this initial skeleton PR.
Will also check this doesn't cause memory issues.
There was a problem hiding this comment.
So it's not an extra 32KB for V1 because it gets initialized as null; it's only an extra 8 bytes per MessageFrame (maybe a little more depending on object layout byte alignment).
Think this is negligible and acceptable to avoid a duplicate MessageFrame class.
| this.worldUpdater = worldUpdater; | ||
| this.gasRemaining = initialGas; | ||
| this.stack = new OperandStack(txValues.maxStackSize()); | ||
| this.stackV2 = txValues.enableEvmV2() ? new long[txValues.maxStackSize() * 4] : null; |
There was a problem hiding this comment.
In a first version of the stack long array PR I also did allocate 32kb on the fly, but found that it had a huge performance impact on CALL op codes
That's why I added the StackPool in that branch: https://github.com/daniellehrner/besu/blob/8273e34a43eb74c2f52fd32b9322c531fcd9c3f2/evm/src/main/java/org/hyperledger/besu/evm/internal/StackPool.java
There was a problem hiding this comment.
Yep, StackPool or equivalent will come, this is just a placeholder really to keep PR small and to the point.
There was a problem hiding this comment.
Is adding StackPool increasing a lot the size of this PR ? I haven't looked into but it is part of the infrastructure part, it can make sens to have it part of this PR.
There was a problem hiding this comment.
I can add it it's not a big deal either way since nothing is using it.
Depends if we want to review StackPool impl as part of this PR.
| * @return the operation result | ||
| */ | ||
| public static Operation.OperationResult staticOperation(final MessageFrame frame) { | ||
| final long[] a = new long[4]; |
There was a problem hiding this comment.
The original PR did those operations inline and did not stack allocate new variables. We need to make sure the C2 gets rid of this overhead.
There was a problem hiding this comment.
yeh this is vibe coded nonsense, I made it more like StackMath
| final long carry3 = Long.compareUnsigned(sum3, a[3]) < 0 ? 1L : 0L; | ||
|
|
||
| final long sum2 = a[2] + b[2] + carry3; | ||
| final long carry2 = |
There was a problem hiding this comment.
Using branches for the carry computation had an 30% overhead in comparison to doing a branchless version in previous benchmarks.
Branchless version: https://github.com/daniellehrner/besu/blob/8273e34a43eb74c2f52fd32b9322c531fcd9c3f2/evm/src/main/java/org/hyperledger/besu/evm/internal/StackMath.java#L61
There was a problem hiding this comment.
More vibe-coded nonsense, I replaced with StackMath impl, but then we discussed removing impl to keep this a non-functional skeleton...I now just throw UnsupportedOperationException.
…ementation Replaces the initial skeleton stub with a complete, benchmarked AddOperationV2. Key changes: - operation/v2/AbstractFixedCostOperationV2: shared base class handling gas checks and stack overflow/underflow, with pre-allocated OperationResult constants to avoid hot-path allocation - operation/v2/AddOperationV2: full in-place 256-bit addition with single-limb fast path; moved from operation/ to operation/v2/ package - MessageFrame: replaced verbose push/pop API with direct-access pattern (stackDataV2, stackTopV2, setTopV2, stackHasItems) enabling zero-overhead in-place arithmetic - EVM.runToHaltV2: updated to pass stackDataV2() alongside frame to staticOperation - ConfigurationOverviewBuilder: prints EVM v2 notice when enabled; suppresses redundant opcode-optimizations line in that mode - evmtool: execution-strategy wrapper prints v2 notice before any subcommand, covering benchmark (which never calls getEvmConfiguration via Dagger) - JMH benchmarks: AddOperationV2Benchmark, BinaryOperationV2Benchmark, BenchmarkHelperV2 helper with deterministic MessageFrame setup - --Xevm-v2 added as CLI alias alongside --Xevm-go-fast Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
|
Moved AddOperationV2 to v2 package and made it benchmarkable. Logging when flag enabled... |
| long a0 = s[a + 3], a1 = s[a + 2], a2 = s[a + 1], a3 = s[a]; | ||
| long b0 = s[b + 3], b1 = s[b + 2], b2 = s[b + 1], b3 = s[b]; | ||
| long z0 = a0 + b0; | ||
| long c = ((a0 & b0) | ((a0 | b0) & ~z0)) >>> 63; |
There was a problem hiding this comment.
this way of computing carries is unreadable and has no benefit over Long.compareUnsigned. Just create a method in UInt256 that reuses the logic that's already there for add.
Unless we which to inline the code for UInt256::mod this way as well, it creates an inconsistent pattern and code organization issue.
There was a problem hiding this comment.
This is literally copied from #9881. This PR is not meant to be a perfect AddOperationV2, it's just proving out the v2 structure. I think the idea is to get StackMath code into the new structure, which resolves Glamsterdam, then incrementally improve.
Just create a method in UInt256 that reuses the logic that's already there for add.
We are trying to avoid new allocations though right? UInt256.add methods either take bytes[] or UInt256.
There was a problem hiding this comment.
Replaced AddOperationV2 contents with UnsupportedOperationException
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
| * skeleton stub establishes the dispatch structure for incremental v2 operation rollout. | ||
| */ | ||
| // Note: like runToHalt, this is performance-critical code. Benchmark before refactoring. | ||
| private void runToHaltV2(final MessageFrame frame, final OperationTracer tracing) { |
There was a problem hiding this comment.
I would actually duplicate EVM.java as well, like in v2 package have EVMV2.java, that has the new implementation of runToHalt.
There was a problem hiding this comment.
As discussed, not sure it's worth it. There's a lot more wiring to do with EVMV2, even abstracting just this method into it and calling it here isn't worth it IMO, at least not unless this class starts getting unwieldy, but I think everything will be limited to this method.
| // | ||
| // Please benchmark before refactoring. | ||
| public void runToHalt(final MessageFrame frame, final OperationTracer tracing) { | ||
| if (evmConfiguration.enableEvmV2()) { |
There was a problem hiding this comment.
I would do this redirection upstream, in AbstractMessageProcessor class or even before.
There was a problem hiding this comment.
I think this method actually minimizes blast radiusm otherwise we'll need to wire in a load of EVMV2s in various places, including MainnetEVMs (for each fork).
| this.worldUpdater = worldUpdater; | ||
| this.gasRemaining = initialGas; | ||
| this.stack = new OperandStack(txValues.maxStackSize()); | ||
| this.stackDataV2 = txValues.enableEvmV2() ? new long[txValues.maxStackSize() * 4] : null; |
There was a problem hiding this comment.
Good catch, can't see a good reason for it, probably more Claude slop
Fixed f3fdeef
| public record TxValues( | ||
| BlockHashLookup blockHashLookup, | ||
| int maxStackSize, | ||
| boolean enableEvmV2, |
There was a problem hiding this comment.
Not sure this is needed, TxValues is referenced by MessageFrame, and it is a lot of redundancy and the value is the same for all MessageFrames.
There was a problem hiding this comment.
Good catch, can't see a good reason for it, probably more Claude slop
Fixed f3fdeef
ahamlat
left a comment
There was a problem hiding this comment.
I like the small scope of the PR, i.e infrastructure to prepare EVM v2. I would to see small changes, like change when we redirect to v1 or v2.
Maybe also consider integrating SackPool as it is part of the infrastructure part.
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
| yield currentOperation.execute(frame, this); | ||
| } | ||
| }; | ||
| } catch (final org.hyperledger.besu.evm.internal.OverflowException oe) { |
There was a problem hiding this comment.
Why having the whole package is needed here ?
| }; | ||
| } catch (final org.hyperledger.besu.evm.internal.OverflowException oe) { | ||
| result = OVERFLOW_RESPONSE; | ||
| } catch (final org.hyperledger.besu.evm.internal.UnderflowException ue) { |
ahamlat
left a comment
There was a problem hiding this comment.
Small comments, nothing blocking.
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
…u-eth#10105) feat: skeleton for EVM v2 with long[] stack and --Xevm-v2 toggle Introduces the structural scaffolding for a new EVM execution path (disabled by default) that uses long[] as its stack operand type instead of Tuweni Bytes, eliminating per-operation object allocation on the hot path. Changes: - Add enableEvmV2 field to EvmConfiguration (default false); existing 3-arg convenience constructor shields 80+ call sites from changes - Wire --Xevm-v2 hidden CLI flag in EvmOptions and evmtool - Add enableEvmV2 to MessageFrame - Add long[] stackV2 dual-stack to MessageFrame with push/pop/get/set long[] methods; v1 OperandStack is untouched - Add private runToHaltV2() in EVM with a v2 dispatch switch; runToHalt() branches to it when the flag is enabled - ConfigurationOverviewBuilder: prints EVM v2 notice when enabled; suppresses redundant opcode-optimizations line in that mode - evmtool: execution-strategy wrapper prints v2 notice before any subcommand, covering benchmark (which never calls getEvmConfiguration via Dagger) - Add unimplemented AddOperationV2 as a stub example of the v2 operation pattern in a v2 package. - Add AddOperationBenchmarkV2 as a working benchmark (Add opcode currently throws UnsupportedOperationException though) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Simon Dudley <simon.dudley@consensys.net>


First PR for #10131
Introduces the structural scaffolding for a new EVM execution path (disabled by default) that uses long[] as its stack operand type instead of Tuweni Bytes, eliminating per-operation object allocation on the hot path.
Changes:
enableEvmV2field to EvmConfiguration (default false); existing 3-arg convenience constructor shields 80+ call sites from changes--Xevm-v2hidden CLI flag in EvmOptions and evmtoolenableEvmV2to TxValues, propagated to each MessageFramelong[] stackV2dual-stack to MessageFrame with push/pop/get/set long[] methods; v1 OperandStack is untouchedrunToHaltV2()in EVM with a v2 dispatch switch; runToHalt() branches to it when the flag is enabledOnce Add is implemented, should be able to benchmark like this:
Logging when flag enabled...
Testing
2x nodes with
--Xevm-v2=falseversusmainshowed no significant performance differences.A node with
--Xevm-v2=truefails with UnsupportedOperationException as expected.See #10105 (comment)