Optimize EVM stack push operations with source generation #10120

benaadams · 2026-01-06T13:34:04Z

Changes

Source Generator Implementation

Added StackPushBytesGenerator and GenerateStackOpcodeGenerator to auto-generate optimized push methods for byte sizes 1-32
Eliminated runtime size checks and branching through compile-time specialization
Generated methods use the [GenerateStackPushBytes(size, PadDirection)] attribute

SIMD Stack Operations

Replaced generic Span.CopyTo with direct Vector256<byte>/Vector128<byte> construction
Added CopyUpTo32 helper using unaligned SIMD reads to copy up to 32 bytes efficiently
Single 32-byte stores replace multiple smaller writes
Specialized paths for common sizes (1, 2, 4, 8, 16 bytes)
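To make the "generic copy vs. specialized construction" point concrete, here is a minimal sketch for the 8-byte case. It is not the PR's generated code: `PushWordSketch`, `LeftPaddedGeneric` and `LeftPadded8` are hypothetical names, and it assumes .NET 8+ (for `Vector256.Create(ReadOnlySpan<T>)`). Both functions produce the same 32-byte big-endian stack word that PUSH8 would push.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Illustrative sketch; class and method names are hypothetical, not the PR's API.
public static class PushWordSketch
{
    public const int WordSize = 32;

    // Generic path: start from a zeroed 32-byte word and copy the n immediate bytes
    // right-aligned, which is what PUSHn does (big-endian value, zero padding on the left).
    public static Vector256<byte> LeftPaddedGeneric(ReadOnlySpan<byte> immediate)
    {
        if (immediate.Length > WordSize)
            throw new ArgumentOutOfRangeException(nameof(immediate));

        Span<byte> word = stackalloc byte[WordSize]; // zeroed: no SkipLocalsInit here
        immediate.CopyTo(word.Slice(WordSize - immediate.Length));
        return Vector256.Create((ReadOnlySpan<byte>)word);
    }

    // Size-specialized path for exactly 8 bytes: one unaligned 8-byte read placed in the
    // last ulong lane (word bytes 24..31). Byte order is preserved because the read and the
    // lane layout both use native endianness. No copy loop, no length-dependent branching.
    public static Vector256<byte> LeftPadded8(ReadOnlySpan<byte> immediate)
    {
        // Kept for memory safety in this sketch; a specialized method would rely on its caller.
        if (immediate.Length < 8)
            throw new ArgumentException("Need 8 bytes.", nameof(immediate));

        ulong lane = Unsafe.ReadUnaligned<ulong>(ref MemoryMarshal.GetReference(immediate));
        return Vector256.Create(0UL, 0UL, 0UL, lane).AsByte();
    }
}
```

The PR generates one specialized method per size from 1 to 32; the code it actually emits may differ from this sketch.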

VM Execution Loop

Simplified opcode dispatch using nuint indexing with function pointers
Removed special-case POP inlining (now uniform dispatch)
Moved opcode count tracking to exception paths only

API Changes

Push methods now return EvmExceptionType instead of void for unified error handling
Added PushBytesNullableRef for stack overflow detection without exceptions
Explicit PushZero/PushOne methods replace conditional pushes

Before:

stack.PushBytes<TTracingInst>(immediateData);  // Generic copy

After:

stack.Push8Bytes<TTracingInst>(ref bytes);  // Generated SIMD implementation
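A minimal sketch of the status-returning push API described above, using a toy stack rather than Nethermind's `EvmStack`. `PushStatus`, `WordStack` and `PushFlag` are hypothetical names; `PushStatus` stands in for `EvmExceptionType`, and the 1024-word depth is the EVM stack limit.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical stand-in for EvmExceptionType with only the values this sketch needs.
public enum PushStatus { None, StackOverflow }

// Toy word stack illustrating status-returning pushes; not Nethermind's EvmStack.
public sealed class WordStack
{
    public const int MaxDepth = 1024; // EVM stack limit, in 32-byte words

    // Big-endian 1: only the last byte of the 32-byte word is set.
    private static readonly Vector256<byte> One = Vector256<byte>.Zero.WithElement(31, (byte)1);
    private readonly Vector256<byte>[] _words = new Vector256<byte>[MaxDepth];

    public int Count { get; private set; }

    public Vector256<byte> Top => _words[Count - 1];

    // Returns a status instead of throwing, so the caller can branch on it.
    private PushStatus Push(Vector256<byte> word)
    {
        if (Count >= MaxDepth) return PushStatus.StackOverflow;
        _words[Count++] = word;
        return PushStatus.None;
    }

    // Explicit constant pushes: no source buffer to copy from, just a constant store.
    public PushStatus PushZero() => Push(Vector256<byte>.Zero);
    public PushStatus PushOne() => Push(One);
}

public static class Instructions
{
    // A comparison-style instruction forwards the push status straight to its caller.
    public static PushStatus PushFlag(WordStack stack, bool flag) =>
        flag ? stack.PushOne() : stack.PushZero();
}
```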

Types of changes

What types of changes does your code introduce?

Optimization

Testing

Requires testing: No

Documentation

Requires documentation update: No

Requires explanation in Release Notes: No

Copilot

Pull request overview

This PR optimizes EVM stack push operations by replacing generic Span.CopyTo calls with specialized, size-specific copy implementations that leverage SIMD instructions (Vector256/Vector128) for better performance.

Key changes:

Introduced specialized push methods (PushRightPaddedBytes, PushBothPaddedBytes) with optimized byte packing logic
Added helper methods (PackHiU64, PackLoU64, CopyUpTo32) for efficient small-size data copying
Replaced ternary conditional pushes with explicit method calls (PushZero, PushOne) for clearer semantics
Ensured proper memory alignment via AsAlignedSpan for warmup scenarios

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.


File	Description
EvmStack.cs	Core optimization: refactored `PushBytes` and introduced specialized push methods with SIMD-optimized byte packing for both left-padded and right-padded scenarios
EvmInstructions.Storage.cs	Optimized CALLDATALOAD to use specialized `PushRightPaddedBytes` instead of generic zero-padding, improving performance for common call data operations
EvmInstructions.Stack.cs	Updated PUSH operations to use renamed `PushBothPaddedBytes` method, maintaining correctness for edge cases where immediate data is truncated
EvmInstructions.Environment.cs	Simplified BLOCKHASH to use explicit `PushZero` instead of conditional with BytesZero32, improving code clarity
EvmInstructions.Call.cs	Optimized successful empty call path by using `PushOne` instead of pushing StatusCode bytes
VirtualMachine.Warmup.cs	Added alignment guarantees via `AsAlignedSpan` to ensure stack operations can safely use SIMD instructions
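The two padding modes mentioned for CALLDATALOAD (right-padded) and truncated PUSH immediates (both-padded) can be pinned down with a reference sketch of their semantics. This is plain span code, not the optimized path; `PaddingSketch`, `RightPadded` and `BothPadded` are hypothetical names, and it assumes .NET 8+.

```csharp
using System;
using System.Runtime.Intrinsics;

// Reference semantics only; names are hypothetical. An optimized implementation would
// replace the zeroed buffer and CopyTo with fixed-size vector construction.
public static class PaddingSketch
{
    public const int WordSize = 32;

    // CALLDATALOAD-style right padding: take up to 32 bytes starting at offset and
    // zero-fill whatever lies past the end of the call data.
    public static Vector256<byte> RightPadded(ReadOnlySpan<byte> callData, ulong offset)
    {
        Span<byte> word = stackalloc byte[WordSize];
        if (offset < (ulong)callData.Length)
        {
            ReadOnlySpan<byte> available = callData.Slice((int)offset);
            available.Slice(0, Math.Min(available.Length, WordSize)).CopyTo(word);
        }
        return Vector256.Create((ReadOnlySpan<byte>)word);
    }

    // Truncated PUSHn ("both padded"): the immediate is n bytes wide but the code may end early.
    // Missing trailing bytes read as zero (right padding inside the n-byte immediate), and the
    // n-byte value is then right-aligned in the word (left padding up to 32 bytes).
    public static Vector256<byte> BothPadded(ReadOnlySpan<byte> code, int immediateStart, int n)
    {
        if ((uint)n > WordSize) throw new ArgumentOutOfRangeException(nameof(n));
        if (immediateStart < 0) throw new ArgumentOutOfRangeException(nameof(immediateStart));

        Span<byte> word = stackalloc byte[WordSize];
        int available = Math.Clamp(code.Length - immediateStart, 0, n);
        if (available > 0)
            code.Slice(immediateStart, available).CopyTo(word.Slice(WordSize - n));
        return Vector256.Create((ReadOnlySpan<byte>)word);
    }
}
```

For example, a PUSH2 (0x61) as the second-to-last byte of code, with only 0x12 left after it, pushes 0x1200.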

LukaszRozmej · 2026-01-06T22:07:44Z

Any benchmarks?

benaadams · 2026-01-07T08:49:47Z

In reply to "Any benchmarks?":

Going deeper

Copilot

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.

src/Nethermind/Nethermind.Evm.SourceGenerators/StackPushBytesGenerator.cs

src/Nethermind/Nethermind.HealthChecks/Nethermind.HealthChecks.csproj

src/Nethermind/Nethermind.Evm/EvmStack.cs

Demuirgos · 2026-01-07T14:19:58Z

src/Nethermind/Nethermind.Evm.SourceGenerators/StackPushBytesGenerator.cs

+        });
+    }
+
+    private static string? EmitForType(SourceProductionContext spc, INamedTypeSymbol? type, Candidate[] methods)


Isn't there a way to use a template? It would give a clearer idea of the structure of the emitted type.

Changed to use more interpolated strings, though you can also view the output as it is generated for the compilation.

So 10 bytes

And 30 bytes

benaadams · 2026-01-07T17:24:58Z

@copilot summarise the changes for PR body text

Copilot · 2026-01-07T17:25:09Z

@benaadams I've opened a new pull request, #10137, to work on those changes. Once the pull request is ready, I'll request review from you.

src/Nethermind/Nethermind.Evm.SourceGenerators/Nethermind.Evm.SourceGenerators.csproj

benaadams · 2026-01-08T13:01:24Z

In reply to "Any benchmarks?":

benaadams · 2026-01-08T13:43:33Z

In reply to "Any benchmarks?":

The top ones that didn't move much already had hand-coded implementations.

Copilot

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated no new comments.

LukaszRozmej

The source generation seems overcomplicated to me. Can't we just do it by hand?

Looking at the generated methods, they all follow a similar pattern.

The top part differs by only one parameter, the count:

        if (TTracingInst.IsActive)
        {
            _tracer.TraceBytes(in value, 16); // <- count goes here
        }

        uint headOffset = (uint)Head;
        uint newOffset = headOffset + 1;
        ref Vector256<byte> head = ref Unsafe.As<byte, Vector256<byte>>(ref Unsafe.Add(ref MemoryMarshal.GetReference(_bytes), (nint)(headOffset * WordSize)));
        if (newOffset >= MaxStackSize)
        {
            return EvmExceptionType.StackOverflow;
        }

        Head = (int)newOffset;

while the bottom part is more complicated and has more variations, for example:

        if (Vector256.IsHardwareAccelerated)
        {
            head = Vector256.Create(
                0UL,
                (ulong)Unsafe.ReadUnaligned<ushort>(ref value) << 48,
                Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 2)),
                Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 10))
            ).AsByte();
        }
        else
        {
            ref Vector128<ulong> head128 = ref Unsafe.As<Vector256<byte>, Vector128<ulong>>(ref head);

            head128 = Vector128.Create(
                0UL,
                (ulong)Unsafe.ReadUnaligned<ushort>(ref value) << 48
            );

            Unsafe.Add(ref head128, 1) = Vector128.Create(
                Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 2)),
                Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 10))
            );
        }

but it can be broken down into simple pieces:

for V256, we create a single vector, and that's it;
for V128, we reinterpret the head as Vector128 and create two 128-bit vectors (the lower and upper halves).
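As an illustration of that breakdown, the sketch below writes the 18-byte example from the snippet above both ways into a plain 32-byte buffer so the two paths can be compared. It assumes a little-endian target (the `<< 48` only lines the two leading bytes up with word bytes 14..15 on little-endian), uses unaligned writes instead of the stack's `ref Vector256<byte>`, and `Push18Sketch` is a hypothetical name.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Sketch of the 18-byte case from the review comment; assumes little-endian. Names are hypothetical.
public static class Push18Sketch
{
    // Vector256 path: a single 32-byte value built from four ulong lanes.
    public static void Write256(Span<byte> word, ReadOnlySpan<byte> value)
    {
        Check(word, value);
        ref byte v = ref MemoryMarshal.GetReference(value);
        Vector256<byte> w = Vector256.Create(
            0UL,
            (ulong)Unsafe.ReadUnaligned<ushort>(ref v) << 48,
            Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref v, 2)),
            Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref v, 10))).AsByte();
        Unsafe.WriteUnaligned(ref MemoryMarshal.GetReference(word), w);
    }

    // Vector128 fallback: the same word written as two 16-byte halves.
    public static void Write128Pair(Span<byte> word, ReadOnlySpan<byte> value)
    {
        Check(word, value);
        ref byte v = ref MemoryMarshal.GetReference(value);
        ref byte dst = ref MemoryMarshal.GetReference(word);
        Unsafe.WriteUnaligned(ref dst, Vector128.Create(0UL, (ulong)Unsafe.ReadUnaligned<ushort>(ref v) << 48));
        Unsafe.WriteUnaligned(ref Unsafe.Add(ref dst, 16), Vector128.Create(
            Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref v, 2)),
            Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref v, 10))));
    }

    // Keeps the unaligned reads and writes in bounds.
    private static void Check(Span<byte> word, ReadOnlySpan<byte> value)
    {
        if (word.Length < 32 || value.Length < 18)
            throw new ArgumentException("Need a 32-byte word and 18 value bytes.");
    }
}
```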

This could be expressed with generic static members, letting us extract and inline all of it. Something like:

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public EvmExceptionType PushBytes<TOp, TTracingInst>(ref byte value)
        where TTracingInst : struct, global::Nethermind.Core.IFlag
        where TOp : IOpCount // or something else; needs static abstract Count, Create256Vector, Create128VectorLower, Create128VectorUpper
    {
        if (TTracingInst.IsActive)
        {
            _tracer.TraceBytes(in value, TOp.Count);
        }

        uint headOffset = (uint)Head;
        uint newOffset = headOffset + 1;
        ref Vector256<byte> head = ref Unsafe.As<byte, Vector256<byte>>(ref Unsafe.Add(ref MemoryMarshal.GetReference(_bytes), (nint)(headOffset * WordSize)));
        if (newOffset >= MaxStackSize)
        {
            return EvmExceptionType.StackOverflow;
        }

        Head = (int)newOffset;

        if (Vector256.IsHardwareAccelerated)
        {
            head = TOp.Create256Vector(ref value);
        }
        else
        {
            // Reinterpret the 32-byte slot as two Vector128<ulong> halves.
            ref Vector128<ulong> head128 = ref Unsafe.As<Vector256<byte>, Vector128<ulong>>(ref head);
            head128 = TOp.Create128VectorLower(ref value);
            Unsafe.Add(ref head128, 1) = TOp.Create128VectorUpper(ref value);
        }

        return EvmExceptionType.None;
    }

Pass the correct parameters, aggressively inline those members, and you are done, without all the obfuscation of code generation.
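For comparison, a self-contained sketch of the suggested generic-static approach, assuming C# 11 / .NET 7+ static abstract interface members. Every name here (`IPushShape`, `Push8`, `SketchStack`, `PushResult`) is hypothetical, tracing is omitted, and a length check is added so the unaligned reads stay in bounds.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical names throughout; a sketch of the reviewer's idea, not Nethermind code.
public interface IPushShape
{
    static abstract int Count { get; }
    static abstract Vector256<byte> Create256(ref byte value);
    static abstract Vector128<ulong> CreateLower128(ref byte value);
    static abstract Vector128<ulong> CreateUpper128(ref byte value);
}

// One small struct per size; only PUSH8 here (the value lands in word bytes 24..31).
public readonly struct Push8 : IPushShape
{
    public static int Count => 8;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Vector256<byte> Create256(ref byte value) =>
        Vector256.Create(0UL, 0UL, 0UL, Unsafe.ReadUnaligned<ulong>(ref value)).AsByte();

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Vector128<ulong> CreateLower128(ref byte value) => Vector128<ulong>.Zero;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Vector128<ulong> CreateUpper128(ref byte value) =>
        Vector128.Create(0UL, Unsafe.ReadUnaligned<ulong>(ref value));
}

public enum PushResult { None, StackOverflow }

public sealed class SketchStack
{
    public const int WordSize = 32;
    public const int MaxDepth = 1024; // EVM stack limit, in words

    private readonly byte[] _bytes = new byte[WordSize * MaxDepth];
    public int Head { get; private set; }

    // Shared "top part" (bounds check, slot address); only TOp varies per size.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public PushResult PushBytes<TOp>(ReadOnlySpan<byte> value) where TOp : IPushShape
    {
        if (value.Length < TOp.Count) throw new ArgumentException("Value shorter than push size.");
        if (Head >= MaxDepth) return PushResult.StackOverflow;

        ref byte v = ref MemoryMarshal.GetReference(value);
        ref byte slot = ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(_bytes), Head * WordSize);

        if (Vector256.IsHardwareAccelerated)
        {
            Unsafe.WriteUnaligned(ref slot, TOp.Create256(ref v));
        }
        else
        {
            Unsafe.WriteUnaligned(ref slot, TOp.CreateLower128(ref v));
            Unsafe.WriteUnaligned(ref Unsafe.Add(ref slot, 16), TOp.CreateUpper128(ref v));
        }

        Head++;
        return PushResult.None;
    }

    public ReadOnlySpan<byte> Top => _bytes.AsSpan((Head - 1) * WordSize, WordSize);
}
```

The per-size variation lives in small hand-written structs instead of generator output, at the cost of writing one such struct per size.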

LukaszRozmej · 2026-01-09T16:07:47Z

src/Nethermind/Nethermind.Evm/Instructions/EvmInstructions.Stack.cs

+        public static EvmExceptionType Push<TTracingInst>(int length, ref EvmStack stack, int programCounter, ReadOnlySpan<byte> code)
+            where TTracingInst : struct, IFlag
+        {
+            throw new NotSupportedException($"Use the {nameof(InstructionPush2)} opcode instead");


Copilot AI review requested due to automatic review settings January 6, 2026 13:34

benaadams requested review from Demuirgos and LukaszRozmej as code owners January 6, 2026 13:34

benaadams added the performance is good label Jan 6, 2026

Copilot started reviewing on behalf of benaadams January 6, 2026 13:34

Copilot AI reviewed Jan 6, 2026

benaadams requested review from Marchhill and flcl42 January 6, 2026 13:53

benaadams marked this pull request as draft January 7, 2026 08:50

benaadams requested a review from Copilot January 7, 2026 13:12

Copilot started reviewing on behalf of benaadams January 7, 2026 13:13

Copilot AI reviewed Jan 7, 2026

src/Nethermind/Nethermind.Evm.SourceGenerators/StackPushBytesGenerator.cs (outdated, resolved)

src/Nethermind/Nethermind.HealthChecks/Nethermind.HealthChecks.csproj (resolved)

src/Nethermind/Nethermind.Evm/EvmStack.cs (resolved)

Demuirgos reviewed Jan 7, 2026

benaadams marked this pull request as ready for review January 7, 2026 17:22

benaadams requested review from MarekM25 and rubo as code owners January 7, 2026 17:22

Copilot AI mentioned this pull request Jan 7, 2026: Optimize EVM stack push operations with SIMD and source generation #10137 (Closed, 6 tasks)

benaadams changed the title from "Optimize Stack Pushes" to "Optimize EVM stack push operations with source generation" Jan 7, 2026

rubo reviewed Jan 8, 2026

src/Nethermind/Nethermind.Evm.SourceGenerators/Nethermind.Evm.SourceGenerators.csproj (resolved)

benaadams requested a review from Copilot January 8, 2026 19:43

Copilot started reviewing on behalf of benaadams January 8, 2026 19:43

Copilot AI reviewed Jan 8, 2026

benaadams force-pushed the optimize-stack-push branch from 87516fc to f9ab2eb January 8, 2026 20:13

LukaszRozmej reviewed Jan 9, 2026

benaadams added 28 commits January 10, 2026 10:12

Formatting troll (46f55e2)
Use source generator (889b9ea)
Formatting (83ed996)
Formatting (eafb5e3)
Improves (a1bf340)
Optimized (633f85e)
Optimize (547caa2)
Vector256.IsHardwareAccelerated checks for Arm (0048c0c)
Use more string interpolation (80216ed)
Use more string interpolation (e3761c5)
formatting (ad12ca1)
Return rather than throwing exception (5ecdbab)
Return results (bb31a07)
Improve exp (2b16a98)
Push hash directly (9013c54)
Improved asm (f7ca508)
formatting (e80ae08)
Better asm (6348606)
Better asm (e845494)
Better asm (88051a7)
Better asm (e246156)
Improve virtual machine loop (a77a8bf)
Better vm loop (95bb5e4)
Don't pass program counter by ref (dab6adf)
Don't inline InterpreterLoop (129e605)
Better prewarming (b0db753)
Tighter RunByteCode (6bd316c)
Tighter loop (ceec4f8)

benaadams force-pushed the optimize-stack-push branch from 8255d30 to ceec4f8 January 10, 2026 10:14

Optimize (06c5885)


5 participants
