-
Notifications
You must be signed in to change notification settings - Fork 624
Optimize EVM stack push operations with source generation #10120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes EVM stack push operations by replacing generic Span.CopyTo calls with specialized, size-specific copy implementations that leverage SIMD instructions (Vector256/Vector128) for better performance.
Key changes:
- Introduced specialized push methods (
PushRightPaddedBytes,PushBothPaddedBytes) with optimized byte packing logic - Added helper methods (
PackHiU64,PackLoU64,CopyUpTo32) for efficient small-size data copying - Replaced ternary conditional pushes with explicit method calls (
PushZero,PushOne) for clearer semantics - Ensured proper memory alignment via
AsAlignedSpanfor warmup scenarios
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| EvmStack.cs | Core optimization: refactored PushBytes and introduced specialized push methods with SIMD-optimized byte packing for both left-padded and right-padded scenarios |
| EvmInstructions.Storage.cs | Optimized CALLDATALOAD to use specialized PushRightPaddedBytes instead of generic zero-padding, improving performance for common call data operations |
| EvmInstructions.Stack.cs | Updated PUSH operations to use renamed PushBothPaddedBytes method, maintaining correctness for edge cases where immediate data is truncated |
| EvmInstructions.Environment.cs | Simplified BLOCKHASH to use explicit PushZero instead of conditional with BytesZero32, improving code clarity |
| EvmInstructions.Call.cs | Optimized successful empty call path by using PushOne instead of pushing StatusCode bytes |
| VirtualMachine.Warmup.cs | Added alignment guarantees via AsAlignedSpan to ensure stack operations can safely use SIMD instructions |
|
Any benchmarks? |
Going deeper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.
src/Nethermind/Nethermind.Evm.SourceGenerators/StackPushBytesGenerator.cs
Outdated
Show resolved
Hide resolved
| }); | ||
| } | ||
|
|
||
| private static string? EmitForType(SourceProductionContext spc, INamedTypeSymbol? type, Candidate[] methods) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isnt there a way to have a template? it would provide a clearer idea on the structure of the emitted type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@copilot summarise the changes for PR body text |
|
@benaadams I've opened a new pull request, #10137, to work on those changes. Once the pull request is ready, I'll request review from you. |
src/Nethermind/Nethermind.Evm.SourceGenerators/Nethermind.Evm.SourceGenerators.csproj
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 27 out of 28 changed files in this pull request and generated no new comments.
87516fc to
f9ab2eb
Compare
LukaszRozmej
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source generation seems overcomplicated to me. Can't we just do it by hand?
Looking at generated methods they all have similar pattern.
Top part is differs by one param - the count
if (TTracingInst.IsActive)
{
_tracer.TraceBytes(in value, 16); // <- count goes here
}
uint headOffset = (uint)Head;
uint newOffset = headOffset + 1;
ref Vector256<byte> head = ref Unsafe.As<byte, Vector256<byte>>(ref Unsafe.Add(ref MemoryMarshal.GetReference(_bytes), (nint)(headOffset * WordSize)));
if (newOffset >= MaxStackSize)
{
return EvmExceptionType.StackOverflow;
}
Head = (int)newOffset;
while bottom part is more complicated and has more variations, example:
if (Vector256.IsHardwareAccelerated)
{
head = Vector256.Create(
0UL,
(ulong)Unsafe.ReadUnaligned<ushort>(ref value) << 48,
Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 2)),
Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 10))
).AsByte();
}
else
{
ref Vector128<ulong> head128 = ref Unsafe.As<Vector256<byte>, Vector128<ulong>>(ref head);
head128 = Vector128.Create(
0UL,
(ulong)Unsafe.ReadUnaligned<ushort>(ref value) << 48
);
Unsafe.Add(ref head128, 1) = Vector128.Create(
Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 2)),
Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref value, 10))
);
}
but can be broken down to simple stuff:
- for V256 - we create one vector that's it
- for V128 we basically interpret something as 128 then create 2 other 128's.
This code could be made generic statics with those and we could extract and inline all of this Something like:
public partial EvmExceptionType PushBytes<TOp, TOpTTracingInst>(ref byte value)
where TTracingInst : struct, global::Nethermind.Core.IFlag
where TOp : IOpCount // or something else
{
if (TTracingInst.IsActive)
{
_tracer.TraceBytes(in value, TOp.Count);
}
uint headOffset = (uint)Head;
uint newOffset = headOffset + 1;
ref Vector256<byte> head = ref Unsafe.As<byte, Vector256<byte>>(ref Unsafe.Add(ref MemoryMarshal.GetReference(_bytes), (nint)(headOffset * WordSize)));
if (newOffset >= MaxStackSize)
{
return EvmExceptionType.StackOverflow;
}
Head = (int)newOffset;
if (Vector256.IsHardwareAccelerated)
{
head = TOp.Create256Vector();
}
else
{
Unsafe.Add(ref head128, 1) = TOp.Create128Vector()
}
return EvmExceptionType.None;
}
pass the correct params, aggressive inline those and you are done without all the obfuscation of code generation
| public static EvmExceptionType Push<TTracingInst>(int length, ref EvmStack stack, int programCounter, ReadOnlySpan<byte> code) | ||
| where TTracingInst : struct, IFlag | ||
| { | ||
| throw new NotSupportedException($"Use the {nameof(InstructionPush2)} opcode instead"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
8255d30 to
ceec4f8
Compare




Changes
Source Generator Implementation
StackPushBytesGeneratorandGenerateStackOpcodeGeneratorto auto-generate optimized push methods for byte sizes 1-32[GenerateStackPushBytes(size, PadDirection)]attributeSIMD Stack Operations
Span.CopyTowith directVector256<byte>/Vector128<byte>constructionCopyUpTo32helper using unaligned SIMD reads for optimal byte copyingVM Execution Loop
nuintindexing with function pointersAPI Changes
EvmExceptionTypeinstead of void for unified error handlingPushBytesNullableReffor stack overflow detection without exceptionsPushZero/PushOnemethods replace conditional pushesBefore:
After:
Types of changes
What types of changes does your code introduce?
Testing
Requires testing
Documentation
Requires documentation update
Requires explanation in Release Notes