Move SAR, SHR and SHL to UInt256 #10216

Merged: lu-pinto merged 13 commits into besu-eth:main from lu-pinto:shift-opcodes-alt-design on Apr 16, 2026

Conversation

Contributor

@lu-pinto lu-pinto commented Apr 10, 2026

PR description

I was honestly not convinced by the current design of the opcodes in EVM V2 from #10154, so I did some experimentation and would like to challenge the existing design.
I managed to achieve the same performance level while separating the arithmetic/bitwise computations from the opcodes themselves. Opcodes should be the ones fetching from and updating the stack, not the code that does the computations; the two should be strictly decoupled.

IMO the code looks much cleaner and is easier to read. It also benefits from reusing the arithmetic that already exists in UInt256. In another PR I will also look at repurposing shl and shr for modular arithmetic, as I believe we might be able to reuse them.
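
To illustrate the separation described above, here is a minimal sketch in Java. It is not Besu's code: `Word256`, `ShlOp`, and the `Deque`-based stack are hypothetical names chosen for the example, and `BigInteger` stands in for the real limb arithmetic. The value type owns the shift; the opcode only checks the stack, pops the operands, and pushes the result.

```java
import java.math.BigInteger;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical opcode: only moves operands between the stack and the value type. */
final class ShlOp {
  /** Pops the shift (top) and the value, pushes value << shift. Returns false on underflow. */
  static boolean execute(final Deque<Word256> stack) {
    if (stack.size() < 2) {
      return false;
    }
    final Word256 shift = stack.pop();
    final Word256 value = stack.pop();
    stack.push(value.shl(shift));
    return true;
  }

  public static void main(final String[] args) {
    final Deque<Word256> stack = new ArrayDeque<>();
    stack.push(new Word256(BigInteger.ONE)); // value
    stack.push(new Word256(BigInteger.valueOf(4))); // shift
    execute(stack);
    System.out.println(stack.peek().value()); // result of 1 << 4
  }
}

/** Hypothetical value type: owns the arithmetic and knows nothing about the stack. */
final class Word256 {
  private static final BigInteger MASK = BigInteger.ONE.shiftLeft(256).subtract(BigInteger.ONE);
  private static final BigInteger BITS = BigInteger.valueOf(256);
  private final BigInteger value;

  Word256(final BigInteger value) {
    this.value = value.and(MASK); // truncate to 256 bits
  }

  BigInteger value() {
    return value;
  }

  /** Logical left shift; a shift of 256 or more clears every bit. */
  Word256 shl(final Word256 shift) {
    if (shift.value.compareTo(BITS) >= 0) {
      return new Word256(BigInteger.ZERO);
    }
    return new Word256(value.shiftLeft(shift.value.intValue()));
  }
}
```

Because `Word256.shl` never touches the stack, it can be unit-tested and reused without constructing a frame, which is the benefit the description argues for.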

Performance stats:

| Test Case | Latency (ns) main@2d4f077c27 | Latency (ns) @065670ffe3 |
| --- | --- | --- |
| SarV2_SHIFT_0 | 6.049 | 6.334 |
| SarV2_NEGATIVE_SHIFT_1 | 8.681 | 8.83 |
| SarV2_POSITIVE_SHIFT_1 | 7.969 | 8.312 |
| SarV2_ALL_BITS_SHIFT_1 | 8.518 | 8.804 |
| SarV2_NEGATIVE_SHIFT_128 | 6.796 | 7.223 |
| SarV2_NEGATIVE_SHIFT_255 | 6.996 | 7.546 |
| SarV2_POSITIVE_SHIFT_128 | 6.756 | 7.101 |
| SarV2_POSITIVE_SHIFT_255 | 6.765 | 6.959 |
| SarV2_OVERFLOW_SHIFT_256 | 6.848 | 7.222 |
| SarV2_OVERFLOW_LARGE_SHIFT | 6.954 | 7.356 |
| SarV2_FULL_RANDOM | 15.349 | 15.379 |
| ShlV2_SHIFT_0 | 5.785 | 6.362 |
| ShlV2_SHIFT_1 | 8.492 | 8.778 |
| ShlV2_SHIFT_128 | 7.149 | 7.105 |
| ShlV2_SHIFT_255 | 6.871 | 7.277 |
| ShlV2_OVERFLOW_SHIFT_256 | 6.647 | 7.698 |
| ShlV2_OVERFLOW_LARGE_SHIFT | 6.832 | 7.798 |
| ShlV2_FULL_RANDOM | 11.927 | 8.183 |
| ShrV2_SHIFT_0 | 5.817 | 6.357 |
| ShrV2_SHIFT_1 | 7.742 | 8.233 |
| ShrV2_SHIFT_128 | 6.833 | 6.975 |
| ShrV2_SHIFT_255 | 6.824 | 7.061 |
| ShrV2_OVERFLOW_SHIFT_256 | 6.642 | 7.69 |
| ShrV2_OVERFLOW_LARGE_SHIFT | 6.805 | 7.846 |
| ShrV2_FULL_RANDOM | 11.381 | 8.628 |

Issue(s)

#10131

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests
  • hive tests: Engine or other RPCs modified?

@lu-pinto lu-pinto requested review from ahamlat and siladu and removed request for siladu April 10, 2026 09:23
@lu-pinto lu-pinto force-pushed the shift-opcodes-alt-design branch from da22999 to 82337b0 Compare April 10, 2026 09:24
public static OperationResult staticOperation(final MessageFrame frame) {
if (!frame.stackHasItems(2)) return UNDERFLOW_RESPONSE;
frame.setTopV2(StackArithmetic.sar(stack, frame.stackTopV2()));
long[] _stack = frame.stackDataV2();
Contributor

I would suggest just using stack instead of _stack, to be in line with the naming used in the project.

Comment on lines +51 to +52
if (!frame.stackHasItems(2)) return UNDERFLOW_RESPONSE;
frame.setTopV2(StackArithmetic.shl(stack, frame.stackTopV2()));
long[] _stack = frame.stackDataV2();
Contributor

I would suggest just using stack instead of _stack, to be in line with the naming used in the project.

public static OperationResult staticOperation(final MessageFrame frame) {
if (!frame.stackHasItems(2)) return UNDERFLOW_RESPONSE;
frame.setTopV2(StackArithmetic.shr(stack, frame.stackTopV2()));
long[] _stack = frame.stackDataV2();
Contributor

I would suggest just using stack instead of _stack, to be in line with the naming used in the project.

Contributor Author

Sure, it is a leftover from the previous method argument before I removed it

|| shift.u2() != 0
|| shift.u1() != 0
|| Long.compareUnsigned(shift.u0(), 256) >= 0) {
bytesToShift = 256;
Contributor

I guess this should be bitsToShift?

Contributor

We use bitShift in private methods below

Contributor Author

@lu-pinto lu-pinto Apr 10, 2026

yes it is bits, well spotted
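
For readers following the thread, the clamping condition in the snippet can be sketched in isolation. This is an illustration under assumptions, not the PR's code: `clampShift` is a hypothetical name, the limbs are passed as plain `long`s with `u3` most significant, and the check on `u3` is assumed because the quoted diff starts mid-expression. Any shift that does not fit below 256 is collapsed to 256, a count in bits.

```java
final class ShiftClamp {
  /**
   * Reduces a 256-bit shift amount (four 64-bit limbs, u3 most significant) to an int in
   * [0, 256]. Every shift of 256 or more behaves the same, so it is clamped to 256.
   */
  static int clampShift(final long u3, final long u2, final long u1, final long u0) {
    // The unsigned comparison matters: u0 = -1L is 2^64 - 1, not a negative shift.
    if (u3 != 0 || u2 != 0 || u1 != 0 || Long.compareUnsigned(u0, 256) >= 0) {
      return 256;
    }
    return (int) u0;
  }

  public static void main(final String[] args) {
    System.out.println(clampShift(0, 0, 0, 255));
    System.out.println(clampShift(0, 0, 0, 256));
    System.out.println(clampShift(0, 0, 1, 0));
    System.out.println(clampShift(0, 0, 0, -1L));
  }
}
```

A signed comparison on `u0` would let values with the top bit set slip through and be cast to a negative `int`, which is why `Long.compareUnsigned` appears in the diff.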

|| shift.u2() != 0
|| shift.u1() != 0
|| Long.compareUnsigned(shift.u0(), 256) >= 0) {
bytesToShift = 256;
Contributor

The same as above.

return new UInt256(w3, w2, w1, w0);
}

private static long shiftLeftWord(final long value, final long nextValue, final int bitShift) {
Contributor

Add javadoc.

return (value << bitShift) | (nextValue >>> (64 - bitShift));
}

private static long shiftRightWord(final long value, final long prevValue, final int bitShift) {
Contributor

Add javadoc.
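
One thing the requested javadoc could record is the valid range of `bitShift`. The sketch below copies the expression from the diff into a standalone class and adds a hypothetical guarded variant (`shiftLeftWordSafe` is not part of the PR). Java masks the shift distance of a `long` to its low six bits, so `nextValue >>> 64` equals `nextValue >>> 0`; whether the real callers can ever pass 0 is not visible in the quoted snippet.

```java
final class WordShift {
  /** Same expression as in the diff; only meaningful for bitShift in [1, 63]. */
  static long shiftLeftWord(final long value, final long nextValue, final int bitShift) {
    return (value << bitShift) | (nextValue >>> (64 - bitShift));
  }

  /** Hypothetical variant that also accepts bitShift == 0. */
  static long shiftLeftWordSafe(final long value, final long nextValue, final int bitShift) {
    return bitShift == 0 ? value : shiftLeftWord(value, nextValue, bitShift);
  }

  public static void main(final String[] args) {
    // In range: the top bit of nextValue is carried into bit 0 of the result.
    System.out.println(Long.toHexString(shiftLeftWord(1L, 0x8000000000000000L, 1)));
    // bitShift == 0: the unguarded form ORs in all of nextValue.
    System.out.println(Long.toHexString(shiftLeftWord(1L, 0xFFL, 0)));
    System.out.println(Long.toHexString(shiftLeftWordSafe(1L, 0xFFL, 0)));
  }
}
```

The same consideration applies to `shiftRightWord`, which presumably mirrors this expression with the shift directions swapped.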

final long[] s = new long[8];
writeLimbs(s, 0, valueVal);
writeLimbs(s, 4, shiftVal);
final UInt256 result = executor.execute(valueVal, shiftVal);
Contributor

👍 (this is a good argument that this design is better)

Contributor

@ahamlat ahamlat left a comment

I like the proposed design; I find it better and the code much cleaner. There is a small performance regression: could you double-check with multiple runs whether it is real, and investigate its origin?

Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
@lu-pinto lu-pinto force-pushed the shift-opcodes-alt-design branch from 82337b0 to f4ed77d Compare April 10, 2026 13:21
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
Contributor Author

> I like the proposed design; I find it better and the code much cleaner. There is a small performance regression: could you double-check with multiple runs whether it is real, and investigate its origin?

Looked into it and optimised a little more, but I'm going to park it here. The worst cases (FULL_RANDOM) are now much closer or have improved significantly; IMO these are probably the most realistic ones.
For the other cases it is very hard to get better numbers without hurting the worst case, because that is what I primarily optimized for.

Arguments.of(
"0x8000000000000000000000000000000000000000000000000000000000000000",
"0x100",
"0x0100",
Contributor

@ahamlat ahamlat Apr 10, 2026

Why did you make this change, and all the changes below, in the unit tests?

Contributor

Is it related to no longer using fromHexStringLenient?

Contributor Author

I changed fromHexStringLenient to fromHexString in the tests to make the hexadecimal values exact, without having to guess whether a zero will be prepended. That is hard to tell unless you know what lenient does. Since we are providing hardcoded values, does it make sense to "disguise" them? For instance, 0x0 is half a byte, so it seems lenient would prepend a zero to complete the byte.
I can revert it if you feel strongly about it.
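
The difference being discussed can be sketched with a standard-library-only stand-in. `parseStrict` and `parseLenient` below are hypothetical helpers that mimic the behaviour described in this thread (strict rejects an odd number of hex digits, lenient prepends a zero digit); they are not the library methods themselves.

```java
import java.util.Arrays;

final class HexParsing {
  /** Strict: the digit count must be even, so "0x100" is rejected. */
  static byte[] parseStrict(final String hex) {
    final String digits = stripPrefix(hex);
    if ((digits.length() & 1) != 0) {
      throw new IllegalArgumentException("odd number of hex digits: " + hex);
    }
    return decode(digits);
  }

  /** Lenient: an odd digit count is padded with one leading zero digit. */
  static byte[] parseLenient(final String hex) {
    final String digits = stripPrefix(hex);
    return decode((digits.length() & 1) != 0 ? "0" + digits : digits);
  }

  private static String stripPrefix(final String hex) {
    return hex.startsWith("0x") ? hex.substring(2) : hex;
  }

  private static byte[] decode(final String digits) {
    final byte[] out = new byte[digits.length() / 2];
    for (int i = 0; i < out.length; i++) {
      final int hi = Character.digit(digits.charAt(2 * i), 16);
      final int lo = Character.digit(digits.charAt(2 * i + 1), 16);
      if (hi < 0 || lo < 0) {
        throw new IllegalArgumentException("not a hex string: " + digits);
      }
      out[i] = (byte) ((hi << 4) | lo);
    }
    return out;
  }

  public static void main(final String[] args) {
    // Both spellings decode to the same two bytes.
    System.out.println(Arrays.toString(parseLenient("0x100")));
    System.out.println(Arrays.toString(parseStrict("0x0100")));
  }
}
```

Writing `"0x0100"` in the test data makes the byte layout visible in the literal itself, which is the motivation given for the change.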

@lu-pinto lu-pinto mentioned this pull request Apr 10, 2026
Contributor

@siladu siladu left a comment

If we can maintain the same performance despite the UInt256 allocations, then that's fine by me.

Like the reuse between sar and shl, just need to update name and javadoc.

Various minor comments.

public Operation.OperationResult executeFixedCostOperation(
final MessageFrame frame, final EVM evm) {
- return staticOperation(frame, frame.stackDataV2());
+ return staticOperation(frame);
Contributor

👍

stack[shiftOffset + 1],
stack[shiftOffset + 2],
stack[shiftOffset + 3]);
final int valueOffset = (--top) << 2;
Contributor

I initially took issue with the mutability of this versus (top - 2) << 2, but warmed to it once I realised
--top === pop
++top === push
which is pretty neat.
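
The idiom can be shown on a toy flat stack. This is a sketch under assumptions rather than Besu's `MessageFrame`: `FlatStack` is a hypothetical class, `top` is assumed to count the items on the stack, and each 256-bit item is assumed to occupy four consecutive `long` limbs with the most significant limb first. The `<< 2` converts an item index into a limb offset.

```java
final class FlatStack {
  private final long[] data;
  private int top; // number of items currently on the stack (assumption)

  FlatStack(final int capacity) {
    data = new long[capacity << 2]; // four limbs per item
  }

  /** Push: write four limbs at the slot for item index top, then ++top. */
  void push(final long u3, final long u2, final long u1, final long u0) {
    final int offset = top << 2;
    data[offset] = u3;
    data[offset + 1] = u2;
    data[offset + 2] = u1;
    data[offset + 3] = u0;
    ++top;
  }

  /** Pop: --top, then return the limb offset of the item that was on top. */
  int pop() {
    return (--top) << 2;
  }

  long limb(final int offset, final int index) {
    return data[offset + index];
  }

  int size() {
    return top;
  }

  public static void main(final String[] args) {
    final FlatStack stack = new FlatStack(16);
    stack.push(0, 0, 0, 1); // value
    stack.push(0, 0, 0, 2); // shift
    final int shiftOffset = stack.pop();
    final int valueOffset = stack.pop();
    System.out.println(shiftOffset + " " + valueOffset + " " + stack.size());
  }
}
```

Popped limbs stay readable in the backing array until the next push overwrites them, which is what lets an opcode read both operands by offset and write the result back into the lower slot.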

* @param fill value to prepend while shifting
* @return the result
*/
// TODO: check perf - wiring shiftRight callers with this one
Contributor

Since PR is approved, just pointing out there's still a TODO

Contributor Author

I don't want to clutter this PR with that, but I also want to keep track of it so it gets changed in a follow-up.

* @param shift number of bits to shift (must be in [1, 255])
* @return the result
*/
// TODO: check perf - wiring shiftLeft callers with this one
Contributor

ditto

} else {
bitShift = (int) shift.u0();
}
return sar0(bitShift, 0);
Contributor

Seems odd for shr to call a sar method

Contributor Author

why? I think this is elegant. SAR is pretty much a SHR with a custom fill.

Contributor

So couldn't you equally call it shr0 ?

sar and shr imply specific things, their shared code is not sar.

Contributor

to be clear, I like the structure just not the name

Contributor Author

I think I disagree there, because a shift with a custom fill is an arithmetic shift so it makes sense for SHR to call an internal SAR with a static fill of 0.

Contributor

@siladu siladu Apr 14, 2026

By your description, I think you are saying shr === sar0, but sar0 only makes sense to me when the fill parameter is fixed at 0, e.g.

UInt256 sar0(int bitShift) {
  return sar(bitShift, 0);
}

The method declaration
private UInt256 sar0(final int shift, final long fill) {
allows a custom fill param, so if anything it should be sar, not sar0.
However, I still find this confusing since it isn't the same as SAR opcode. Both SHR and SAR are using it, so better to call it something different IMO.
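
The structure under discussion, one right-shift helper parameterised by a fill value and shared by both opcodes, can be sketched as follows. The names (`shiftRightFill`, `RightShift`) and the `long[4]` representation with the most significant limb at index 0 are assumptions for illustration; the PR's actual method operates on `UInt256` and its name is the point being debated.

```java
import java.util.Arrays;

final class RightShift {
  /**
   * Shifts a 256-bit value right by shift bits (0..256), filling vacated high bits from
   * fill, which must be 0L or -1L. v[0] is the most significant limb (assumption).
   */
  static long[] shiftRightFill(final long[] v, final int shift, final long fill) {
    final long[] out = new long[4];
    if (shift >= 256) {
      Arrays.fill(out, fill);
      return out;
    }
    final int wordShift = shift >>> 6; // whole limbs to move
    final int bitShift = shift & 63; // remaining bits within a limb
    for (int i = 0; i < 4; i++) {
      final int src = i - wordShift; // limb whose bits land in position i
      final long lo = src >= 0 ? v[src] : fill;
      final long hi = src - 1 >= 0 ? v[src - 1] : fill;
      // Guard bitShift == 0: Java would turn hi << 64 into hi << 0.
      out[i] = bitShift == 0 ? lo : (lo >>> bitShift) | (hi << (64 - bitShift));
    }
    return out;
  }

  /** Logical shift: always fills with zeros. */
  static long[] shr(final long[] v, final int shift) {
    return shiftRightFill(v, shift, 0L);
  }

  /** Arithmetic shift: fills with copies of the sign bit. */
  static long[] sar(final long[] v, final int shift) {
    return shiftRightFill(v, shift, v[0] < 0 ? -1L : 0L);
  }

  public static void main(final String[] args) {
    final long[] minValue = {Long.MIN_VALUE, 0, 0, 0}; // only the top bit set
    System.out.println(Arrays.toString(shr(minValue, 255)));
    System.out.println(Arrays.toString(sar(minValue, 255)));
  }
}
```

In this sketch neither opcode name is attached to the shared helper, which is one way to resolve the naming disagreement; the alternative defended in the thread is to treat the helper as an arithmetic shift whose fill happens to be 0 for SHR.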

* @return the result
*/
// TODO: check perf - wiring shiftLeft callers with this one
private UInt256 shl0(final int shift) {
Contributor

Suggested change
- private UInt256 shl0(final int shift) {
+ private UInt256 shiftLeft(final int shift) {

Contributor Author

I believe this clashes with the shiftLeft method already in place, hence the TODO to address this in a follow-up. I do want to move to those names.

Contributor

@siladu siladu Apr 14, 2026

Why do we have two versions of public shiftLeft after this PR?
Is UInt256 now a mix of V1 and V2 variants?
shiftLeftV2 would make more sense if that's the case IMO, or maybe we need UInt256V2 class though would rather not.

I think handling the TODOs in this PR would make things clearer tbh, it's not a large PR.

Comment thread evm/src/test/java/org/hyperledger/besu/evm/v2/operation/SarOperationV2Test.java Outdated
* Arithmetic right-shifts a 256-bit value in place by 0..255 bits, sign-extending with {@code
* fill}.
*
* @param shift number of bits to shift (must be in [1, 255])
Contributor

Think this comment is wrong now - it's handling 0 and 256 as well?

Contributor Author

Yeah, you're right. That's why I don't like comments in private methods: the code changes frequently, the comments easily get outdated, and the compiler does not catch it.

Contributor

@siladu siladu Apr 14, 2026

claude /loop 1 week "check comments still match code" 😆

bitShift = 256;
} else {
bitShift = (int) shift.u0();
}
Contributor

Did you consider

Suggested change
- }
+ }
+ if (bitShift == 0) return this;

?

Contributor Author

@lu-pinto lu-pinto Apr 13, 2026

I believe so - I tried multiple things. These methods are heavily optimized, I would recommend you try and see the results - sometimes it may seem like a fast path but it changes how JIT sees the method.

Comment thread evm/src/test/java/org/hyperledger/besu/evm/v2/operation/ShlOperationV2Test.java Outdated
Signed-off-by: Luis Pinto <luis.pinto@consensys.net>
@lu-pinto lu-pinto force-pushed the shift-opcodes-alt-design branch from c2b7347 to 23bfc88 Compare April 13, 2026 11:16
@lu-pinto lu-pinto changed the title Alternative design for shift opcodes Move SAR, SHR and SHL to UInt256 Apr 13, 2026
@lu-pinto lu-pinto enabled auto-merge (squash) April 13, 2026 15:35
@lu-pinto lu-pinto merged commit 8e4ef80 into besu-eth:main Apr 16, 2026
34 checks passed

3 participants