Skip to content

Optimize BN254Pairing call#9621

Merged
benaadams merged 25 commits intomasterfrom
bn254pairing
Nov 3, 2025
Merged

Optimize BN254Pairing call#9621
benaadams merged 25 commits intomasterfrom
bn254pairing

Conversation

@benaadams
Copy link
Copy Markdown
Member

@benaadams benaadams commented Nov 1, 2025

Changes

  • Remove double copy
  • Use intrinsic endian swap
  • Using mclBn_millerLoop inside the loop and one mclBn_finalExp at the end removes N‑1 final exponentiations, which are the expensive part. On BN254 that is a meaningful win at even modest N.

Types of changes

What types of changes does your code introduce?

  • Optimization

Testing

Requires testing

  • No

Copy link
Copy Markdown
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't mutate input

@benaadams
Copy link
Copy Markdown
Member Author

can't mutate input

Fixed

Copy link
Copy Markdown
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea if it works :)

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Copy link
Copy Markdown
Contributor

@rubo rubo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from using Miller loop/final exp combo, other changes don't bring any performance benefits according to my benchmarks, just made the code less readable:

add

| Method | Mean     | Error    | StdDev   |
|------- |---------:|---------:|---------:|
| RunOld | 848.0 ns |  3.18 ns |  2.82 ns |
| RunNew | 855.4 ns | 17.02 ns | 18.92 ns |

mul

| Method | Mean     | Error    | StdDev   |
|------- |---------:|---------:|---------:|
| RunOld | 21.24 us | 0.238 us | 0.211 us |
| RunNew | 21.93 us | 0.437 us | 0.448 us |

@benaadams
Copy link
Copy Markdown
Member Author

Apart from using Miller loop/final exp combo, other changes don't bring any performance benefits according to my benchmarks, just made the code less readable:

Better?

Though it was more about BN254Pairing than Add or Mul

@benaadams benaadams requested a review from Copilot November 1, 2025 15:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the BN254 elliptic curve precompiles by improving memory management, implementing SIMD-accelerated byte reversal operations, and enhancing the pairing check algorithm. The changes focus on performance improvements while maintaining correctness.

Key changes:

  • Replaced ArrayPool and stackalloc-based memory management with direct byte arrays to reduce allocations and copies
  • Implemented hardware-accelerated byte reversal using AVX2, SSE, and Vector128 intrinsics with scalar fallback
  • Optimized pairing check to use Miller loop followed by single final exponentiation instead of full pairing per pair

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
BN254PairingPrecompile.cs Simplified memory management by removing ArrayPool usage and passing inputData.Span directly to BN254.CheckPairing
BN254MulPrecompile.cs Added named constants for input/output lengths and simplified output allocation
BN254AddPrecompile.cs Added named constants, implemented input padding logic to avoid unnecessary copies when input matches expected length
BN254.cs Major refactoring: changed signatures to use ReadOnlySpan, added SIMD-optimized byte reversal methods, optimized pairing check algorithm, and improved deserialization methods to work with pointers

@rubo
Copy link
Copy Markdown
Contributor

rubo commented Nov 1, 2025

Better?

Doesn't look like they made any difference

Though it was more about BN254Pairing than Add or Mul

Well, there are lots of changes bringing no performance gain but a negligible loss (at least on my machine)?

@benaadams
Copy link
Copy Markdown
Member Author

Added change to annotate some lightweight methods with SuppressGCTransition NethermindEth/mcl-bindings#8

@benaadams
Copy link
Copy Markdown
Member Author

benaadams commented Nov 2, 2025

Well, there are lots of changes bringing no performance gain but a negligible loss (at least on my machine)?

Better now? Removes one stackalloc entirely as well as .ToArray and moves other stackalloc out of common path

image

@benaadams
Copy link
Copy Markdown
Member Author

Though is more about BN254Pairing than Mul and Add

@rubo
Copy link
Copy Markdown
Contributor

rubo commented Nov 2, 2025

Overall, the situation is the following:

Pairing check:

| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| 'Before PR' | 3.272 ms | 0.0349 ms | 0.0326 ms |
| 'After PR'  | 2.210 ms | 0.0297 ms | 0.0278 ms |

where almost all the improvement comes from the Miller loop/final exp combo:

| Method      | Mean     | Error     | StdDev    |
|------------ |---------:|----------:|----------:|
| 'Before PR' | 2.356 ms | 0.0362 ms | 0.0339 ms |
| 'After PR'  | 2.232 ms | 0.0400 ms | 0.0521 ms |

Thus, all other bells and whistles add just ~5% improvement.

For add:

| Method      | Mean     | Error   | StdDev  |
|------------ |---------:|--------:|--------:|
| 'Before PR' | 857.3 ns | 4.05 ns | 3.59 ns |
| 'After PR'  | 861.6 ns | 6.65 ns | 5.89 ns |

On each run, the changes are slower, although slightly.

The same for the mul:

| Method      | Mean     | Error    | StdDev   |
|------------ |---------:|---------:|---------:|
| 'Before PR' | 21.43 us | 0.235 us | 0.220 us |
| 'After PR'  | 22.06 us | 0.346 us | 0.324 us |

@rubo
Copy link
Copy Markdown
Contributor

rubo commented Nov 2, 2025

Though is more about BN254Pairing than Mul and Add

Then maybe better to leave them intact? They're clearly worse now.

Added change to annotate some lightweight methods with SuppressGCTransition

That's interesting, but that doesn't seem to make any difference in this particular case.

@benaadams
Copy link
Copy Markdown
Member Author

Though is more about BN254Pairing than Mul and Add

Then maybe better to leave them intact? They're clearly worse now.

Are you using 96 byte inputs for Mul and 128 byte inputs for Add in the benchmarks?

@benaadams
Copy link
Copy Markdown
Member Author

benaadams commented Nov 3, 2025

They're clearly worse now.

The difference for both is less than the error; the difference on Add is ~16 cpu cycles? (4.3 ns)

Copy link
Copy Markdown
Contributor

@Marchhill Marchhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please verify no hive test regressions

return true;
}

private static unsafe bool IsZero64(byte* ptr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these methods be moved to somewhere they can be reused, like byte array extensions or something

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure want to encourage pointers 😅

Will move if crops up again

benaadams and others added 2 commits November 3, 2025 11:26
Co-authored-by: Marc <Marchhill@users.noreply.github.com>
@benaadams
Copy link
Copy Markdown
Member Author

please verify no hive test regressions

Looks ok

@benaadams benaadams merged commit 0b510cd into master Nov 3, 2025
84 of 342 checks passed
@benaadams benaadams deleted the bn254pairing branch November 3, 2025 13:40
kamilchodola added a commit that referenced this pull request Nov 9, 2025
* Add more logging in MultiSyncModeSelector (#9616)

* Add more logging

* fix for seq

* feat: Add configurable EIP-2935 ring buffer size (#9611)

* Blockchain Engine Tests support (#9394)

* initial commit

* fix normal blockchain tests

* tidy

* restore disposes

* comment out BALs

* fix var declaration

* don't set basefeepergas if null

* use network from genesis in blockchain test

* update blockchain test base

* add tracer to blockchain tests runner

* tidy

* tidy

* add genesis processing timeout

* check for null head block

* try undo some changes

* detect failure to process genesis

* check removal is error

* add back checks for genesis spec

* only add noenginerequeststracker in tests

* comment sealed block check

* try remove timeout

* only configure merge for engine tests

* fix merge module init

* add back timeout and remove sealer

* await new payloads

* use reflection for engine rpc method calling

---------

Co-authored-by: Marc Harvey-Hill <10379486+Marchhill@users.noreply.github.com>

* use zero address when from address not specified in rpc calls (#9578)

* use zero address for null values

* small test

* fix proof rpc

* fix test and add more changes

* Allow serving snap requests for more than 128 blocks (#9602)

* Initial plan

* Add SnapServingMaxDepth configuration and update LastNStateRootTracker

Co-authored-by: tanishqjasoria <11698398+tanishqjasoria@users.noreply.github.com>

* Add clarifying comments for configuration changes

Co-authored-by: tanishqjasoria <11698398+tanishqjasoria@users.noreply.github.com>

* Get reorgDepth from config instead of hardcoding in test

Co-authored-by: LukaszRozmej <12445221+LukaszRozmej@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tanishqjasoria <11698398+tanishqjasoria@users.noreply.github.com>
Co-authored-by: Tanishq Jasoria <jasoriatanishq@gmail.com>
Co-authored-by: LukaszRozmej <12445221+LukaszRozmej@users.noreply.github.com>

* Remove console log from FileTestsSource (#9622)

Removed console log for loading test file.

* Correct docs value for Blocks.BlockProductionMaxTxKilobytes (#9620)

* Update OP Superchain chains (#9629)

Co-authored-by: emlautarom1 <emlautarom1@users.noreply.github.com>

* Auto-update fast sync settings (#9628)

Co-authored-by: rubo <rubo@users.noreply.github.com>

* feat: write AckMessage directly to IByteBuffer without temp array (#9623)

* Optimize Ripemd (#9627)

* Allow precompile cache to be switched off by config (#9633)

* Mainnet Osaka, BPO1, BPO2 forks (#9615)

* Change rlp limits and add logs (#9631)

* log rlp guard messages

* Adding stack trace

* Increase receipts limit to 1024

* fix for stack trace

* try fix multiline

* fix

* whitespace

* try fix

* More logs and potential fixes

* log fix

* Revert "More logs and potential fixes"This reverts commit ec71c87.# Conflicts:#	src/Nethermind/Nethermind.Network/MessageSerializationService.csRevert "log fix"This reverts commit b3b1f5f.

Revert "More logs and potential fixes"

This reverts commit ec71c87.

* try weird fix

* Revert "try weird fix"

This reverts commit 0bdfb3d.

* revert packages.json

* simplify log

* Don't abbreviate ForkchoiceStateV1 hashes

* revert spammy ForkchoiceStateV1

* Fix missed dispose on StorageRange in ProgressTracker

* fix test

* Optimise CALL by throwing stack underflow earlier (#9581)

* fail fast

* add unit tests

---------

Co-authored-by: Marc Harvey-Hill <10379486+Marchhill@users.noreply.github.com>

* Optimize BN254Pairing call (#9621)

* Optimize BN254Pairing call

* Don't modify input

* Skip locals

* Tweak inlining

* Update src/Nethermind/Nethermind.Evm.Precompiles/BN254.cs

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>

* Less fixed

* Optimize

* Optimize

* formatting

* formatting

* Tidy

* Use constants

* Feedback

* Skip init

* Faster

* More skip init

* Make mul ReadOnlySpan, add comments

* Simplify

* Skip locals

* Ruben making me work for it

* More working for it

* Still working for it

* Update benchmarks

* Apply suggestions from code review

Co-authored-by: Marc <Marchhill@users.noreply.github.com>

* Feedback

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Marc <Marchhill@users.noreply.github.com>

* Update OP Superchain chains (#9643)

Co-authored-by: LukaszRozmej <LukaszRozmej@users.noreply.github.com>

* Log/decrease noise (#9642)

* decrease Ethash cache miss log

* Decrease block downloader invalid bloc log

* Update X handle (#9634)

* Better logs on invalid orphan (#9641)

* Try fixing world wrong block

* not this, so keep validating Withdrawals

* Update all op chain configs (#9645)

* update configs

* update all configs

* Add `NetworkId` flag to `InitConfig` (#9476)

* add flag with networkId

* cosmetic

* Update src/Nethermind/Nethermind.Api/IInitConfig.cs

Co-authored-by: Ruben Buniatyan <rubo@users.noreply.github.com>

* fix

* # Conflicts:
#	src/Nethermind/Nethermind.Api/IInitConfig.cs
#	src/Nethermind/Nethermind.Api/InitConfig.cs

* Revert "# Conflicts:"

This reverts commit 7f25638.

* postmerge fix

* Apply suggestion from @benaadams

---------

Co-authored-by: Ruben Buniatyan <rubo@users.noreply.github.com>
Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>

* Revert "HACK: enable in fusaka"

This reverts commit 7cfad14.

* HACK: enable in pectra

---------

Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
Co-authored-by: Daniil Ankushin <ankushin.daniil42@gmail.com>
Co-authored-by: Marc <Marchhill@users.noreply.github.com>
Co-authored-by: Marc Harvey-Hill <10379486+Marchhill@users.noreply.github.com>
Co-authored-by: Tanishq Jasoria <jasoriatanishq@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tanishqjasoria <11698398+tanishqjasoria@users.noreply.github.com>
Co-authored-by: LukaszRozmej <12445221+LukaszRozmej@users.noreply.github.com>
Co-authored-by: Ben {chmark} Adams <thundercat@illyriad.co.uk>
Co-authored-by: core-repository-dispatch-app[bot] <173070810+core-repository-dispatch-app[bot]@users.noreply.github.com>
Co-authored-by: emlautarom1 <emlautarom1@users.noreply.github.com>
Co-authored-by: rubo <rubo@users.noreply.github.com>
Co-authored-by: VolodymyrBg <aqdrgg19@gmail.com>
Co-authored-by: LukaszRozmej <LukaszRozmej@users.noreply.github.com>
Co-authored-by: Marcin Sobczak <77129288+marcindsobczak@users.noreply.github.com>
Co-authored-by: Marcin Sobczak <marcindsobczak@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants