Optimize EIP-196 AltBn128 EcAdd #301
Conversation
1. Memory Pooling for Performance
Introduces sync.Pool objects to reuse allocations and reduce garbage collection pressure:
- bigIntPool - reuses big.Int allocations for scalar operations
- g1Pool / g2Pool - reuses elliptic curve point allocations
- bytes64Pool - reuses 64-byte buffer allocations
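The pooling pattern can be sketched as follows; the pool matches the description above, but the `scalarFromBytes` helper is hypothetical and not one of the PR's actual functions:

```go
package main

import (
	"fmt"
	"math/big"
	"sync"
)

// bigIntPool reuses big.Int values across calls instead of allocating
// a fresh one per invocation, reducing garbage collection pressure.
var bigIntPool = sync.Pool{
	New: func() any { return new(big.Int) },
}

// scalarFromBytes is a hypothetical helper showing the Get/Put discipline:
// take a value from the pool, use it, and return it when done.
func scalarFromBytes(b []byte) string {
	s := bigIntPool.Get().(*big.Int)
	defer bigIntPool.Put(s)
	s.SetBytes(b) // SetBytes fully overwrites any stale pooled value
	return s.String()
}

func main() {
	fmt.Println(scalarFromBytes([]byte{0x01, 0x00})) // prints 256
}
```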
2. Simplified Error Handling
- Before: Functions returned error strings passed through buffers between Go and Java
- After: Functions return integer error codes (0 for success, 1-8 for various errors)
- Removes overhead of string allocation and copying across JNI boundary
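The integer return convention can be sketched like this; the type name follows the PR's diff (`errorCode`), but the specific constant names and numbering here are illustrative rather than the PR's exact set:

```go
package main

import "fmt"

// errorCode is returned across the JNI boundary instead of an error string.
type errorCode int

const (
	errCodeSuccess         errorCode = iota // 0: operation succeeded
	errCodeMalformedInput                   // 1: illustrative failure code
	errCodePointNotOnCurve                  // 2: illustrative failure code
)

// String gives a human-readable form, e.g. for logging on the Java side.
func (e errorCode) String() string {
	switch e {
	case errCodeSuccess:
		return "success"
	case errCodeMalformedInput:
		return "malformed input"
	case errCodePointNotOnCurve:
		return "point not on curve"
	default:
		return "unknown error"
	}
}

func main() {
	fmt.Println(errCodeSuccess, errCodePointNotOnCurve)
}
```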
3. Streamlined JNI Interface
Changes function signatures from:
func eip196altbn128G1Add(input, output, errorBuf *C.char, inputLen C.int, outputLen, errorLen *C.int) C.int
To:
func eip196altbn128G1Add(input, output *C.char, inputLen C.int) errorCode
4. Optimized Field Validation
- Removes manual field checking (checkInFieldEIP196 function)
- Uses SetBytesCanonical() which performs validation internally
- Eliminates redundant modulus comparisons
5. Direct Encoding
- g1AffineEncode now works directly with point objects using RawBytes()
- Eliminates intermediate Marshal() allocations
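The direct-encoding idea is to copy the coordinate bytes straight into the caller's output buffer rather than through an intermediate `Marshal()` slice. A simplified sketch (the real code writes gnark-crypto's `RawBytes()` result into the JNI buffer; `encodeDirect` here is hypothetical):

```go
package main

import "fmt"

// encodeDirect writes a 64-byte uncompressed G1 encoding (x || y)
// directly into out, avoiding an intermediate allocation.
func encodeDirect(x, y [32]byte, out []byte) {
	_ = out[63] // bounds hint: out must hold one G1 point (64 bytes)
	copy(out[0:32], x[:])
	copy(out[32:64], y[:])
}

func main() {
	var x, y [32]byte
	x[31], y[31] = 1, 2
	out := make([]byte, 64)
	encodeDirect(x, y, out)
	fmt.Println(out[31], out[63]) // prints 1 2
}
```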
6. Reduced Buffer Sizes
- Result buffer size reduced from 128 to 64 bytes (only needs to hold one G1 point)
- Removes 256-byte error buffer entirely
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Co-authored-by: Ivo Kubjas <ivo.kubjas@consensys.net>
The pairing function now writes results (0x01 or 0x00) directly to the output buffer and only returns error codes for actual errors, eliminating the previous hack of using an error code to represent a valid pairing result of zero.

Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
    inputBytes.length,
    output);

    if (errorCode != LibGnarkEIP196.EIP196_ERR_CODE_SUCCESS) {
can we call err_code_success return_code_success?
we could, but it stems from this set of related Go consts, and it's idiomatic for them to share the same prefix, so we'd need to change them all to returnCode, and most of them are indeed error codes: https://github.com/hyperledger/besu-native/pull/301/files#diff-9622b17a1165cbfa1780cbc92d116bcbbcb4136daf03dd3d0aa4f9d77373a2ddR35-R41
I'm leaning towards keeping them, unless you feel strongly that we should change them all to returnCode...?
ivokub left a comment:
I also recommend updating the gnark-crypto dependency to v0.19.2 (most recent). Concretely, it contains optimizations for scalar multiplication when scalars are small.
For a lot of use cases it can provide a significant speedup (Consensys/gnark-crypto#703). It will be less evident here due to JNI.
To update:

cd gnark/gnark-jni
go get github.com/consensys/gnark-crypto@v0.19.2
go mod tidy

I built and ran unit tests locally and tests pass. I didn't run evmtool.
Otherwise, the changes look good. I think passing the pairing return value directly is better than my previous approach (passing it through an error code).
    assertThat(errorCode).isEqualTo(LibGnarkEIP196.EIP196_ERR_CODE_SUCCESS);
    // The key test: byte 31 should have been written by Go code (either 0x00 or 0x01, not 0xFF)
    assertThat(output[31]).isNotEqualTo((byte) 0xFF);
    assertThat(output[31]).isIn((byte) 0x00, (byte) 0x01);
Should we also check that rest is 0x00?
garyschulte left a comment:
LGTM, one safety concern highlighted
gnark/src/main/java/org/hyperledger/besu/nativelib/gnark/LibGnarkEIP196.java (outdated; comment resolved)
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Force-pushed 14b9a14 to d43a197
Rerun of the benchmark with the gnark-crypto v0.19.2 bump. Seems to give a small improvement to EcAdd (~1 MGas/s) and mul, but not pairings. I haven't checked whether the benchmarks include the small scalars that might benefit the most.
    if !g1.IsOnCurve() {
        return errCodePointOnCurveCheckFailedEIP196
fwiw, if we are trying to eke out additional performance, the IsOnCurve() checks have been duplicated by SetBytesCanonical since gnark-crypto 0.17.0. The duplicate checks were kept out of an abundance of caution; see #262 (comment) for context.
It is worth at least testing without the duplicate IsOnCurve check to determine whether its impact is negligible enough to keep it for "visibility" reasons within the code.
Yep, that's on my list but would rather keep this as incremental as possible - would save that for another PR.
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Force-pushed 98f5b5a to b807d30
New benchmark with the latest code, i.e. with the added length check. Maybe a very slight reduction in throughput; it probably cancelled out the gnark-crypto upgrade 😁
garyschulte left a comment:
see safety comment, otherwise LGTM
gnark/src/main/java/org/hyperledger/besu/nativelib/gnark/LibGnarkEIP196.java (outdated; comment resolved)
Clarifying comments
Changelog
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
gnark/gnark-jni/gnark-eip-196.go (outdated):

    inputLen := int(cInputLen)
    errorLen := (*int)(unsafe.Pointer(cErrorLen))
    outputLen := (*int)(unsafe.Pointer(cOutputLen))
    var bytes64Pool = sync.Pool{
Not sure using a sync.Pool to allocate a fixed-size array makes sense. Just declaring it in the function where it is used (as an array, not a slice):

    var a [64]byte

will allocate it on the stack, which will be more performant than using sync.Pool (synchronization overhead plus heap allocation).
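This stack-allocation point can be demonstrated with a self-contained sketch (`padTo64` is a hypothetical helper, not code from the PR):

```go
package main

import "fmt"

// padTo64 zero-pads input into a fixed-size array. Because the array is a
// local value, the compiler can keep it on the stack: no heap allocation
// and no sync.Pool coordination.
func padTo64(input []byte) [64]byte {
	var buf [64]byte // zeroed by default
	copy(buf[:], input)
	return buf
}

func main() {
	b := padTo64([]byte{0xAA, 0xBB})
	fmt.Printf("%x %x %x\n", b[0], b[1], b[63]) // prints aa bb 0
}
```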
gnark/gnark-jni/gnark-eip-196.go (outdated):

    return new(big.Int)
    },
    }
    var g1Pool = sync.Pool{
Similar comment to the one for the [64]byte array pool: G1 and G2 have known fixed sizes, so allocating them directly in the function will likely be more performant.
A pool would be useful for the big.Int, which would end up on the heap anyway.
I removed the pools for G1/G2/byte array. For mul and pairing there is no difference, but EcAdd is somewhat faster (5%-ish):

diff --git a/gnark/gnark-jni/gnark-eip-196.go b/gnark/gnark-jni/gnark-eip-196.go
index fcf5711..b280f19 100644
--- a/gnark/gnark-jni/gnark-eip-196.go
+++ b/gnark/gnark-jni/gnark-eip-196.go
@@ -52,22 +52,6 @@ var bigIntPool = sync.Pool{
return new(big.Int)
},
}
-var g1Pool = sync.Pool{
- New: func() any {
- return new(bn254.G1Affine)
- },
-}
-var g2Pool = sync.Pool{
- New: func() any {
- return new(bn254.G2Affine)
- },
-}
-
-var bytes64Pool = sync.Pool{
- New: func() any {
- return [64]byte{}
- },
-}
var EIP196ScalarTwo = big.NewInt(2)
@@ -82,33 +66,29 @@ func eip196altbn128G1Add(javaInputBuf, javaOutputBuf *C.char, cInputLen C.int) e
input := (*[2 * EIP196PreallocateForG1]byte)(unsafe.Pointer(javaInputBuf))[:inputLen:inputLen]
// generate p0 g1 affine
- p0 := g1Pool.Get().(*bn254.G1Affine)
- defer g1Pool.Put(p0)
-
- if err := safeUnmarshalEIP196(p0, input, 0); err != errCodeSuccess {
+ var p0 bn254.G1Affine
+ if err := safeUnmarshalEIP196(&p0, input, 0); err != errCodeSuccess {
return err
}
if inputLen < 2*EIP196PreallocateForG1 {
// if incomplete input is all zero, return p0
if isAllZeroEIP196(input, 64, 64) {
- g1AffineEncode(p0, javaOutputBuf)
+ g1AffineEncode(&p0, javaOutputBuf)
return errCodeSuccess
}
}
// generate p1 g1 affine
- p1 := g1Pool.Get().(*bn254.G1Affine)
- defer g1Pool.Put(p1)
-
- if err := safeUnmarshalEIP196(p1, input, 64); err != errCodeSuccess {
+ var p1 bn254.G1Affine
+ if err := safeUnmarshalEIP196(&p1, input, 64); err != errCodeSuccess {
return err
}
// Use the Add method to combine points
- p0.Add(p0, p1)
+ p0.Add(&p0, &p1)
// marshal the resulting point and encode directly to the output buffer
- g1AffineEncode(p0, javaOutputBuf)
+ g1AffineEncode(&p0, javaOutputBuf)
return errCodeSuccess
}
@@ -150,9 +130,8 @@ func eip196altbn128G1Mul(javaInputBuf, javaOutputBuf *C.char, cInputLen C.int) e
scalarBytes := input[EIP196PreallocateForG1:]
if 96 > int(cInputLen) {
// if the input is truncated, copy the bytes to the high order portion of the scalar
- bytes64 := bytes64Pool.Get().([64]byte)
- defer bytes64Pool.Put(bytes64)
- scalarBytes = bytes64[:32]
+ var bytes32 [32]byte
+ scalarBytes = bytes32[:]
copy(scalarBytes[:], input[64:int(cInputLen)])
}
@@ -246,9 +225,8 @@ func safeUnmarshalEIP196(g1 *bn254.G1Affine, input []byte, offset int) errorCode
return errCodeSuccess
} else if len(input)-offset < 64 {
// If we have some input, but it is incomplete, pad with zero
- bytes64 := bytes64Pool.Get().([64]byte)
- defer bytes64Pool.Put(bytes64)
- pointBytes = bytes64[:64]
+ var bytes64 [64]byte
+ pointBytes = bytes64[:]
shortLen := len(input) - offset
copy(pointBytes, input[offset:len(input)])
for i := shortLen; i < 64; i++ {
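The `isAllZeroEIP196` helper referenced in the hunks above could look like this sketch (assumed semantics: report whether the given region of the input is entirely zero, with bytes past the end of the input treated as zero):

```go
package main

import "fmt"

// isAllZero reports whether input[offset:offset+length] is all zero bytes,
// clamping the range to the actual input length (missing bytes count as zero).
func isAllZero(input []byte, offset, length int) bool {
	end := offset + length
	if end > len(input) {
		end = len(input)
	}
	if offset > len(input) {
		offset = len(input)
	}
	for _, b := range input[offset:end] {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isAllZero([]byte{0, 0, 0, 1}, 0, 3)) // prints true
	fmt.Println(isAllZero([]byte{0, 0, 0, 1}, 0, 4)) // prints false
}
```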
update the macos GHA runners since macos-13 runners are now deprecated Signed-off-by: garyschulte <garyschulte@gmail.com>
Force-pushed 1770a46 to e882805
Signed-off-by: garyschulte <garyschulte@gmail.com>
@gbotrel @ivokub @garyschulte Removing the small-object pooling is a 6-15% improvement for the EcAdd cases 👍 And agreed, no improvement to mul or pairing compared to the last iteration, though it looks like mul improved ~8% after gnark-crypto v0.19.2.
Changes
Core optimizations by @ivokub
Selective Pooling Strategy:
Changes function signatures from:
func eip196altbn128G1Add(input, output, errorBuf *C.char, inputLen C.int, outputLen, errorLen *C.int) C.int
To:
func eip196altbn128G1Add(input, output *C.char, inputLen C.int) errorCode
Safety improvement: Added Java-side buffer size validation to prevent JVM crashes from undersized output buffers.
Added 32 new tests covering:
Results
Before this PR
This PR
Performance Improvements Summary
Key Improvements:
Overall EC Addition improvement range: 33-138% 🏅
Benchmark Details
Testing