Precompile #4
base: main
src/fp.rs
Outdated
sys_fp_bigint(&mut result, 0, lhs, rhs, modulus);
sys_fp_bigint(&mut result, 0, &result, r_inv, modulus);
This is because Fp uses a Montgomery representation: the plain modular product of two Montgomery-form values carries an extra factor of R, so we have to multiply by r_inv to bring the result back into Montgomery form. Here's the cost of doing this here versus in the precompile:
Montgomery Conversion in bls12-381
┌╴fp-square
└╴942 cycles
┌╴fp-mul
└╴1,022 cycles
┌╴fp2-square
└╴2,471 cycles
┌╴fp2-mul
└╴3,761 cycles
┌╴fp6-square
└╴24,385 cycles
┌╴fp6-mul
└╴37,067 cycles
┌╴fp12-square
└╴91,731 cycles
┌╴fp12-mul
└╴128,033 cycles
Montgomery Conversion in an SP1 Precompile
┌╴fp-square
└╴731 cycles
┌╴fp-mul
└╴811 cycles
┌╴fp2-square
└╴2,055 cycles
┌╴fp2-mul
└╴2,955 cycles
┌╴fp6-square
└╴21,594 cycles
┌╴fp6-mul
└╴29,443 cycles
┌╴fp12-square
└╴76,539 cycles
┌╴fp12-mul
└╴104,920 cycles
So doing the conversion in the precompile makes every operation somewhat faster (for example, fp-mul drops from 1,022 to 811 cycles). By my measurement, doing the Montgomery conversion in the bls12-381 crate instead makes verify_kzg_proof take 15% more cycles.
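To make the r_inv step concrete, here is a minimal sketch of the two-multiplication approach using a toy modulus. The prime, the radix `R`, and the helper names (`mul_mod`, `to_mont`, `from_mont`, `mont_mul_two_step`) are all assumptions for illustration; they are not the BLS12-381 constants or the SP1 syscall, and `mul_mod` merely stands in for one plain modular multiplication.

```rust
// Toy parameters (assumptions for illustration, not the BLS12-381 constants).
const P: u128 = 1_000_000_007; // small prime modulus
const R: u128 = 1 << 32; // Montgomery radix, coprime to P

/// Plain modular multiplication; stands in for one precompile call.
fn mul_mod(a: u128, b: u128) -> u128 {
    (a % P) * (b % P) % P
}

/// Modular exponentiation by repeated squaring.
fn pow_mod(mut base: u128, mut exp: u128) -> u128 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// R^{-1} mod P via Fermat's little theorem (valid because P is prime).
fn r_inv() -> u128 {
    pow_mod(R, P - 2)
}

/// Convert a normal residue into Montgomery form: a -> a * R mod P.
fn to_mont(a: u128) -> u128 {
    mul_mod(a, R)
}

/// Convert back out of Montgomery form: aR -> a mod P.
fn from_mont(a: u128) -> u128 {
    mul_mod(a, r_inv())
}

/// Montgomery product built from two plain modular multiplications.
fn mont_mul_two_step(lhs: u128, rhs: u128) -> u128 {
    let t = mul_mod(lhs, rhs); // (aR)(bR) = ab * R^2 mod P
    mul_mod(t, r_inv()) // ab * R mod P, i.e. Montgomery form of ab
}

fn main() {
    let (a, b) = (123_456_789u128, 987_654_321u128);
    let product = mont_mul_two_step(to_mont(a), to_mont(b));
    // The result is the Montgomery form of a * b.
    assert_eq!(product, to_mont(mul_mod(a, b)));
    assert_eq!(from_mont(product), mul_mod(a, b));
    println!("two-step Montgomery product matches");
}
```

The second `mul_mod` call is the extra work being measured above: folding it into the precompile would replace two calls with one.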
Also, if we move the Montgomery conversion logic into the precompile, the total cycle count for verify_kzg_proof is 95,222,885 cycles. If we crudely do two multiplications (as above), it is 111,284,472 cycles.
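For reference, the relative difference between the two totals can be worked out directly. The sketch below uses only the two cycle counts quoted above; the helper name `percent_more` is hypothetical. By this arithmetic the two-multiplication version costs roughly 17% more cycles than the in-precompile version, or equivalently the in-precompile version saves roughly 14% of the two-multiplication total.

```rust
/// Relative increase of `slow` over `fast`, in percent.
fn percent_more(fast: u64, slow: u64) -> f64 {
    (slow as f64 - fast as f64) / (fast as f64) * 100.0
}

/// Relative saving of `fast` compared with `slow`, in percent.
fn percent_saved(fast: u64, slow: u64) -> f64 {
    (slow as f64 - fast as f64) / (slow as f64) * 100.0
}

fn main() {
    // verify_kzg_proof totals quoted in the review comment.
    let in_precompile: u64 = 95_222_885;
    let two_multiplications: u64 = 111_284_472;

    println!("extra cycles: {}", two_multiplications - in_precompile);
    println!(
        "two multiplications cost {:.2}% more",
        percent_more(in_precompile, two_multiplications)
    );
    println!(
        "conversion in the precompile saves {:.2}%",
        percent_saved(in_precompile, two_multiplications)
    );
}
```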
Cargo.toml
Outdated
@@ -46,6 +50,13 @@ version = "0.13"
default-features = false
optional = true

[dependencies.sp1-zkvm]
git = "https://github.com/succinctlabs/sp1.git"
branch = "bhargav/fp-precompile"
Related to succinctlabs/sp1#1075
Currently, bls12-381 implements CPU-optimized Fp multiplication. However, this is quite slow inside of SP1. Instead, we should use a precompile to accelerate Fp-type multiplication.