Precompile #4

Open: wants to merge 11 commits into main


0xWOLAND commented Jul 10, 2024

Currently, bls12-381 implements Fp multiplication with CPU-optimized arithmetic, which is quite slow inside SP1. Instead, we should use a precompile to accelerate Fp multiplication.

@0xWOLAND 0xWOLAND self-assigned this Jul 10, 2024
@0xWOLAND 0xWOLAND requested a review from puma314 July 10, 2024 00:40
src/fp.rs (outdated)
Comment on lines 648 to 649
sys_fp_bigint(&mut result, 0, lhs, rhs, modulus);
sys_fp_bigint(&mut result, 0, &result, r_inv, modulus);
0xWOLAND (author):

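This is because Fp uses a Montgomery representation: multiplying two Montgomery-form values aR and bR as plain integers mod p gives abR^2, so we have to multiply by r_inv (R^-1 mod p) to bring the product back into Montgomery form abR. Here's the cost of doing this conversion here versus in the precompile: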

Montgomery conversion cost per operation (cycles), in bls12-381 vs. in an SP1 precompile:

Operation      bls12-381  SP1 precompile
fp-square            942             731
fp-mul             1,022             811
fp2-square         2,471           2,055
fp2-mul            3,761           2,955
fp6-square        24,385          21,594
fp6-mul           37,067          29,443
fp12-square       91,731          76,539
fp12-mul         128,033         104,920

So putting the conversion in the precompile makes every operation faster (roughly 11 to 22% fewer cycles in the table above). Doing the Montgomery conversion in the bls12-381 crate instead makes verify_kzg_proof take about 15% more cycles.
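To make the Montgomery argument concrete, here is a minimal, self-contained sketch with toy parameters: the prime p = 97 and the radix R = 2^8 are assumptions chosen for readability, not the BLS12-381 values. The helper `mul_mod` is a hypothetical stand-in for a precompile that returns `lhs * rhs mod p` on raw integers; it does not reproduce the actual signature of `sys_fp_bigint`. The sketch shows why the two calls in the diff are needed: the plain product of aR and bR is abR^2, and one more multiplication by R^-1 restores the Montgomery form abR.

```rust
// Toy parameters (assumptions, not BLS12-381): a small prime modulus and a
// Montgomery radix R = 2^8, which is coprime to P.
const P: u64 = 97;
const R: u64 = 1 << 8;

/// Plain modular multiplication on raw integers. Hypothetical stand-in for
/// a precompile that computes `lhs * rhs mod p`.
fn mul_mod(a: u64, b: u64, p: u64) -> u64 {
    ((a as u128 * b as u128) % p as u128) as u64
}

/// Modular exponentiation by repeated squaring.
fn pow_mod(mut base: u64, mut exp: u64, p: u64) -> u64 {
    let mut acc = 1 % p;
    base %= p;
    while exp > 0 {
        if (exp & 1) == 1 {
            acc = mul_mod(acc, base, p);
        }
        base = mul_mod(base, base, p);
        exp >>= 1;
    }
    acc
}

/// Convert a canonical value x into Montgomery form: x * R mod p.
fn to_mont(x: u64) -> u64 {
    mul_mod(x % P, R % P, P)
}

/// Montgomery multiplication built from two plain modular multiplications,
/// mirroring the two calls in the diff.
fn mont_mul_two_calls(a_m: u64, b_m: u64, r_inv: u64) -> u64 {
    let product = mul_mod(a_m, b_m, P); // (aR)(bR) = abR^2 mod p
    mul_mod(product, r_inv, P) // abR^2 * R^-1 = abR mod p
}

fn main() {
    // R^-1 mod p via Fermat's little theorem (valid because P is prime).
    let r_inv = pow_mod(R, P - 2, P);
    assert_eq!(mul_mod(R % P, r_inv, P), 1);

    let (a, b) = (12u64, 34u64);
    let got = mont_mul_two_calls(to_mont(a), to_mont(b), r_inv);
    // The result must equal the Montgomery form of the plain product a*b.
    assert_eq!(got, to_mont(mul_mod(a, b, P)));
    println!("Montgomery product of {a} and {b}: {got}"); // prints 76
}
```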

0xWOLAND (author):

Also, if we move the Montgomery conversion logic into the precompile, the total cycle count for verify_kzg_proof is 95,222,885 cycles. If we instead crudely make two sys_fp_bigint calls (as above), it is 111,284,472 cycles.
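As a quick arithmetic check on these two totals, the sketch below computes the gap. The percentage depends on the baseline: dividing by the precompile total gives about 16.9% more cycles for the two-call approach, while dividing by the two-call total gives about 14.4% fewer cycles for the precompile approach. The function name `overhead` is illustrative only.

```rust
/// Returns (extra cycles, percent more than `fast`, percent fewer than `slow`).
/// Illustrative helper; the inputs are the totals reported in the comment.
fn overhead(fast: u64, slow: u64) -> (u64, f64, f64) {
    let extra = slow - fast;
    // Relative to the faster path: "X% more cycles".
    let more_pct = extra as f64 / fast as f64 * 100.0;
    // Relative to the slower path: "X% fewer cycles".
    let fewer_pct = extra as f64 / slow as f64 * 100.0;
    (extra, more_pct, fewer_pct)
}

fn main() {
    // verify_kzg_proof totals: conversion in the precompile vs. two calls.
    let (extra, more, fewer) = overhead(95_222_885, 111_284_472);
    println!("extra cycles: {extra}");
    println!("{more:.1}% more than the precompile path, {fewer:.1}% fewer than the two-call path");
}
```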

Cargo.toml (outdated)
@@ -46,6 +50,13 @@ version = "0.13"
default-features = false
optional = true

[dependencies.sp1-zkvm]
git = "https://github.com/succinctlabs/sp1.git"
branch = "bhargav/fp-precompile"