Skip to content

einar/pr/extended.jacobian.coordinates#1

Draft
einar-taiko wants to merge 10 commits into
taikoxyz:taiko/unstablefrom
einar-taiko:einar/pr/extended.jacobian.coordinates
Draft

einar/pr/extended.jacobian.coordinates#1
einar-taiko wants to merge 10 commits into
taikoxyz:taiko/unstablefrom
einar-taiko:einar/pr/extended.jacobian.coordinates

Conversation

@einar-taiko

Copy link
Copy Markdown

This resolves taikoxyz/zkevm-circuits#121

To measure the performance impact:

cd /tmp
git clone https://github.com/einar-taiko/halo2curves.git
cd halo2curves
git checkout b09144e
cargo bench multiexp
git checkout einar/pr/extended.jacobian.coordinates
cargo bench multiexp

@einar-taiko einar-taiko requested a review from mratsim August 21, 2023 10:01
@mratsim

mratsim commented Aug 22, 2023

Copy link
Copy Markdown

I get even less improvement than previously reported which is quite strange

image

git clone https://github.com/taikoxyz/halo2curves taiko-halo2curves
cd taiko-halo2curves/
git fetch origin pull/1/head
git checkout 9692b29
cargo bench multiexp
git checkout -b review-pr1 FETCH_HEAD
cargo bench multiexp

Comment thread src/multiexp.rs
});
sum
})
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mratsim mratsim left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EC implementation is correct with a small issue on inline usage that will lead to code explosion.

The multiexp implementation should be done by just modifying the buckets accumulation scheme in https://github.com/privacy-scaling-explorations/halo2/blob/f3487575f00e627705fcb7778d7d1049eed79601/halo2_proofs/src/arithmetic.rs#L67-L71

Also, there are 2 passes over the coeffs+base instead of 1 in this PR which should explain most of the slowness compared to theoretical (15%) and conservative estimate (10% speedup).

Comment thread Cargo.toml Outdated
Comment thread src/derive/curve.rs

// Jacobian implementations

// From affine to homogeneous

@mratsim mratsim Aug 23, 2023

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it from affine to homogeneous or from affine to Jacobian?

This seems to contradict the comment above

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I added the comment because I found it confusing too, since $name refers to homogeneous coordinates and $name_affine to affine coordinates. Alternatively, I can remove the first comment, since non-extended Jacobian coordinates should be gone anyway.

Comment thread src/derive/curve.rs Outdated
let zzz1 = p.zzz;

// curve constants
let a = Self::curve_constant_a();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a is always equal to 0 for pairing curves (those needed for Snarks) and it's also equal to 0 for secp256k1.

If Rust provides compile time evaluation, it would be extremely useful to specialize for a == 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least a comment is needed to point out an future easy optimization opportunity

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a is always equal to 0 for pairing curves (those needed for Snarks) and it's also equal to 0 for secp256k1.

If Rust provides compile time evaluation, it would be extremely useful to specialize for a == 0

I have changed it to:

// curve constants
const A: $base = $name_jac_ext::curve_constant_a();

This should enforce compile time evaluation and hopefully imply that

let m = (x1_sqr.double()+x1_sqr) + A*zz1.square();

gets optimized to

let m = (x1_sqr.double()+x1_sqr) 

for the curves where $a=0$. But I am currently not sure how to verify this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use if A == 0 and if A != 0.

The multiplication and addition operations are pure assembly and will not be optimized away.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it that way, no problem, but I politely remain skeptical of the argument. When the value is known to be 0 at compile time, like in our case, the assembly will indeed be optimized away. I tested it with -opt-level=2 here https://godbolt.org/z/4Gr41o8Kr

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it that way, no problem, but I politely remain skeptical of the argument. When the value is known to be 0 at compile time, like in our case, the assembly will indeed be optimized away. I tested it with -opt-level=2 here https://godbolt.org/z/4Gr41o8Kr

I realise that * is overloaded for field arithmetic, so my argument may or may not be applicable.

Comment thread src/derive/curve.rs Outdated
let w = u*v;
let s = x1 * v;
let x1_sqr = x1.square();
let m = (x1_sqr.double()+x1_sqr) + a*zz1.square();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with a = 0, a*zz1.square() can be skipped

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried commenting out the last term and then ran cargo test. All tests pass.

I put in a debug_assert_eq!(a, 0);, so if we can catch future curves where a != 0.

Comment thread src/derive/curve.rs
}

/// <http://www.hyperelliptic.org/EFD/g1p/auto-shortw-xyzz.html#doubling-dbl-2008-s-1>
#[inline]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[inline] is likely not worth it here. Especially because the underlying field operations are already inline so it will lead to code size explosion:

#[inline]
pub fn double(&self) -> $field {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I will remove it from the add as well then.

Comment thread src/multiexp.rs
let num_threads = num_cpus::get();
if n > num_threads && n > 32 {
let chunk = n / num_threads;
let results: Vec<C::ExtendedJacobianCoordinates> = coeffs

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the inner buckets result should be collected in Extended Jacobian.

Extended Jacobian mixed addition is faster than projective, but for plain addition, projective is faster.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the signature of best_multiexp should stay with projective coordinates

Comment thread benches/multiexp.rs Outdated
})
.sample_size(30);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To measure perf of this PR without multithreading interference, a serial benchmark is needed as well.

Comment thread src/multiexp.rs
}
}

pub(crate) fn multiexp_serial<C: CurveJacExt>(coeffs: &[C::Scalar], bases: &[C]) -> C::ExtendedJacobianCoordinates {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, the signature of multiexp_serial should stay homogeneous projective

Comment thread src/multiexp.rs
.enumerate()
.rev()
.map(|(i, bucket)| {
for (coeff, base) in coeffs.iter().zip(bases.iter()) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map and for are inefficient, you loop twice over the data.

You should loop over the coeff+base and directly mutate/add_assign in the matching bucket

Comment thread src/multiexp.rs
}
bucket
})
.fold(C::jac_ext_identity(), |mut sum, bucket| {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final reduction should use projective coordinates (and explicitly convert the bucket sum to projective)

@einar-taiko einar-taiko self-assigned this Aug 23, 2023
@mratsim

mratsim commented Aug 29, 2023

Copy link
Copy Markdown

Note on our discussion on constant_a, some more changes upstream privacy-ethereum#82.

@einar-taiko einar-taiko force-pushed the einar/pr/extended.jacobian.coordinates branch from d52078b to 747c6e2 Compare August 30, 2023 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

extended Jacobian coordinates

2 participants