Rework batches in ppsnark #305
Conversation
Compare: 2c46050 to aed2791
The Rust-specific simplifications are indeed awesome!
I'm not sure I understand why you put the bench results comparison for ref=13ff737 / ref=7a5d7bf. As mentioned, the performance regression was detected for 5da330e, so it is natural to compare ref=5da330e (regression detected) / ref=aed2791 (last commit of this PR), isn't it?
@storojs72 I have added more information to #281 to give context about the performance issue of SuperNova proofs raised there. I would note that this is technically not a regression: all our benchmarks were on the GPU before, and #279 (joined with updates on the grumpkin-msm repo) is the first PR where we have seen the performance of SuperNova proofs w/o the GPU. Moreover, this PR is not aimed at solving the issue of SuperNova proofs. It consists of simplifications I made on the way towards addressing that issue, and they should stand on their own. As you can see, they do not touch changes related to GPU operations.
- Reworked vector and matrix multiplication operations for improved memory use in `sparse.rs`.
- Modified the `batch_eval_prove` function in `snark.rs` for cleaner assertion checks related to polynomial sizes.
- Enhanced `InnerSumcheckInstance` within the `setup` function for parallel computation in `ppsnark.rs`.
- Optimized the transcript insertion process in `batched.rs` by replacing a loop absorbing `RelaxedR1CSInstance`s with a single absorb statement.
- …axedR1CSSNARK: refactored the structure of `claims_outer` in `BatchedRelaxedR1CSSNARK` from a tuple of vectors to a vector of tuples, streamlining its usage and unpacking; adjusted the `prove` and `verify` functions accordingly; simplified the extraction and manipulation of `ABCE_evals` in `verify`.
- Refined tau polynomial definitions using the `PowPolynomial::squares` function to improve efficiency; the verifier now uses the same efficient method as the prover.
- Introduced the `#[must_use]` annotation on the `squares` function in `power.rs` to ensure the returned result is handled appropriately; avoided re-computing powers of tau over and over in batched_ppsnark.
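For intuition, here is a minimal sketch of the repeated-squaring idea behind `PowPolynomial::squares` (hypothetical helper with toy `u64` modular arithmetic standing in for field elements): the successive squares of tau are computed once, so callers can share the result instead of re-deriving powers of tau over and over.

```rust
// Sketch (hypothetical names, toy u64 arithmetic) of repeated squaring:
// compute [tau, tau^2, tau^4, ..., tau^(2^(ell-1))] in one pass.
fn squares(tau: u64, ell: usize, modulus: u64) -> Vec<u64> {
    let mut out = Vec::with_capacity(ell);
    let mut cur = tau % modulus;
    for _ in 0..ell {
        out.push(cur);
        cur = (cur * cur) % modulus; // each step squares the previous power
    }
    out
}

fn main() {
    // tau = 3, ell = 4 gives [3, 3^2, 3^4, 3^8] = [3, 9, 81, 6561]
    assert_eq!(squares(3, 4, 1_000_003), vec![3, 9, 81, 6561]);
}
```

Only `ell` squarings are needed, versus recomputing the full power ladder at every use site.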
Compare: aed2791 to 238ad43
```rust
        }
      })
      .collect::<Result<Vec<_>, _>>()?;
    let S = S.iter().map(|s| s.pad()).collect::<Vec<_>>();
```
I think here we skip the `is_regular_shape` checks. Isn't this a problem?
The goal of the `pad` function is to return things with a regular shape. It already short-circuits when the input has that regular shape.
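To illustrate the contract being described, here is a minimal sketch (hypothetical helper, toy `u64` entries instead of field elements): the function returns rows with a regular (uniform) shape and short-circuits when the input already has that shape, which is why no separate `is_regular_shape` check is needed at the call site.

```rust
// Sketch of a `pad`-style helper: rows come out with uniform length `width`,
// and already-regular input is returned as-is (the short-circuit path).
fn pad_rows(rows: &[Vec<u64>], width: usize) -> Vec<Vec<u64>> {
    if rows.iter().all(|r| r.len() == width) {
        // Already regular: nothing to do.
        return rows.to_vec();
    }
    rows.iter()
        .map(|r| {
            let mut padded = r.clone();
            padded.resize(width, 0); // zero-pad up to the target width
            padded
        })
        .collect()
}

fn main() {
    let irregular = vec![vec![1, 2], vec![3]];
    assert_eq!(pad_rows(&irregular, 2), vec![vec![1, 2], vec![3, 0]]);
    let regular = vec![vec![1, 2], vec![3, 4]];
    assert_eq!(pad_rows(&regular, 2), regular); // short-circuit path
}
```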
```rust
      .map(|((v_a, v_b), v_c)| *v_a + c * *v_b + c * c * *v_c)
      .collect::<Vec<E::Scalar>>();
    let val = zip_with!(
      par_iter,
```
Out of curiosity: how does this work with a parallel iterator if the indices of `valA`, `valB`, `valC` should correspond? (if I understand this correctly)
The parallel iterator one gets from vectors implements `IndexedParallelIterator`, which captures this "matching indices" guarantee.
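For illustration, a sequential sketch of the same element-wise combination: rayon's parallel iterators over vectors implement `IndexedParallelIterator`, so a parallel zip pairs elements by index exactly like the sequential `zip` below (toy `u64` scalars stand in for `E::Scalar`; names are illustrative).

```rust
// Sequential analogue of the zipped combination v_a + c*v_b + c^2*v_c.
// A zip of IndexedParallelIterators guarantees the same index pairing.
fn combine(v_a: &[u64], v_b: &[u64], v_c: &[u64], c: u64) -> Vec<u64> {
    v_a.iter()
        .zip(v_b.iter())
        .zip(v_c.iter())
        .map(|((a, b), v)| *a + c * *b + c * c * *v)
        .collect()
}

fn main() {
    // With c = 2: [1 + 2*3 + 4*5, 2 + 2*4 + 4*6] = [27, 34]
    assert_eq!(combine(&[1, 2], &[3, 4], &[5, 6], 2), vec![27, 34]);
}
```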
TL;DR
This is a streamlining of the batched snark/ppsnark, made on the way to resolving the perf difference found in #281.
The differential bench (with `--features "cuda"`) for this PR reveals high variance 🤷

Benchmark Results
CompressedSNARK-NIVC-1

| | ref=13ff737 | ref=7a5d7bf |
|---|---|---|
| Prove-NumCons-6540 | 378.03 ms (✅ 1.00x) | 384.75 ms (✅ 1.02x slower) |
| Verify-NumCons-6540 | 29.71 ms (✅ 1.00x) | 29.00 ms (✅ 1.02x faster) |
| Prove-NumCons-1038732 | 9.42 s (✅ 1.00x) | 9.42 s (✅ 1.00x slower) |
| Verify-NumCons-1038732 | 132.44 ms (✅ 1.00x) | 115.03 ms (✅ 1.15x faster) |

CompressedSNARK-NIVC-2

| | ref=13ff737 | ref=7a5d7bf |
|---|---|---|
| Prove-NumCons-6540 | 384.48 ms (✅ 1.00x) | 388.90 ms (✅ 1.01x slower) |
| Verify-NumCons-6540 | 31.16 ms (✅ 1.00x) | 29.68 ms (✅ 1.05x faster) |
| Prove-NumCons-1038732 | 9.73 s (✅ 1.00x) | 10.03 s (✅ 1.03x slower) |
| Verify-NumCons-1038732 | 182.09 ms (✅ 1.00x) | 149.88 ms (✅ 1.21x faster) |

CompressedSNARK-NIVC-Commitments-2

| | ref=13ff737 | ref=7a5d7bf |
|---|---|---|
| Prove-NumCons-6540 | 9.30 s (✅ 1.00x) | 9.41 s (✅ 1.01x slower) |
| Verify-NumCons-6540 | 57.65 ms (✅ 1.00x) | 57.70 ms (✅ 1.00x slower) |
| Prove-NumCons-1038732 | 70.85 s (✅ 1.00x) | 70.12 s (✅ 1.01x faster) |
| Verify-NumCons-1038732 | 299.04 ms (✅ 1.00x) | 300.82 ms (✅ 1.01x slower) |