
Parallelize Lagrange constraints evaluation #316

Conversation

@Al-Kindi-0 (Contributor) commented Sep 13, 2024

Improves Lagrange constraints evaluation. This improves the end-to-end LogUp-GKR benchmark by 17-18% on my machine.

@irakliyk (Collaborator):

Thank you! Not a review yet, but a couple of questions:

Improves Lagrange constraints evaluation by 17-18% on my machine.

What can I run to test the effect on my machine?

Also, this seems quite a bit less than what we get by parallelizing general constraint evaluation (I think it is something like 4x or 5x on your machine). Are there big parts still left to address for Lagrange kernel constraints? Or is there something inherently different here?

@Al-Kindi-0 (Contributor, Author) commented Sep 14, 2024

The commands I used were:

  1. cargo bench --bench logup_gkr
  2. cargo bench --features concurrent --bench logup_gkr

The improvement is for the overall benchmark, which is end-to-end. Since constraint evaluation is only a small percentage of the total time, the speedup to Lagrange constraint evaluation itself is probably on the order you gave.
I updated the original comment as it was misleading.

Edit: I did some back-of-the-envelope calculations, and the actual improvement is probably around 1.5x.

Comment on lines +84 to +88
#[cfg(feature = "concurrent")]
combined_evaluations_acc
.par_iter_mut()
.enumerate()
.fold(
Collaborator:

One potential issue with this approach is poor memory locality resulting from the essentially random assignment of data to various threads. An alternative would be to do something similar to what we do with regular constraint evaluation:

  1. First, we split up the trace into adjacent fragments. The number of fragments would be equal to the number of threads.
  2. Then we evaluate each fragment in a different thread (e.g., see here).

I'm not 100% sure this will yield a significant improvement, but given that we are seeing only 1.5x overall improvement, I think this may be worth a try.
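The fragment-based strategy described above can be sketched roughly as follows. This is a minimal, std-only illustration (real code would use rayon and the actual evaluator); `evaluate_row` and `evaluate_fragments` are hypothetical stand-ins, not functions from the codebase:

```rust
use std::thread;

// Hypothetical stand-in for the per-row Lagrange constraint evaluation.
fn evaluate_row(row: u64) -> u64 {
    row.wrapping_mul(3).wrapping_add(1)
}

// Split the accumulator into adjacent fragments, one per thread, so each
// thread walks a contiguous region of memory (better locality than
// scattering rows across threads).
fn evaluate_fragments(acc: &mut [u64], num_threads: usize) {
    let frag_len = (acc.len() + num_threads - 1) / num_threads;
    thread::scope(|s| {
        for (frag_idx, fragment) in acc.chunks_mut(frag_len).enumerate() {
            s.spawn(move || {
                let offset = frag_idx * frag_len;
                for (i, slot) in fragment.iter_mut().enumerate() {
                    *slot = evaluate_row((offset + i) as u64);
                }
            });
        }
    });
}

fn main() {
    let mut acc = vec![0u64; 16];
    evaluate_fragments(&mut acc, 4);
    // Same result as a sequential pass, but computed in 4 contiguous chunks.
    let expected: Vec<u64> = (0..16).map(evaluate_row).collect();
    assert_eq!(acc, expected);
}
```

The key difference from the `par_iter_mut().enumerate().fold(...)` approach in the diff is that each thread owns one contiguous slice, rather than whatever rows the work-stealing scheduler hands it.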

Comment on lines +99 to +103
trace.read_lagrange_kernel_frame_into(
step << lde_shift,
l_col_idx,
&mut lagrange_frame,
);
Collaborator:

One potential thing to check is whether using Vec::push() inside this method is slowing things down for some reason. It shouldn't, but we usually use direct element assignment for things like this.
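For illustration, the two patterns being contrasted look roughly like this; `fill_with_push` and `fill_by_assignment` are hypothetical names, not methods from the codebase:

```rust
// Growing a fresh Vec: every push checks capacity and bumps the length.
fn fill_with_push(values: &[u64]) -> Vec<u64> {
    let mut frame = Vec::with_capacity(values.len());
    for &v in values {
        frame.push(v);
    }
    frame
}

// Direct element assignment into a buffer allocated once by the caller,
// which is the pattern typically used for frame reads like
// read_lagrange_kernel_frame_into.
fn fill_by_assignment(values: &[u64], frame: &mut [u64]) {
    for (slot, &v) in frame.iter_mut().zip(values) {
        *slot = v;
    }
}

fn main() {
    let values = [7u64, 11, 13];
    let pushed = fill_with_push(&values);
    let mut assigned = vec![0u64; values.len()];
    fill_by_assignment(&values, &mut assigned);
    assert_eq!(pushed, assigned);
}
```

Both produce the same contents; the second reuses one allocation across calls, which is why it is usually preferred in hot loops.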

@Al-Kindi-0 (Contributor, Author):

Subsumed by #317

@irakliyk closed this Sep 18, 2024