Parallelize Lagrange constraints evaluation #316
Conversation
Thank you! Not a review yet, but a couple of questions:
What can I run to test the effect on my machine? Also, this seems quite a bit less than what we get by parallelizing general constraint evaluation (I think it is something like 4x or 5x on your machine). Are there big parts still left to address for Lagrange kernel constraints? Or is there something inherently different here?
The commands I used were:
The improvement is for the overall benchmark, which is end-to-end. Since constraint evaluation is a small percentage of the total, I think the improvement to Lagrange constraint evaluation alone is probably on the order you gave. Edit: did some back-of-the-envelope calculations, and the actual improvement should probably be around 1.5x.
#[cfg(feature = "concurrent")]
combined_evaluations_acc
    .par_iter_mut()
    .enumerate()
    .fold(
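The pattern in this diff, rayon's parallel fold into per-thread accumulators followed by a reduce, can be sketched with only the standard library (a generic illustration, not the actual constraint-evaluation code; `parallel_fold_sum` and the squared-sum "evaluation" are stand-ins):

```rust
use std::thread;

// Mimic rayon's fold + reduce: split the work into per-thread chunks, fold
// each chunk into a local accumulator (no shared mutable state), then combine
// the partial accumulators. The per-step "evaluation" here is just v * v.
fn parallel_fold_sum(steps: &[u64], num_threads: usize) -> u64 {
    let chunk_len = (steps.len() + num_threads - 1) / num_threads;
    thread::scope(|s| {
        let handles: Vec<_> = steps
            .chunks(chunk_len)
            .map(|chunk| {
                // fold: each thread accumulates into its own local value
                s.spawn(move || chunk.iter().fold(0u64, |acc, &v| acc + v * v))
            })
            .collect();
        // reduce: combine the per-thread partial results
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```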
One potential issue with this approach is poor memory locality resulting from the essentially random assignment of data to various threads. An alternative would be to do something similar to what we do with regular constraint evaluation:
- First, we split up the trace into adjacent fragments. The number of fragments would be equal to the number of threads.
- Then we evaluate each fragment in a different thread (e.g., see here).
I'm not 100% sure this will yield a significant improvement, but given that we are seeing only 1.5x overall improvement, I think this may be worth a try.
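The fragment-based approach described above can be sketched with scoped threads from the standard library (a simplified illustration; `evaluate_in_fragments` and `evaluate_step` are hypothetical stand-ins for the actual fragment machinery and per-step constraint evaluation):

```rust
use std::thread;

// Split the trace into adjacent fragments, one per thread. Each thread works
// on a contiguous slice, which preserves memory locality, unlike assigning
// steps to threads in an effectively random order.
fn evaluate_in_fragments(trace: &[u64], num_threads: usize) -> Vec<u64> {
    let mut result = vec![0u64; trace.len()];
    let fragment_len = (trace.len() + num_threads - 1) / num_threads;

    thread::scope(|s| {
        // Pair up adjacent input/output fragments and spawn a thread for each.
        for (in_frag, out_frag) in trace
            .chunks(fragment_len)
            .zip(result.chunks_mut(fragment_len))
        {
            s.spawn(move || {
                for (value, out) in in_frag.iter().zip(out_frag.iter_mut()) {
                    *out = evaluate_step(*value);
                }
            });
        }
    });
    result
}

// Placeholder for the actual constraint evaluation at a single step.
fn evaluate_step(value: u64) -> u64 {
    value * value + 1
}
```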
trace.read_lagrange_kernel_frame_into(
    step << lde_shift,
    l_col_idx,
    &mut lagrange_frame,
);
One potential thing to check is whether using Vec::push() inside this method is slowing things down for some reason. It shouldn't, but we usually use direct element assignment for things like this.
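The two styles being contrasted can be illustrated as follows (a generic sketch, not the actual read_lagrange_kernel_frame_into code): push() re-checks capacity and bumps the length on every call, while writing into a pre-sized buffer is a plain indexed store.

```rust
// Building a frame with push(): each call checks capacity and updates length.
fn fill_with_push(src: &[u64]) -> Vec<u64> {
    let mut frame = Vec::with_capacity(src.len());
    for &value in src {
        frame.push(value);
    }
    frame
}

// Direct element assignment into a pre-sized buffer: a plain indexed store,
// the style usually preferred in hot loops like this one.
fn fill_with_assignment(src: &[u64], frame: &mut [u64]) {
    for (i, &value) in src.iter().enumerate() {
        frame[i] = value;
    }
}
```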
Subsumed by #317
Parallelizes Lagrange constraints evaluation. This improves the end-to-end LogUp-GKR benchmark by 17-18% on my machine.