
🚨 Performance regression in #279 #281

Open

github-actions bot opened this issue Jan 24, 2024 · 5 comments

@github-actions (Contributor)

Regression >= 30.0% found during merge of: #279
Commit: 5da330e
Triggered by: https://github.com/lurk-lab/arecibo/actions/runs/7646169891

@huitseeker (Contributor)

Direct link to comment: 5da330e#commitcomment-137748136

Looking at the prior commit, db4375f, as well as a handful of commits before it, I think the numbers for the "before" baseline were spurious.

@samuelburnham this couldn't have been a caching issue (unpacking a prior bench result that's not quite for the same machine), could it? We don't have caching set up.

@huitseeker (Contributor) commented Feb 9, 2024

OK, so I've run this benchmarking procedure on three interesting commits, all of which resolve grumpkin to:
https://github.com/lurk-lab/grumpkin-msm?branch=dev#29af3cd2
Namely: 5da330e, which resolves to the commit of a#271; db4375f, which resolves to its parent; and 6fde742, which is where the above regression was reproduced.

CompressedSNARK-NIVC-1

| | ref=5da330e-cuda | ref=5da330e | ref=6fde742-cuda | ref=6fde742 | ref=db4375f-cuda | ref=db4375f |
|---|---|---|---|---|---|---|
| Prove-NumCons-6540 | 387.09 ms (✅ 1.00x) | 385.93 ms (✅ 1.00x faster) | 377.40 ms (✅ 1.03x faster) | 386.35 ms (✅ 1.00x faster) | 387.14 ms (✅ 1.00x slower) | 385.18 ms (✅ 1.00x faster) |
| Verify-NumCons-6540 | 28.78 ms (✅ 1.00x) | 28.81 ms (✅ 1.00x slower) | 29.01 ms (✅ 1.01x slower) | 28.97 ms (✅ 1.01x slower) | 28.89 ms (✅ 1.00x slower) | 28.91 ms (✅ 1.00x slower) |

CompressedSNARK-NIVC-2

| | ref=5da330e-cuda | ref=5da330e | ref=6fde742-cuda | ref=6fde742 | ref=db4375f-cuda | ref=db4375f |
|---|---|---|---|---|---|---|
| Prove-NumCons-6540 | 402.17 ms (✅ 1.00x) | 397.98 ms (✅ 1.01x faster) | 397.78 ms (✅ 1.01x faster) | 404.92 ms (✅ 1.01x slower) | 400.66 ms (✅ 1.00x faster) | 405.44 ms (✅ 1.01x slower) |
| Verify-NumCons-6540 | 29.36 ms (✅ 1.00x) | 29.94 ms (✅ 1.02x slower) | 29.96 ms (✅ 1.02x slower) | 29.66 ms (✅ 1.01x slower) | 29.60 ms (✅ 1.01x slower) | 29.46 ms (✅ 1.00x slower) |

CompressedSNARK-NIVC-Commitments-2

| | ref=5da330e-cuda | ref=5da330e | ref=6fde742-cuda | ref=6fde742 | ref=db4375f-cuda | ref=db4375f |
|---|---|---|---|---|---|---|
| Prove-NumCons-6540 | 9.49 s (✅ 1.00x) | 6.64 s (✅ 1.43x faster) | 9.38 s (✅ 1.01x faster) | 6.66 s (✅ 1.43x faster) | 6.63 s (✅ 1.43x faster) | 6.65 s (✅ 1.43x faster) |
| Verify-NumCons-6540 | 58.11 ms (✅ 1.00x) | 246.31 ms (❌ 4.24x slower) | 57.48 ms (✅ 1.01x faster) | 248.26 ms (❌ 4.27x slower) | 246.51 ms (❌ 4.24x slower) | 246.57 ms (❌ 4.24x slower) |
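As a sanity check, the cuda vs. non-cuda gap in the Commitments-2 numbers can be recomputed directly from the raw timings above (the snippet below is purely illustrative, not part of the benchmark harness; values are copied from the 5da330e columns):

```python
# Recompute the criterion-style ratios in the CompressedSNARK-NIVC-Commitments-2
# table, using ref=5da330e-cuda as the 1.00x baseline.
prove_cuda_s, prove_nocuda_s = 9.49, 6.64          # Prove-NumCons-6540
verify_cuda_ms, verify_nocuda_ms = 58.11, 246.31   # Verify-NumCons-6540

prove_ratio = prove_cuda_s / prove_nocuda_s        # non-cuda proving is faster
verify_ratio = verify_nocuda_ms / verify_cuda_ms   # non-cuda verifying is slower

print(f"Prove:  non-cuda is {prove_ratio:.2f}x faster")   # 1.43x
print(f"Verify: non-cuda is {verify_ratio:.2f}x slower")  # 4.24x
```

So the headline "regression" shows up only on the CUDA proving path and the non-CUDA verifying path of this one benchmark, matching the table.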

@huitseeker (Contributor)

I tested on the penultimate commit of a#290, which makes no use of the MSM in the IPA (see the PR description for why).

This tells us (as expected; see a#290) that using the GPU-MSM code path in the IPA (remember, this never resolves to an actual GPU MSM, just SN's CPU one) is faster in both cases (6 s vs. 9 s without cuda, 9 s vs. 12 s with), but that the proving/verifying discrepancy is due to something else.

Benchmark Results

CompressedSNARK-NIVC-1

| | ref=c43fd4c | ref=c43fd4c-cuda |
|---|---|---|
| Prove-NumCons-6540 | 559.73 ms (✅ 1.00x) | 552.84 ms (✅ 1.01x faster) |
| Verify-NumCons-6540 | 28.94 ms (✅ 1.00x) | 29.26 ms (✅ 1.01x slower) |

CompressedSNARK-NIVC-2

| | ref=c43fd4c | ref=c43fd4c-cuda |
|---|---|---|
| Prove-NumCons-6540 | 578.43 ms (✅ 1.00x) | 571.01 ms (✅ 1.01x faster) |
| Verify-NumCons-6540 | 29.65 ms (✅ 1.00x) | 29.71 ms (✅ 1.00x slower) |

CompressedSNARK-NIVC-Commitments-2

| | ref=c43fd4c | ref=c43fd4c-cuda |
|---|---|---|
| Prove-NumCons-6540 | 9.34 s (✅ 1.00x) | 12.20 s (❌ 1.31x slower) |
| Verify-NumCons-6540 | 247.61 ms (✅ 1.00x) | 57.40 ms (🚀 4.31x faster) |
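The "6 s vs. 9, 9 s vs. 12" claim can be cross-checked by lining up the two Commitments-2 proving rows (timings copied from the tables in this thread; 5da330e uses the MSM in the IPA, c43fd4c from a#290 does not):

```python
# Cross-check that the GPU-MSM code path in the IPA is faster in both
# configurations, per the Commitments-2 Prove-NumCons-6540 timings (seconds).
with_msm = {"no_cuda": 6.64, "cuda": 9.49}      # ref=5da330e
without_msm = {"no_cuda": 9.34, "cuda": 12.20}  # ref=c43fd4c

for cfg in ("no_cuda", "cuda"):
    verdict = "faster" if with_msm[cfg] < without_msm[cfg] else "slower"
    print(f"{cfg}: MSM-in-IPA path is {verdict} "
          f"({with_msm[cfg]:.2f} s vs {without_msm[cfg]:.2f} s)")
```

In both configurations the MSM-in-IPA variant wins, which is why the prove/verify discrepancy has to come from somewhere other than the final commitment.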

@huitseeker (Contributor)

TL;DR: this is not a regression, but rather an inherent property of the batched approach (introduced in #131) that it performs better without GPU acceleration. It is not due to the final (IPA, in our benches) commitment.

@huitseeker (Contributor)

Tested yesterday: removing parallelism in commitments on CUDA made no difference.
