chore: pippenger int audit #19302
Merged
51 commits (all by iakovenkos):

- `d163891` initial clean up
- `75ef963` recursive -> iterative
- `2f9c59d` add first approximation of docs + rm redundant alias
- `45e1a64` tackle issue 1449
- `79e17e2` tackle issue 1449
- `6e24ce6` Merge remote-tracking branch 'origin/merge-train/barretenberg' into s…
- `3bd30ca` small refactor
- `7ef589a` reapply centralized montgomery conversion
- `9a771a6` clean up
- `6072169` get_offset_generator out of the loop
- `47cae52` revert some branching
- `e6f9c8f` fixing magic constants + reusing existing stuff
- `794a038` more const updates
- `37c3d8b` introduce point schedule entry
- `a43c966` consolidated --> nonzero_scalar_indices
- `cc17b30` clean up get_work_units
- `22f583b` batch msm clean up
- `8916b60` evaluate_pippenger_round mutates in-place instead of returning confus…
- `ac4f0ab` use uint32_t where possible
- `5e8cae1` unfold recursion
- `c487e40` use common helper to process buckets
- `c8142f0` share logic to produce single point edge case
- `8f0dbfc` rm redundant args
- `f3d3a28` stray comment
- `724ca97` check regression
- `a2c4a5a` centralize Montgomery conversion in filtering function
- `4a59df3` restore iterative consume_point_schedule (cleaner than recursive)
- `129eb22` iterative
- `1200dab` more docs and renaming
- `b074916` brush up tests
- `f9e088b` another docs iteration
- `7fe4f71` docs+naming
- `6ac8e94` clean up processing functions
- `9ba1080` better org
- `50c6f88` fix docs discrepancies
- `3e33312` make docs concise
- `de82341` upd hpp
- `8dc83f7` fix build, fix montgomery conversion regression
- `806e2de` rm funny inclusion
- `53b6501` Merge branch 'merge-train/barretenberg' into si/pippenger-audit-0
- `ff7f410` fix ivc integration test?
- `256770d` change bench script
- `108da69` fix multithreading
- `0aaa930` rm benches
- `40de9d5` fix perf regression
- `f1eff36` md fix
- `15b9521` fix build
- `113a58a` Merge remote-tracking branch 'origin/merge-train/barretenberg' into s…
- `e5d0055` move scalar slicing back to pippenger
- `65c92dc` address more comments
- `6c3dcfa` Merge remote-tracking branch 'origin/merge-train/barretenberg' into s…
`barretenberg/cpp/src/barretenberg/ecc/scalar_multiplication/README.md` (new file: 183 additions, 0 deletions)
# Pippenger Multi-Scalar Multiplication (MSM)

## Overview

The Pippenger algorithm computes multi-scalar multiplications:

$$\text{MSM}(\vec{s}, \vec{P}) = \sum_{i=0}^{n-1} s_i \cdot P_i$$

**Complexity**: Let $q = \lceil \log_2(\text{field modulus}) \rceil$ be the scalar bit-length, $|A|$ the cost of a group addition, and $|D|$ the cost of a doubling.

- **Pippenger** (with $c$ bits per scalar slice): $O\left(\frac{q}{c} \cdot \left((n + 2^c) \cdot |A| + c \cdot |D|\right)\right)$
- **Naive** (double-and-add for each $s_i \cdot P_i$): $O(n \cdot q \cdot |D| + n \cdot q \cdot |A| / 2)$

With $c \approx \frac{1}{2} \log_2 n$, Pippenger achieves roughly $O(n \cdot q / \log n)$ vs $O(n \cdot q)$ for naive scalar multiplication.
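As a rough illustration of the trade-off, the sketch below plugs unit costs into the two formulas above and brute-forces $c$, similar in spirit to `get_optimal_log_num_buckets`. The function names, the unit weights for $|A|$ and $|D|$, and the search range are assumptions for illustration, not the library's cost model.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative cost model (hypothetical, not barretenberg's): one unit per addition and per doubling.
constexpr double ADD_COST = 1.0;
constexpr double DBL_COST = 1.0;

// Pippenger: ceil(q / c) rounds, each with (n + 2^c) additions and c doublings.
double pippenger_cost(uint64_t n, uint32_t q, uint32_t c)
{
    assert(c >= 1 && c < 64);
    const double rounds = static_cast<double>((q + c - 1) / c);
    const double buckets = static_cast<double>(uint64_t{ 1 } << c);
    return rounds * ((static_cast<double>(n) + buckets) * ADD_COST + c * DBL_COST);
}

// Naive double-and-add: q doublings and about q / 2 additions per scalar.
double naive_cost(uint64_t n, uint32_t q)
{
    return static_cast<double>(n) * q * (DBL_COST + ADD_COST / 2.0);
}

// Brute-force search over 1..max_bits for the cheapest slice width under this model.
uint32_t optimal_slice_bits(uint64_t n, uint32_t q, uint32_t max_bits = 20)
{
    assert(max_bits >= 1 && max_bits < 64);
    uint32_t best = 1;
    for (uint32_t c = 2; c <= max_bits; ++c) {
        if (pippenger_cost(n, q, c) < pippenger_cost(n, q, best)) {
            best = c;
        }
    }
    return best;
}
```

Because $\lceil q/c \rceil$ changes in jumps, the modelled cost is not a smooth function of $c$, so a small exhaustive search over $c \le 20$ is a simple way to pick it.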

## Algorithm

### Step 1: Scalar Decomposition

**Implementation**: `get_scalar_slice(scalar, round_index, bits_per_slice)`

Each scalar $s_i$ is decomposed into $r$ slices of $c$ bits each, processed **MSB-first**:

$$s_i = \sum_{j=0}^{r-1} s_i^{(j)} \cdot 2^{c(r-1-j)}$$

- $c$ = bits per slice (from `get_optimal_log_num_buckets`, which brute-force searches for the minimum-cost value)
- $r = \lceil N / c \rceil$ = number of rounds, where $N$ = `NUM_BITS_IN_FIELD`
- Round 0 extracts the most significant bits
- The identity above is exact when $c$ divides $N$. Otherwise the final (least significant) slice has only $N - c(r-1)$ bits, which is why the final round in Step 4 uses fewer doublings; in general, slice $j$ carries weight $2^{\max(N - c(j+1),\,0)}$
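A minimal sketch of MSB-first slicing, assuming a 256-bit scalar held as four little-endian 64-bit limbs that are already out of Montgomery form. `Scalar`, `get_slice`, `slice_width`, and `num_rounds` are hypothetical helpers that follow the description above; they are not the signature of `get_scalar_slice`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Scalar = std::array<uint64_t, 4>; // hypothetical: little-endian limbs, already out of Montgomery form

uint32_t num_rounds(uint32_t num_bits, uint32_t c) { return (num_bits + c - 1) / c; }

// Width of slice `round`: c bits, except a possibly narrower final (least significant) slice.
uint32_t slice_width(uint32_t round, uint32_t c, uint32_t num_bits)
{
    const uint32_t hi_bit = num_bits - round * c; // exclusive upper bit of this slice
    return hi_bit < c ? hi_bit : c;
}

// Round 0 returns the top c bits; later rounds move toward bit 0.
uint32_t get_slice(const Scalar& scalar, uint32_t round, uint32_t c, uint32_t num_bits)
{
    assert(c >= 1 && c <= 20 && num_bits <= 256 && round < num_rounds(num_bits, c));
    const uint32_t width = slice_width(round, c, num_bits);
    const uint32_t lo_bit = num_bits - round * c - width;
    uint32_t slice = 0;
    for (uint32_t b = 0; b < width; ++b) {
        const uint32_t bit_index = lo_bit + b;
        const uint64_t bit = (scalar[bit_index / 64] >> (bit_index % 64)) & 1U;
        slice |= static_cast<uint32_t>(bit) << b;
    }
    return slice;
}
```

Reassembling the slices with Horner's rule (shift left by each slice's width, then OR in the slice) recovers the scalar, which is the same identity Step 4 applies to group elements.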

### Step 2: Bucket Accumulation

For each round $j$, points are added into **buckets** based on their scalar slice. Bucket $k$ accumulates all points whose slice value equals $k$:

$$B_k^{(j)} = \sum_{\{i : s_i^{(j)} = k\}} P_i$$

**Two implementation paths:**

- **Affine**: Sorts points by bucket and uses batched affine additions
- **Jacobian**: Direct bucket accumulation in Jacobian coordinates

### Step 3: Bucket Reduction

**Implementation**: `accumulate_buckets(bucket_accumulators)`

Computes the weighted sum with a running suffix sum, iterating from the highest bucket down:

$$R^{(j)} = \sum_{k=1}^{2^c - 1} k \cdot B_k^{(j)} = \sum_{k=1}^{2^c - 1} \left( \sum_{m=k}^{2^c - 1} B_m^{(j)} \right)$$

Each $B_m^{(j)}$ appears in the inner sums for $k = 1, \dots, m$, i.e. exactly $m$ times, so the right-hand side equals the weighted sum while needing only about $2 \cdot 2^c$ additions and no scalar multiplications.

An offset generator is added and later subtracted to avoid rare edge cases in the accumulator additions (the affine failure cases listed under Edge Cases). This is a probabilistic mitigation that simplifies the accumulation logic.
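
### Step 4: Round Combination

Combines all rounds using Horner's method (MSB-first): before adding round $j$'s result, the accumulator is doubled once per bit of slice $j$.

```cpp
// Schematic: Element, point_at_infinity, dbl(), bucket_result and the slice widths stand in for real types/values.
Element msm_accumulator = point_at_infinity;
for (size_t j = 0; j < num_rounds; ++j) {
    // Double once per bit of slice j: c bits, or fewer for a narrow final slice.
    const size_t slice_bits = (j + 1 == num_rounds) ? final_slice_bits : bits_per_slice;
    for (size_t d = 0; d < slice_bits; ++d) {
        msm_accumulator = msm_accumulator.dbl();
    }
    msm_accumulator += bucket_result[j];
}
```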

## Algorithm Variants

### Entry Points and Safety

| Entry Point | Default | Safety |
|-------------|---------|--------|
| `msm()` | `handle_edge_cases=false` | ⚠️ **Unsafe** |
| `pippenger()` | `handle_edge_cases=true` | ✓ Safe |
| `pippenger_unsafe()` | `handle_edge_cases=false` | ⚠️ Unsafe |
| `batch_multi_scalar_mul()` | `handle_edge_cases=true` | ✓ Safe |

### Edge Cases

The affine addition formula fails for **P = Q** (doubling), **P = −Q** (inverse), and **P = O** (identity). Jacobian coordinates handle these correctly at higher cost (~2-3× slower).

⚠️ **Use `msm()` or `pippenger_unsafe()` only when the points have no known discrete-logarithm relations among them** (e.g., SRS points), so that these cases occur with negligible probability. For user-controlled or potentially duplicate points, use `pippenger()`.
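The failure cases can be seen directly in the chord formula. The sketch below uses the toy curve $y^2 = x^3 + 7$ over $\mathbb{F}_{97}$ (an assumption chosen only to keep numbers small) and a hypothetical `affine_add` that returns `std::nullopt` instead of handling the special cases. The slope $(y_2 - y_1)/(x_2 - x_1)$ needs $x_1 \ne x_2$, which rules out exactly $P = Q$ and $P = -Q$, and the identity $\mathcal{O}$ has no affine coordinates at all.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy curve y^2 = x^3 + 7 over F_97 (assumption, chosen only to keep numbers small).
constexpr int64_t Q = 97;

struct Affine {
    int64_t x;
    int64_t y;
};

int64_t mod(int64_t a)
{
    a %= Q;
    return a < 0 ? a + Q : a;
}

// Inverse via Fermat's little theorem; zero has no inverse.
int64_t inverse(int64_t a)
{
    int64_t base = mod(a);
    assert(base != 0);
    int64_t result = 1;
    for (int64_t e = Q - 2; e > 0; e >>= 1) {
        if (e & 1) {
            result = result * base % Q;
        }
        base = base * base % Q;
    }
    return result;
}

bool on_curve(const Affine& p) { return mod(p.y * p.y) == mod(p.x * p.x % Q * p.x + 7); }

// Chord addition. Returns nullopt when x1 == x2 (P = Q or P = -Q): the slope would divide by zero.
std::optional<Affine> affine_add(const Affine& p, const Affine& q)
{
    if (p.x == q.x) {
        return std::nullopt;
    }
    const int64_t lambda = mod((q.y - p.y) * inverse(q.x - p.x));
    const int64_t x3 = mod(lambda * lambda - p.x - q.x);
    const int64_t y3 = mod(lambda * (p.x - x3) - p.y);
    return Affine{ x3, y3 };
}
```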

### Affine Pippenger (`handle_edge_cases=false`)

Uses affine coordinates with Montgomery's batch inversion trick: replaces $m$ inversions with **1 inversion + $3(m-1)$ multiplications**, yielding ~2-3× speedup over Jacobian.

### Jacobian Pippenger (`handle_edge_cases=true`)

Uses Jacobian coordinates for bucket accumulators. Handles all edge cases correctly.

## Tuning Constants

| Constant | Value | Purpose |
|----------|-------|---------|
| `PIPPENGER_THRESHOLD` | 16 | Below this, use naive scalar multiplication |
| `AFFINE_TRICK_THRESHOLD` | 128 | Below this, batch inversion overhead exceeds savings |
| `MAX_SLICE_BITS` | 20 | Upper bound on the bucket-count exponent $c$ |
| `BATCH_SIZE` | 2048 | Points per batch inversion (fits L2 cache) |
| `RADIX_BITS` | 8 | Bits per radix sort pass |

<details>
<summary>Cost model constants and derivations</summary>

| Constant | Value | Derivation |
|----------|-------|------------|
| `BUCKET_ACCUMULATION_COST` | 5 | 2 Jacobian adds/bucket × 2.5× cost ratio |
| `AFFINE_TRICK_SAVINGS_PER_OP` | 5 | ~10 muls saved − ~3 muls for product tree |
| `JACOBIAN_Z_NOT_ONE_PENALTY` | 5 | Extra field ops when Z ≠ 1 |
| `INVERSION_TABLE_COST` | 14 | 4-bit lookup table for modular exp |

**BATCH_SIZE=2048**: Each `AffineElement` is 64 bytes, so 2048 points = 128 KB, which fits in L2 cache.

**RADIX_BITS=8**: 256 radix buckets × 4 bytes = 1 KB counting array, which fits in L1 cache.

</details>

## Implementation Notes

### Zero Scalar Filtering

`transform_scalar_and_get_nonzero_scalar_indices` filters out zero scalars before processing (since $0 \cdot P_i = \mathcal{O}$). Scalars are converted from Montgomery form in-place to avoid doubling memory usage.

### Bucket Existence Tracking

A `BitVector` bitmap tracks which buckets are populated, avoiding expensive full-array clears between rounds. Clearing the bitmap touches $\lceil 2^c / 64 \rceil$ 64-bit words instead of all $2^c$ bucket entries.
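A sketch of the existence bitmap, assuming one bit per bucket packed into 64-bit words. `BucketBitmap` is a hypothetical stand-in and its interface is not taken from `bitvector.hpp`. Only the bitmap is reset between rounds; a bucket whose bit is clear is treated as empty, so its stale contents are never read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BitVector: one bit per bucket, packed into 64-bit words.
class BucketBitmap {
  public:
    explicit BucketBitmap(size_t num_buckets)
        : words_((num_buckets + 63) / 64, 0)
        , num_buckets_(num_buckets)
    {}

    void set(size_t bucket)
    {
        assert(bucket < num_buckets_);
        words_[bucket / 64] |= uint64_t{ 1 } << (bucket % 64);
    }

    bool get(size_t bucket) const
    {
        assert(bucket < num_buckets_);
        return ((words_[bucket / 64] >> (bucket % 64)) & 1U) != 0;
    }

    // Resetting between rounds touches ceil(num_buckets / 64) words; the bucket array is left untouched.
    void clear() { std::fill(words_.begin(), words_.end(), uint64_t{ 0 }); }

    size_t num_words() const { return words_.size(); }

  private:
    std::vector<uint64_t> words_;
    size_t num_buckets_;
};
```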

### Point Scheduling (Affine Variant Only)

Entries are packed as `(point_index << 32) | bucket_index` into 64-bit values. Since bucket indices fit in $c$ bits (typically 8-16), they occupy only the lowest bits of the packed entry. An **in-place MSD radix sort** on the low $c$ bits groups points by bucket for efficient batch processing. The sort also detects entries with `bucket_index == 0` during the final radix pass, allowing zero-bucket entries to be skipped without a separate scan.
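A sketch of the packing and of an in-place MSD radix sort (the "American flag" variant) on the low $c$ bits, using 8-bit digits to match `RADIX_BITS`. Function names are hypothetical, and the zero-bucket detection performed by the real sort is not shown. Each pass counts digits, computes region boundaries, then moves every element straight into its region by following swap cycles, so no second buffer is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t RADIX_BITS = 8;
constexpr size_t RADIX = size_t{ 1 } << RADIX_BITS;

// Point index in the high 32 bits, bucket index in the low bits.
uint64_t pack_entry(uint32_t point_index, uint32_t bucket_index) { return (uint64_t{ point_index } << 32) | bucket_index; }
uint32_t point_of(uint64_t entry) { return static_cast<uint32_t>(entry >> 32); }
uint32_t bucket_of(uint64_t entry) { return static_cast<uint32_t>(entry); }

// In-place MSD radix sort on the 8-bit digit at `shift`, then recursively on lower digits.
void msd_radix_sort(uint64_t* begin, uint64_t* end, uint32_t shift)
{
    const auto digit = [shift](uint64_t e) { return static_cast<size_t>((e >> shift) & (RADIX - 1)); };
    std::array<size_t, RADIX> count{};
    for (uint64_t* it = begin; it != end; ++it) {
        ++count[digit(*it)];
    }
    std::array<size_t, RADIX> head{};
    std::array<size_t, RADIX> tail{};
    size_t offset = 0;
    for (size_t b = 0; b < RADIX; ++b) {
        head[b] = offset;
        offset += count[b];
        tail[b] = offset;
    }
    // Move each element directly into its digit's region by following swap cycles.
    for (size_t b = 0; b < RADIX; ++b) {
        while (head[b] < tail[b]) {
            uint64_t value = begin[head[b]];
            size_t d = digit(value);
            while (d != b) {
                std::swap(value, begin[head[d]++]);
                d = digit(value);
            }
            begin[head[b]++] = value;
        }
    }
    if (shift == 0) {
        return;
    }
    size_t start = 0;
    for (size_t b = 0; b < RADIX; ++b) {
        if (tail[b] - start > 1) {
            msd_radix_sort(begin + start, begin + tail[b], shift - RADIX_BITS);
        }
        start = tail[b];
    }
}

// Groups entries by bucket: sorts on the low c bits (bucket indices are assumed to be < 2^c).
void sort_point_schedule(std::vector<uint64_t>& schedule, uint32_t c)
{
    assert(c >= 1 && c <= 32);
    if (schedule.empty()) {
        return;
    }
    const uint32_t num_passes = (c + RADIX_BITS - 1) / RADIX_BITS;
    msd_radix_sort(schedule.data(), schedule.data() + schedule.size(), (num_passes - 1) * RADIX_BITS);
}
```

Because bucket indices are below $2^c$, the top digit only ever sees bucket bits (bits above $c$ are zero), so sorting entries as whole 64-bit words on these digits orders them by bucket without unpacking.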

### Batched Affine Addition

`batch_accumulate_points_into_buckets` processes sorted points iteratively:

- Same-bucket pairs → queue for batch addition
- Different buckets → cache in bucket or queue with existing accumulator
- Uses branchless conditional moves to minimize pipeline stalls
- Prefetches future points to hide memory latency
- Recirculates results to maximize batch efficiency before writing to buckets

<details>
<summary>Batch accumulation case analysis</summary>

| Condition | Action | Iterator Update |
|-----------|--------|-----------------|
| `bucket[i] == bucket[i+1]` | Queue both points for batch add | `point_it += 2` |
| Different buckets, accumulator exists | Queue point + accumulator | `point_it += 1` |
| Different buckets, no accumulator | Cache point into bucket | `point_it += 1` |

After batch addition, results targeting the same bucket are paired again before writing to bucket accumulators, reducing random memory access by ~50%.

</details>

## Parallelization

Uses **per-thread buffers** (bucket accumulators, scratch space) to eliminate contention.

For `batch_multi_scalar_mul()`, work is distributed via `MSMWorkUnit` structures that can split a single MSM across multiple threads. Each thread computes a partial result over its subset of points, and the partial results are combined in a final reduction.

<details>
<summary>Per-call buffer sizes</summary>

| Buffer | Size | Purpose |
|--------|------|---------|
| `BucketAccumulators` (affine) | $2^c \times 64$ bytes | Affine bucket array + bitmap |
| `JacobianBucketAccumulators` | $2^c \times 96$ bytes | Jacobian bucket array + bitmap |
| `AffineAdditionData` | ~400 KB | Scratch for batch inversion |
| `point_schedule` | $n \times 8$ bytes | Per-MSM point schedule |

Buffers are allocated per call for WASM compatibility. Memory scales with thread count during parallel execution.

</details>
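A sketch of the work split with per-thread result slots, in a toy additive group (integers modulo a prime, an assumption). `WorkUnit`, `split_work`, and `parallel_msm` are hypothetical and much simpler than `MSMWorkUnit`; each thread uses a naive loop where the real code would run Pippenger on its subset. Because every thread writes only its own slot in `partial`, no locking is needed, and the final reduction adds the partial results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Toy stand-in for the curve group (assumption): integers modulo a prime under addition.
constexpr uint64_t P = 1'000'000'007ULL;

// Hypothetical work unit: a contiguous range of points belonging to one MSM.
struct WorkUnit {
    size_t start;
    size_t size;
};

std::vector<WorkUnit> split_work(size_t num_points, size_t num_threads)
{
    assert(num_threads > 0);
    std::vector<WorkUnit> units;
    const size_t per_thread = (num_points + num_threads - 1) / num_threads;
    for (size_t start = 0; start < num_points; start += per_thread) {
        units.push_back({ start, std::min(per_thread, num_points - start) });
    }
    return units;
}

uint64_t parallel_msm(const std::vector<uint32_t>& scalars, const std::vector<uint64_t>& points, size_t num_threads)
{
    assert(scalars.size() == points.size());
    const std::vector<WorkUnit> units = split_work(points.size(), num_threads);
    std::vector<uint64_t> partial(units.size(), 0); // one slot per thread: no shared mutable state
    std::vector<std::thread> threads;
    for (size_t t = 0; t < units.size(); ++t) {
        threads.emplace_back([&, t] {
            uint64_t acc = 0;
            for (size_t i = units[t].start; i < units[t].start + units[t].size; ++i) {
                acc = (acc + (uint64_t{ scalars[i] } % P) * points[i]) % P; // naive stand-in for Pippenger
            }
            partial[t] = acc;
        });
    }
    for (std::thread& th : threads) {
        th.join();
    }
    uint64_t result = 0; // final reduction of the partial results
    for (uint64_t p : partial) {
        result = (result + p) % P;
    }
    return result;
}
```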

## File Structure

```
scalar_multiplication/
├── scalar_multiplication.hpp   # MSM class, data structures
├── scalar_multiplication.cpp   # Core algorithm
├── process_buckets.hpp/cpp     # Radix sort
├── bitvector.hpp               # Bit vector for bucket tracking
└── README.md                   # This file
```

## References

1. Pippenger, N. (1976). "On the evaluation of powers and related problems."
2. Bernstein, D. J., et al. "Faster batch forgery identification." (batch inversion)
added docs, improved the situation with magic numbers