
Bw6 miller loop optimization #617

Merged
Pratyush merged 14 commits into master from bw6-optimization on Sep 4, 2023

Conversation

@mmagician
Member

Description

Optimizes the BW6 Miller loop as per Algorithm 5 in https://eprint.iacr.org/2020/351.pdf.

TODO: check whether the inverse should apply to f_u or f_1 in case the loop count is negative.
@yelhousni maybe you could help?

Benchmarks:
Before:

Pairing for BW6_761/G2 Preparation for BW6_761
                        time:   [732.74 µs 734.78 µs 737.28 µs]
Pairing for BW6_761/Miller Loop for BW6_761
                        time:   [2.5692 ms 2.5765 ms 2.5844 ms]
Pairing for BW6_761/Full Pairing for BW6_761
                        time:   [6.6391 ms 6.6557 ms 6.6730 ms]

After:

Pairing for BW6_761/G2 Preparation for BW6_761
                        time:   [591.47 µs 592.55 µs 593.67 µs]
Pairing for BW6_761/Miller Loop for BW6_761
                        time:   [2.0311 ms 2.0358 ms 2.0406 ms]
Pairing for BW6_761/Full Pairing for BW6_761
                        time:   [5.9286 ms 5.9474 ms 5.9669 ms]

The full pairing is ~11% faster (middle estimate 6.6557 ms before, 5.9474 ms after); the Miller loop alone is ~21% faster.

closes: #616


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (master)
  • Linked to GitHub issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the GitHub PR explorer

@mmagician mmagician requested review from a team as code owners March 11, 2023 15:48
@mmagician mmagician requested review from Pratyush and weikengchen and removed request for a team March 11, 2023 15:48
@Pratyush Pratyush enabled auto-merge March 11, 2023 17:35
Member

@weikengchen weikengchen left a comment

Do you want to check if it is equivalent to the original one?

@mmagician
Member Author

We can probably remove the check for whether the 2nd Miller loop count is negative. The parameter X^2 - X - 1 is positive unless P::X is 0 or 1. I don't know enough about curve construction, but I guess this will never happen.
Also, we could basically reuse P::X and its sign to determine the value and sign of the first loop count, and maybe even find the 2-NAF of the 2nd count with a const fn. What's standing in the way is that X is currently defined as Bigint. Maybe in a follow-up PR I can refactor the BW6 parameters, ideally removing these 4 parameters that are already known or can be computed at compile time.
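
A minimal sketch of the const fn idea, not code from this PR. It assumes the loop count fits in a `u128` (the real parameters are big integers), and `naf` and `NAF_OF_7` are hypothetical names used only for illustration. Digits are stored least-significant first.

```rust
/// Hypothetical helper (not an arkworks API): 2-NAF digits of `n`,
/// least-significant digit first, zero-padded up to `N` digits.
/// Every digit is -1, 0 or 1, and no two adjacent digits are non-zero.
/// Indexing panics (or fails const evaluation) if `N` is too small.
pub const fn naf<const N: usize>(mut n: u128) -> [i8; N] {
    let mut digits = [0i8; N];
    let mut i = 0;
    while n != 0 {
        if n & 1 == 1 {
            if n & 3 == 1 {
                // n = 1 (mod 4): emit +1 so that the next digit is 0.
                digits[i] = 1;
                n -= 1;
            } else {
                // n = 3 (mod 4): emit -1 and carry into the higher bits.
                digits[i] = -1;
                n += 1;
            }
        }
        n >>= 1;
        i += 1;
    }
    digits
}

/// Evaluated at compile time: 7 = -1 + 8.
pub const NAF_OF_7: [i8; 4] = naf::<4>(7);
```

Because the array length is a const generic, the caller has to know an upper bound on the digit count (bit length plus one is always enough), which is the price of avoiding heap allocation in a const context.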

@mmagician
Member Author

@weikengchen They're indeed equivalent. You can check by substituting the right branches into: https://github.com/mmagician/bw6-comparison.

auto-merge was automatically disabled March 19, 2023 21:23

Merge queue setting changed

@swasilyev swasilyev mentioned this pull request Mar 27, 2023
6 tasks
@Pratyush
Member

Pratyush commented Sep 1, 2023

Hm the tests seem to be failing right now

@mmagician
Member Author

@Pratyush The parameters for the loop counts changed, so the tests will fail against the current master of arkworks/curves. I've updated both sides today, though, and they are all green. I can add a temporary patch to Cargo here to point to the compatible branch in curves.
However, one thing to reconsider (as pointed out to me by @swasilyev at some point) is that even though this algorithm is better for a single pairing, it seems that for a multi-pairing of 3 pairs the two methods are equivalent, and for 4+ pairs this one likely performs worse; see the last table in https://hackmd.io/@gnark/BW6-761-changes.
There seems to be one method (called 6' in that hackmd) which is backwards compatible and faster for both single and multi-pairings. I could implement that, but won't be able to do it very soon. WDYT, how likely are we to do multi-pairings of size 4+ vs. single pairings?

@Pratyush
Member

Pratyush commented Sep 2, 2023

I think multi-pairings occur with some frequency, especially for batch verification.

@mmagician mmagician mentioned this pull request Sep 2, 2023
6 tasks
@mmagician
Member Author

I ran some benchmarks, and it seems that the new version from this PR outperforms the current master, even for multi-pairings.

Some numbers for BW6-767:

optimized:

Pairing for BW6_767/Multi Pairing for BW6_767 with 1 pairs
                        time:   [3.9839 ms 4.0180 ms 4.0593 ms]
Pairing for BW6_767/Multi Pairing for BW6_767 with 5 pairs
                        time:   [8.7305 ms 8.7505 ms 8.7706 ms]
Pairing for BW6_767/Multi Pairing for BW6_767 with 10 pairs
                        time:   [14.508 ms 14.563 ms 14.617 ms]

master:

Pairing for BW6_767/Multi Pairing for BW6_767 with 1 pairs
                        time:   [4.4910 ms 4.5152 ms 4.5490 ms]
Pairing for BW6_767/Multi Pairing for BW6_767 with 5 pairs
                        time:   [11.000 ms 11.046 ms 11.092 ms]
Pairing for BW6_767/Multi Pairing for BW6_767 with 10 pairs
                        time:   [18.986 ms 19.064 ms 19.145 ms]

The improvement % actually increases with the number of pairs!

Similar results for BW6-761:

optimized:

Pairing for BW6_761/Multi Pairing for BW6_761 with 1 pairs
                        time:   [3.5187 ms 3.5259 ms 3.5333 ms]
Pairing for BW6_761/Multi Pairing for BW6_761 with 5 pairs
                        time:   [8.1297 ms 8.2073 ms 8.3008 ms]
Pairing for BW6_761/Multi Pairing for BW6_761 with 10 pairs
                        time:   [13.810 ms 13.860 ms 13.911 ms]

master:

Pairing for BW6_761/Multi Pairing for BW6_761 with 1 pairs
                        time:   [3.9132 ms 3.9271 ms 3.9413 ms]
Pairing for BW6_761/Multi Pairing for BW6_761 with 5 pairs
                        time:   [9.9494 ms 9.9683 ms 9.9876 ms]
Pairing for BW6_761/Multi Pairing for BW6_761 with 10 pairs
                        time:   [17.237 ms 17.273 ms 17.309 ms]

I'm not quite sure what to make of this given the notes in https://hackmd.io/@gnark/BW6-761-changes. I suppose it comes down to implementation differences between gnark and arkworks at some other level of the stack.

In any case, I think the numbers suggest we can safely call this PR an improved version.
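
To make the "improvement % increases with the number of pairs" observation concrete, here is a small sketch that recomputes the relative savings from the middle value of each `time:` triple quoted above. The constant and function names are hypothetical and chosen only for this illustration.

```rust
/// Middle estimates in ms copied from the benchmark output above:
/// (number of pairs, master, optimized).
pub const BW6_767: [(u32, f64, f64); 3] = [
    (1, 4.5152, 4.0180),
    (5, 11.046, 8.7505),
    (10, 19.064, 14.563),
];
pub const BW6_761: [(u32, f64, f64); 3] = [
    (1, 3.9271, 3.5259),
    (5, 9.9683, 8.2073),
    (10, 17.273, 13.860),
];

/// Percentage of time saved relative to master.
pub fn improvement_pct(master_ms: f64, optimized_ms: f64) -> f64 {
    (master_ms - optimized_ms) / master_ms * 100.0
}

/// Maps each benchmark row to (number of pairs, improvement in percent).
pub fn improvements(rows: &[(u32, f64, f64)]) -> Vec<(u32, f64)> {
    rows.iter()
        .map(|&(pairs, master, optimized)| (pairs, improvement_pct(master, optimized)))
        .collect()
}
```

Calling `improvements(&BW6_767)` gives roughly 11%, 21% and 24% for 1, 5 and 10 pairs, and `improvements(&BW6_761)` gives roughly 10%, 18% and 20%.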


let f_u_inv;

// TODO: is it enough to get the inverse of f_1, or does f_u also need to get inverted?
Member

Do we need to address these TODOs before merging?

Member Author

Ah right, I missed this. The TODO no longer makes much sense: it now says exactly the opposite of what the code does. I think it might be a leftover from some of my initial attempts to understand the original code, which was indeed inverting f_1 (although at that point there was no such thing as f_u, so I'm not so sure of that theory now). Now ATE_LOOP_COUNT_1 is only used for computing f_u, so if the count is negative, I think only f_u should be affected.
The good thing is that we now have two curves with opposite signs for ATE_LOOP_COUNT_1, so at least both arms of this if/else are tested.
I'll remove the TODO.
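
As a toy analogy only (integers modulo a small prime rather than the pairing target field, and none of it taken from the PR's code), the rule "run the loop over the absolute value of the count, then invert just that result if the count is negative" can be sketched as follows. All names here are hypothetical.

```rust
/// Small prime modulus standing in for the real field (assumption).
const P: u64 = 101;

/// Square-and-multiply exponentiation modulo P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Inverse of a non-zero element via Fermat's little theorem.
pub fn inverse(a: u64) -> u64 {
    pow_mod(a, P - 2)
}

/// Toy stand-in for f_u: computed from |count|, then inverted only
/// when the count is negative. Nothing else depends on the sign.
pub fn f_u_toy(g: u64, abs_count: u64, count_is_negative: bool) -> u64 {
    let f_u = pow_mod(g, abs_count);
    if count_is_negative {
        inverse(f_u)
    } else {
        f_u
    }
}
```

In this model the two sign cases are exact inverses of each other, so their product is 1, which mirrors why having curves with both signs exercises both arms of the if/else.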

Comment on lines +151 to +157
if bit == 1 {
    f *= &f_u;
    for &mut (p, ref mut coeffs) in pairs.iter_mut() {
        BW6::<Self>::ell(&mut f, &coeffs.next().unwrap(), &p.0);
    }
} else if bit == -1 {
    f *= &f_u_inv;
Member

The for loops in both cases are the same, so they can be extracted to outside the loop, no?

Member Author

Good catch, addressed in 2557df1
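
One way the shared loop could be hoisted is sketched below on a toy model (integers modulo a small prime, with `line_step` as a hypothetical stand-in for the per-pair `ell` calls). The actual change in 2557df1 may look different; only the control-flow refactor is the point here, and it relies on the digits being restricted to -1, 0 and 1.

```rust
/// Small prime modulus for the toy group (assumption, not a curve parameter).
const P: u64 = 101;

fn mul(a: u64, b: u64) -> u64 {
    a * b % P
}

/// Stand-in for the loop body shared by both branches.
fn line_step(f: &mut u64, lines: &[u64]) {
    for &l in lines {
        *f = mul(*f, l);
    }
}

/// Shape of the reviewed code: the same loop appears in both branches.
pub fn duplicated(naf: &[i8], f_u: u64, f_u_inv: u64, lines: &[u64]) -> u64 {
    let mut f = 1;
    for &bit in naf.iter().rev() {
        f = mul(f, f);
        if bit == 1 {
            f = mul(f, f_u);
            line_step(&mut f, lines);
        } else if bit == -1 {
            f = mul(f, f_u_inv);
            line_step(&mut f, lines);
        }
    }
    f
}

/// Refactored shape: pick the factor first, then run the shared loop once.
pub fn hoisted(naf: &[i8], f_u: u64, f_u_inv: u64, lines: &[u64]) -> u64 {
    let mut f = 1;
    for &bit in naf.iter().rev() {
        f = mul(f, f);
        if bit == 1 {
            f = mul(f, f_u);
        } else if bit == -1 {
            f = mul(f, f_u_inv);
        }
        if bit != 0 {
            line_step(&mut f, lines);
        }
    }
    f
}
```

Both functions return the same value for any digit sequence over -1, 0 and 1, for example `duplicated(&[-1, 0, 0, 1], 3, 34, &[5, 7]) == hoisted(&[-1, 0, 0, 1], 3, 34, &[5, 7])`.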

@mmagician mmagician requested a review from Pratyush September 4, 2023 12:43
        ell_coeffs_1.push(r.add_in_place(&q));
    }
}
// TODO: this is probably the slowest part
Member Author

I think we can leave this TODO in; it marks potential for further optimization.

@Pratyush Pratyush merged commit 9ea43e9 into master Sep 4, 2023
@Pratyush Pratyush deleted the bw6-optimization branch September 4, 2023 13:36
aleasims pushed a commit to NilFoundation/arkworks-algebra that referenced this pull request Oct 18, 2023
aleasims added a commit to NilFoundation/arkworks-algebra that referenced this pull request Oct 18, 2023
Development

Successfully merging this pull request may close these issues.

Optimize BW6 Miller loop

3 participants