Multiplication/exponentiation speed-ups #34

fjarri · 2023-09-09T15:57:51Z

Currently, most of the time in the signing protocol is spent in Montgomery exponentiation. Key refresh is split between exponentiation and prime number generation, but the latter is mainly exponentiation again (most of the time is spent in Miller-Rabin tests). So improving exponentiation performance would help a lot.

Possible avenues:

1. Replace schoolbook multiplication with Karatsuba or Toom-Cook. This may start making a difference at our integer sizes (2048 bits). This has to be done within crypto-bigint, see Improve multiplication RustCrypto/crypto-bigint#66
2. Use wNAF exponentiation instead of the current fixed-window one (for the cases where the exponent is not secret). This has to be done within crypto-bigint.
3. crypto-bigint's pow() supports exponents of arbitrary size (that is, you can raise a Uint<N> to a Uint<M> power). We currently only raise Uint<N> to Uint<N>, and implement Uint<N>^Uint<2*N> and Uint<N>^Uint<4*N> by breaking the exponent into halves and exponentiating separately. Using the arbitrary-size exponentiation could make this faster, because we would not have to calculate x^{2^N} separately to merge the halves: the fixed-window algorithm already computes it along the way.
4. In some places where we calculate x^y mod N we also know phi(N) (the totient), so we can instead calculate x^(y mod phi(N)) mod N. If y is large (of the order of N^2), this may be faster than direct exponentiation.
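To make the cost of the split-exponent approach concrete, here is a minimal sketch on toy `u64` values. It is not the crypto-bigint API: `mul_mod`, `pow_mod`, `pow_mod_split` and `pow_mod_wide` are hypothetical names, and plain square-and-multiply stands in for the fixed-window algorithm. It is also not constant-time. The point is only that the split version spends `bits` extra squarings on computing x^(2^bits), while a single pass over the double-width exponent gets that shift for free.

```rust
/// Modular multiplication via a u128 intermediate (toy sizes only).
fn mul_mod(a: u64, b: u64, n: u64) -> u64 {
    ((a as u128 * b as u128) % n as u128) as u64
}

/// Left-to-right square-and-multiply over the low `bits` bits of `exp`.
/// Illustration only: not constant-time, no windowing.
fn pow_mod(base: u64, exp: u64, bits: u32, n: u64) -> u64 {
    let mut acc = 1 % n;
    for i in (0..bits).rev() {
        acc = mul_mod(acc, acc, n);
        if (exp >> i) & 1 == 1 {
            acc = mul_mod(acc, base, n);
        }
    }
    acc
}

/// Split approach: the exponent is hi * 2^bits + lo, so
/// x^exp = (x^(2^bits))^hi * x^lo.
/// Costs `bits` squarings for the shift plus two `bits`-wide exponentiations.
fn pow_mod_split(base: u64, hi: u64, lo: u64, bits: u32, n: u64) -> u64 {
    // x^(2^bits), computed separately by squaring `bits` times.
    let mut shifted = base % n;
    for _ in 0..bits {
        shifted = mul_mod(shifted, shifted, n);
    }
    let high = pow_mod(shifted, hi, bits, n);
    let low = pow_mod(base, lo, bits, n);
    mul_mod(high, low, n)
}

/// Single pass over the double-width exponent (hi, then lo).
/// Costs 2 * bits squarings in total; no separate x^(2^bits) is needed.
fn pow_mod_wide(base: u64, hi: u64, lo: u64, bits: u32, n: u64) -> u64 {
    let mut acc = 1 % n;
    for half in [hi, lo] {
        for i in (0..bits).rev() {
            acc = mul_mod(acc, acc, n);
            if (half >> i) & 1 == 1 {
                acc = mul_mod(acc, base, n);
            }
        }
    }
    acc
}
```

In this toy model the split version performs about 3 * bits squarings against 2 * bits for the single pass. How much of that carries over to the real fixed-window implementation would need to be measured.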


fjarri · 2024-12-02T00:36:13Z

Item 3: arbitrary-sized exponents are available as of crypto-bigint 0.6. This can be addressed now, but some thought needs to be put into trait bounds.

dvdplm · 2025-01-06T16:38:46Z

Did some investigation of point 4; capturing that here.

- The reason it's OK to reduce y modulo phi(N) is that x is overwhelmingly likely to be coprime to N (the only prime factors of N are p and q), so we can apply Euler's theorem: x^phi(N) ≡ 1 mod N.
- phi(N) is on the order of N - 2*sqrt(N), which is the same magnitude as N. Why? N is the product of two primes that are each close to sqrt(N), given how we search for our primes, so phi(N) = (p-1)(q-1) = N - (p+q) + 1 ≈ N - 2*sqrt(N).
- If y is large, on the order of N^2 or larger, reducing it mod phi(N) brings the exponent down to order N rather than N^2, at the cost of one extra reduction.
- Reducing y from order N^2 to order N cuts the exponentiation cost roughly in half, because the cost scales with the bit length of the exponent and log(N^2) = 2*log(N).
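The argument above can be checked on toy numbers. The following is a minimal sketch, not the code used in this repository: `mul_mod`, `pow_mod_counted`, `gcd` and `pow_mod_with_totient` are hypothetical names, the primes are small enough for `u64` arithmetic, and plain square-and-multiply is used so that the operation count depends on the exponent's bit length. The helper refuses to reduce the exponent when x is not coprime to N, since Euler's theorem does not apply in that case.

```rust
/// Modular multiplication via a u128 intermediate (toy sizes only).
fn mul_mod(a: u64, b: u64, n: u64) -> u64 {
    ((a as u128 * b as u128) % n as u128) as u64
}

/// Square-and-multiply over the significant bits of `exp`.
/// Returns the result and the number of modular multiplications performed
/// (squarings plus multiplies). Illustration only: not constant-time.
fn pow_mod_counted(base: u64, exp: u128, n: u64) -> (u64, u32) {
    let mut acc = 1 % n;
    let mut ops = 0;
    let bits = 128 - exp.leading_zeros();
    for i in (0..bits).rev() {
        acc = mul_mod(acc, acc, n);
        ops += 1;
        if (exp >> i) & 1 == 1 {
            acc = mul_mod(acc, base, n);
            ops += 1;
        }
    }
    (acc, ops)
}

/// Euclid's algorithm.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// Computes x^(y mod phi(N)) mod N for N = p * q with p, q distinct primes.
/// Returns None when gcd(x, N) != 1, where the reduction is not justified.
/// Assumes p * q fits in a u64.
fn pow_mod_with_totient(x: u64, y: u128, p: u64, q: u64) -> Option<(u64, u32)> {
    let n = p * q;
    if gcd(x % n, n) != 1 {
        return None;
    }
    let phi = (p - 1) as u128 * (q - 1) as u128;
    // The one extra reduction: bring y from order N^2 down to below phi(N).
    Some(pow_mod_counted(x, y % phi, n))
}
```

With a roughly 60-bit N and a roughly 120-bit y, the reduced exponent has about half the bits, so the toy loop does about half the work. In a constant-time implementation the cost follows the declared exponent width rather than its value, which leads to the same ratio.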

~~I find it difficult to guesstimate just how much of a speedup 4) would give us, but it seems likely that it ends up faster.~~

I wrote an artificial benchmark today comparing "vanilla" exponentiation (x^y mod N, with a 2048-bit base and a 4096-bit exponent) with x^(y mod phi(N)) mod N, and perhaps unsurprisingly the latter is 2x as fast. The question is how much of that speedup seeps through in the end, i.e. how many such exponentiations are we actually doing in the hot path?

fjarri added the performance Making things faster label Sep 9, 2023

fjarri added this to the v1.0.0 milestone Nov 26, 2023

fjarri mentioned this issue Dec 11, 2024

Signed improvements #166

Merged

dvdplm self-assigned this Jan 15, 2025

dvdplm mentioned this issue Jan 22, 2025

Faster exponentiations #178

Open

5 tasks
