Initial constant-time stack-allocated Bernstein-Yang #632

tarcieri · 2024-08-04T17:12:37Z

The previous implementation runs in variable time with respect to g. However, when both inputs are secret, a fully constant-time implementation is required.

This implements the method described in section 11 of https://eprint.iacr.org/2019/266.pdf, specifically this Sage (Python) code from Figure 11.1:

from divsteps2 import divsteps2

def iterations(d):
    return (49*d+80)//17 if d<46 else (49*d+57)//17

def gcd2(f,g):
    assert f & 1
    d = max(f.nbits(),g.nbits())
    m = iterations(d)
    delta, fm, gm, P = divsteps2(m,m+d,1,f,g)
    return abs(fm)

def recip2(f,g):
    assert f & 1
    d = max(f.nbits(),g.nbits())
    m = iterations(d)
    precomp = Integers(f)((f+1)/2)^(m-1)
    delta, fm, gm, P = divsteps2(m,m+1,1,f,g)
    V = sign(fm)*ZZ(P[0][1]*2^(m-1))
    return ZZ(V*precomp)

Instead of looping until g reaches zero, this computes a fixed number of iterations from the bit length of the larger of f and g, a bound after which the algorithm is guaranteed to have converged, and always runs exactly that many iterations.
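To make the fixed-bound idea concrete, here is a minimal sketch in plain Python (not Sage, and not this PR's Rust code). The names `divstep` and `gcd_fixed_iterations` are hypothetical; `divstep` follows the definition in the Bernstein-Yang paper, and `iterations` is copied from Figure 11.1. Python's branches and big integers are not constant-time, so this only illustrates the loop structure.

```python
def iterations(d):
    # Iteration bound from Figure 11.1 of the Bernstein-Yang paper.
    return (49 * d + 80) // 17 if d < 46 else (49 * d + 57) // 17


def divstep(delta, f, g):
    # One division step; f is always odd, so both divisions by 2 are exact.
    if delta > 0 and g & 1:
        return 1 - delta, g, (g - f) // 2
    return 1 + delta, f, (g + (g & 1) * f) // 2


def gcd_fixed_iterations(f, g):
    assert f & 1, "f must be odd"
    d = max(f.bit_length(), g.bit_length())
    delta = 1
    # The loop count depends only on the bit length, never on the values.
    for _ in range(iterations(d)):
        delta, f, g = divstep(delta, f, g)
    # By this point g has reached zero and f holds the gcd up to sign.
    return abs(f)
```

For example, `gcd_fixed_iterations(21, 14)` returns 7, and `iterations(256)` gives a bound of 741 divsteps for 256-bit inputs.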

This makes inversion roughly 23x slower:

Bernstein-Yang invert, U256
        time:   [36.934 µs 37.151 µs 37.391 µs]
        change: [+2256.7% +2267.7% +2278.4%] (p = 0.00 < 0.05)
        Performance has regressed.
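As a quick sanity check on these figures, the reported change can be converted into a slowdown factor and an implied previous timing. This sketch assumes the percentage is measured relative to the earlier run; the variable names are illustrative only.

```python
new_time_us = 37.151      # midpoint of the reported time, in microseconds
change_percent = 2267.7   # midpoint of the reported change

# A change of +P% means the new time is (1 + P/100) times the old one.
slowdown = 1 + change_percent / 100
previous_time_us = new_time_us / slowdown

print(f"slowdown: {slowdown:.1f}x, implied previous time: {previous_time_us:.2f} us")
```

That works out to a factor of about 23.7 and a previous timing of about 1.57 µs.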

The previous implementation, which is variable-time with respect to g, is preserved for now as gcd_vartime. It would also be nice to add an inv_mod_vartime.
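A variable-time inversion built on the same divsteps could look roughly like the Python sketch below. It is not the crate's code: `inv_mod_divsteps` is a hypothetical name, and instead of the precomputed power used by `recip2` it halves modulo f at every step, which is easier to follow but is a different bookkeeping choice.

```python
def inv_mod_divsteps(g, f):
    """Return x with (g * x) % f == 1, for odd f > 1 and gcd(f, g) == 1."""
    assert f & 1 and f > 1, "modulus must be odd and greater than 1"
    modulus = f
    half = (modulus + 1) // 2  # inverse of 2 modulo the odd modulus
    delta = 1
    # Invariants, with g0 the original g:
    #   f == a * g0 (mod modulus) and g == b * g0 (mod modulus)
    a, b = 0, 1
    while g != 0:  # early exit: the loop count depends on the inputs
        if delta > 0 and g & 1:
            delta, f, g = 1 - delta, g, (g - f) // 2
            a, b = b, (b - a) * half % modulus
        else:
            odd = g & 1
            delta, g = 1 + delta, (g + odd * f) // 2
            b = (b + odd * a) * half % modulus
    if abs(f) != 1:
        raise ValueError("g is not invertible modulo f")
    # f is +1 or -1 here, so a * f is the inverse of g0.
    return a * f % modulus
```

For example, `inv_mod_divsteps(3, 7)` returns 5, since 3 * 5 = 15 ≡ 1 (mod 7).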

tarcieri · 2024-08-04T17:17:50Z

Aside from a similar change to the Boxed* implementation, I think this is the last source of timing variability that needs fixing to address #627.


tarcieri · 2024-08-04T17:22:51Z

Note: there's an alternative strategy for computing the upper bounds here which we should investigate: https://github.com/sipa/safegcd-bounds

fjarri · 2024-08-04T18:10:50Z

I wonder if an implementation of gcd_vartime that just uses rem_vartime would be faster or not.

tarcieri · 2024-08-04T18:12:56Z

I should probably add benchmarks for gcd_vartime as well

tarcieri · 2024-08-04T19:47:45Z

Added some benchmarks in f2a5aed.

For reference:

greatest common divisor/gcd, U256
                        time:   [40.901 µs 41.114 µs 41.391 µs]

greatest common divisor/gcd_vartime, U256
                        time:   [1.5651 µs 1.5714 µs 1.5796 µs]

wrapping ops/div/rem_vartime, U256/U128, full size
                        time:   [87.533 ns 87.881 ns 88.262 ns]

wrapping ops/rem_vartime, U256/U128, full size
                        time:   [87.768 ns 88.654 ns 90.059 ns]

@fjarri I guess the idea would be to use rem_vartime to implement Euclid's method? But what's interesting about a gcd_vartime or inv_mod_vartime using Bernstein-Yang is it's constant time with respect to f (but not to g). I'm not sure you can implement Euclid's algorithm in such a manner.
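To illustrate where the variability in g comes from, this Python sketch counts how many divsteps run before g reaches zero when the loop exits early. The names `divstep` and `gcd_until_zero` are hypothetical and this is not the crate's `gcd_vartime`; it only shows that the loop count changes with the value of g.

```python
def divstep(delta, f, g):
    # One division step as defined in the Bernstein-Yang paper; f stays odd.
    if delta > 0 and g & 1:
        return 1 - delta, g, (g - f) // 2
    return 1 + delta, f, (g + (g & 1) * f) // 2


def gcd_until_zero(f, g):
    """Return (gcd, steps), stopping as soon as g reaches zero."""
    assert f & 1, "f must be odd"
    delta, steps = 1, 0
    while g != 0:
        delta, f, g = divstep(delta, f, g)
        steps += 1
    return abs(f), steps
```

With f = 21, `gcd_until_zero(21, 14)` returns `(7, 3)` while `gcd_until_zero(21, 0)` returns `(21, 0)`, so the amount of work leaks information about g.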

fjarri · 2024-08-04T19:55:01Z

That's true, but an implementation vartime in both arguments is useful too, so I wonder if removing the restriction on the first argument leads to a noticeable performance gain.

tarcieri · 2024-08-04T19:56:12Z

@fjarri I guess we could potentially have all three options, but I'm not sure about naming.

One question I'd have: what is the use case for a fully variable-time GCD in cryptographic algorithms?

fjarri · 2024-08-04T21:52:01Z

I tried out a Euclidean algorithm implementation with rem_vartime(), and it seems to be significantly slower than the current gcd_vartime() (about 10x).
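For reference, the Euclidean approach has the following shape in Python, with `%` standing in for `rem_vartime()`. The function name is hypothetical, and the sketch says nothing about the Rust timings; it only shows the algorithm, whose loop count depends on both inputs.

```python
def gcd_euclid(a, b):
    # Repeatedly replace (a, b) with (b, a mod b) until the remainder is zero.
    while b != 0:
        a, b = b, a % b
    return a
```

For example, `gcd_euclid(21, 14)` returns 7.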

The previous implementation runs in variable time with respect to `g`. However, when both inputs are secret, a fully constant-time implementation is required. This implements Bernstein-Yang in constant time with respect to both parameters by computing a worst-case number of iterations for the algorithm to converge, partially sharing the implementation with #632.

tarcieri requested a review from fjarri August 4, 2024 17:12

tarcieri force-pushed the constant-time-bernstein-yang branch from fd95714 to e277688 Compare August 4, 2024 17:18

tarcieri mentioned this pull request Aug 4, 2024

Bernstein-Yang: constant-time issues #627

Closed

3 tasks

tarcieri force-pushed the constant-time-bernstein-yang branch from e277688 to 51cd353 Compare August 4, 2024 17:21

Add GCD benchmarks

f2a5aed

tarcieri merged commit cdc8487 into master Aug 4, 2024

tarcieri deleted the constant-time-bernstein-yang branch August 4, 2024 19:53

tarcieri mentioned this pull request Aug 4, 2024

Constant-time heap-allocated Bernstein-Yang #635

Merged

tarcieri mentioned this pull request Jan 22, 2025

v0.6.0 #750

Merged
