tarcieri commented Aug 4, 2024

The previous implementation runs in variable time with respect to `g`. However, when both inputs are secret, a fully constant-time implementation is required.

This implements the method described in section 11 of https://eprint.iacr.org/2019/266.pdf, specifically this Python code from Figure 11.1:

```python
from divsteps2 import divsteps2

def iterations(d):
    return (49*d+80)//17 if d<46 else (49*d+57)//17

def gcd2(f,g):
    assert f & 1
    d = max(f.nbits(),g.nbits())
    m = iterations(d)
    delta, fm, gm, P = divsteps2(m,m+d,1,f,g)
    return abs(fm)

def recip2(f,g):
    assert f & 1
    d = max(f.nbits(),g.nbits())
    m = iterations(d)
    precomp = Integers(f)((f+1)/2)^(m-1)
    delta, fm, gm, P = divsteps2(m,m+1,1,f,g)
    V = sign(fm)*ZZ(P[0][1]*2^(m-1))
    return ZZ(V*precomp)
```
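`recip2` above depends on Sage (`Integers`, `ZZ`, `sign`) and on the paper's `divsteps2`. As a plain-Python sketch of the same idea, the function below runs the same fixed number of divsteps from `iterations(d)`, but instead of reading `P[0][1]` and multiplying by the precomputed `((f+1)/2)^(m-1)`, it uses a common alternative bookkeeping: it tracks coefficients `u`, `v` with `f ≡ u·x` and `g ≡ v·x (mod n)`, halving them modulo `n` at every step. `inv_mod` and `halve_mod` are hypothetical names, this is not the crate's Rust code, and Python integers give no constant-time guarantee; only the control flow (a fixed iteration count) is illustrated.

```python
from typing import Optional


def iterations(d: int) -> int:
    """Divstep count from Figure 11.1 that suffices for d-bit inputs."""
    return (49 * d + 80) // 17 if d < 46 else (49 * d + 57) // 17


def halve_mod(a: int, n: int) -> int:
    """Return a / 2 mod n, for odd n and 0 <= a < n."""
    return (a if a % 2 == 0 else a + n) // 2


def inv_mod(x: int, n: int) -> Optional[int]:
    """Inverse of x modulo odd n using a fixed number of divsteps, else None."""
    assert n & 1, "modulus must be odd"
    bits = max(n.bit_length(), x.bit_length())
    delta, f, g = 1, n, x
    u, v = 0, 1  # invariants: f ≡ u*x and g ≡ v*x (mod n)
    for _ in range(iterations(bits)):  # never exits early
        if delta > 0 and g & 1:
            # Swap step: (f, g) <- (g, (g - f) / 2); the RHS uses the old values.
            delta, f, g, u, v = 1 - delta, g, (g - f) // 2, v, halve_mod((v - u) % n, n)
        else:
            # Non-swap step: g <- (g + (g mod 2) * f) / 2
            odd = g & 1
            delta, g, v = 1 + delta, (g + odd * f) // 2, halve_mod((v + odd * u) % n, n)
    # Here g == 0 and f == ±gcd(n, x), so x is invertible iff f == ±1.
    if abs(f) != 1:
        return None
    return (f * u) % n  # f*u*x ≡ f*f ≡ 1 (mod n)


print(inv_mod(3, 7), inv_mod(6, 9))
```

Halving modulo `n` at each step avoids the final multiplication by a precomputed power of `(n+1)/2`, at the cost of a conditional add of `n` per iteration (which a constant-time implementation would have to perform with masking).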

Instead of looping until `g` reaches zero, this computes a fixed number of iterations from the bit length of the larger of `f` and `g` (enough to guarantee the algorithm has converged) and always runs exactly that many iterations.
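To make the fixed iteration count concrete, here is a plain-Python sketch of `gcd2`, with `divsteps2` replaced by a hypothetical exact-integer `divstep` helper that follows the paper's divstep definition (without the truncated-precision optimization). The loop count depends only on the bit length `d`, never on when `g` reaches zero; Python integers themselves are not constant-time, so this illustrates the control flow only.

```python
def iterations(d: int) -> int:
    """Divstep count from Figure 11.1 that suffices for d-bit inputs."""
    return (49 * d + 80) // 17 if d < 46 else (49 * d + 57) // 17


def divstep(delta: int, f: int, g: int) -> tuple[int, int, int]:
    """One divstep as defined in the paper; f stays odd throughout."""
    if delta > 0 and g & 1:
        return 1 - delta, g, (g - f) // 2
    return 1 + delta, f, (g + (g & 1) * f) // 2


def gcd2(f: int, g: int) -> int:
    """gcd(f, g) for odd f, always running exactly iterations(d) divsteps."""
    assert f & 1, "f must be odd"
    d = max(f.bit_length(), g.bit_length())
    delta = 1
    for _ in range(iterations(d)):  # no early exit when g hits zero
        delta, f, g = divstep(delta, f, g)
    assert g == 0  # guaranteed by the paper's iteration bound
    return abs(f)


print(gcd2(15, 10), gcd2(255, 1020), iterations(256))
```

Once `g` is zero, further divsteps only increment `delta` and leave `f` unchanged, which is why running past the point of convergence is harmless. For 256-bit inputs, `iterations(256)` is 741.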

This results in roughly a 24x slowdown (a +2268% change):

```
Bernstein-Yang invert, U256
        time:   [36.934 µs 37.151 µs 37.391 µs]
        change: [+2256.7% +2267.7% +2278.4%] (p = 0.00 < 0.05)
        Performance has regressed.
```
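As a rough sanity check on those figures, assuming Criterion's `change` is the relative change in the mean time versus the previous baseline, the point estimate implies:

$$
\frac{t_{\text{new}}}{t_{\text{old}}} = 1 + 22.677 \approx 23.7,
\qquad
t_{\text{old}} \approx \frac{37.151\ \mu\text{s}}{23.677} \approx 1.57\ \mu\text{s}
$$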

The previous implementation, which is variable-time with respect to `g`, is preserved for now as `gcd_vartime`, but it would also be nice to add an `inv_mod_vartime`.
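For contrast, the variable-time behavior can be sketched in plain Python as the same divstep loop with an early exit once `g` reaches zero. `gcd_vartime_sketch` and `divstep` are hypothetical names, not the crate's Rust API; the point is that the number of steps, and hence the running time, varies with the input values.

```python
def divstep(delta: int, f: int, g: int) -> tuple[int, int, int]:
    """One divstep as defined in the paper; f must be odd."""
    if delta > 0 and g & 1:
        return 1 - delta, g, (g - f) // 2
    return 1 + delta, f, (g + (g & 1) * f) // 2


def gcd_vartime_sketch(f: int, g: int) -> tuple[int, int]:
    """Return (gcd(f, g), divsteps taken), stopping as soon as g == 0."""
    assert f & 1, "f must be odd"
    delta, steps = 1, 0
    while g != 0:  # the trip count depends on the input values
        delta, f, g = divstep(delta, f, g)
        steps += 1
    return abs(f), steps


# Same f, different g: the number of divsteps differs.
print(gcd_vartime_sketch(255, 1020), gcd_vartime_sketch(255, 1021))
```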

tarcieri requested a review from fjarri August 4, 2024 17:12
tarcieri commented Aug 4, 2024

Aside from a similar change to the `Boxed*` implementation, I think this is the last source of timing variability that needs to be eliminated to address #627.

tarcieri force-pushed the constant-time-bernstein-yang branch from fd95714 to e277688
tarcieri mentioned this pull request Aug 4, 2024
tarcieri force-pushed the constant-time-bernstein-yang branch from e277688 to 51cd353

tarcieri commented Aug 4, 2024

Note: there's an alternative strategy for computing the iteration upper bounds here that we should investigate: https://github.com/sipa/safegcd-bounds

fjarri commented Aug 4, 2024

I wonder whether an implementation of `gcd_vartime` that just uses `rem_vartime` would be faster.

tarcieri commented Aug 4, 2024

I should probably add benchmarks for `gcd_vartime` as well.

tarcieri commented Aug 4, 2024

Added some benchmarks in f2a5aed.

For reference:

```
greatest common divisor/gcd, U256
                        time:   [40.901 µs 41.114 µs 41.391 µs]

greatest common divisor/gcd_vartime, U256
                        time:   [1.5651 µs 1.5714 µs 1.5796 µs]

wrapping ops/div/rem_vartime, U256/U128, full size
                        time:   [87.533 ns 87.881 ns 88.262 ns]

wrapping ops/rem_vartime, U256/U128, full size
                        time:   [87.768 ns 88.654 ns 90.059 ns]
```

@fjarri I guess the idea would be to use `rem_vartime` to implement Euclid's algorithm? What's interesting about a `gcd_vartime` or `inv_mod_vartime` based on Bernstein-Yang is that it's constant-time with respect to `f` (but not `g`). I'm not sure you can implement Euclid's algorithm that way.
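For reference, Euclid's algorithm driven by a remainder operation can be sketched in plain Python as below, with `%` standing in for `rem_vartime` (`gcd_euclid` is a hypothetical name). Both the number of loop iterations and the operands of each remainder depend on both inputs, and even swapping the argument order changes the step count, which illustrates why it is not obvious how to make it constant-time with respect to `f` alone.

```python
def gcd_euclid(a: int, b: int) -> tuple[int, int]:
    """Euclid's algorithm; returns (gcd(a, b), number of remainder steps)."""
    steps = 0
    while b != 0:  # stops after a data-dependent number of iterations
        a, b = b, a % b  # a % b stands in for rem_vartime
        steps += 1
    return a, steps


print(gcd_euclid(1071, 462), gcd_euclid(17, 5), gcd_euclid(5, 17))
```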

tarcieri merged commit cdc8487 into master Aug 4, 2024
tarcieri deleted the constant-time-bernstein-yang branch August 4, 2024 19:53

fjarri commented Aug 4, 2024

That's true, but an implementation that is variable-time in both arguments is useful too, so I wonder whether dropping the constant-time requirement on the first argument would give a noticeable performance gain.

tarcieri commented Aug 4, 2024

@fjarri I guess we could potentially have all three options, but I'm not sure about naming.

One question I'd have: what is the use case for a fully variable-time GCD in cryptographic algorithms?

fjarri commented Aug 4, 2024

I tried out a Euclidean algorithm implementation using `rem_vartime()`, and it seems to be significantly slower (about 10x) than the current `gcd_vartime()`.

tarcieri added a commit that referenced this pull request Aug 4, 2024
The previous implementation runs in variable time with respect to `g`.
However, when both inputs are secret, a fully constant-time
implementation is required.

This implements Bernstein-Yang in constant time with respect to both
parameters by computing a worst-case number of iterations for the
algorithm to converge, partially sharing the implementation with #632.
tarcieri added a commit that referenced this pull request Aug 4, 2024
tarcieri added a commit that referenced this pull request Aug 5, 2024
tarcieri mentioned this pull request Jan 22, 2025