P-521 implementation with Fiat-crypto field arithmetic #376

dkostic · 2022-01-26T19:14:54Z

Issues:

CryptoAlg-923

Description of changes:

This PR adds the implementation of P-521 curve that uses Fiat-crypto field arithmetic. The implementation is done in the same way as for P-384. The main changes are:

Imported Fiat-crypto field arithmetic implementation in third_party directory (files p521_32.h and p521_64.h)
Added full P-521 implementation in file p521.c. The file contains implementation of:
- Point arithmetic (addition and doubling)
- Inversion of a scalar
- Curve scalar multiplication functions (multiplication of an arbitrary point by a given scalar, multiplication of the base point by a given scalar, sum of the product of a given scalar with a given arbitrary point and a given scalar and the base point).
Updated make_tables.go script to support generating the pre-computed table for P-521 (the generated file is p521_tables.h).
Updated certain tests to reflect the fact that P-521 is not using Montgomery arithmetic anymore.
Slight refactoring of P-384 and some other files (mostly adjusting line widths to 80 characters).

Finally, the performance improvements of this change are:

ECDH faster by a factor of 3.7 and 3.9 on GV2 and x86, respectively.
ECDSA sign faster by a factor of 5.7 and 5.9 on GV2 and x86, respectively.
ECDSA verify by a factor of 2.3 and 2.4 on GV2 and x86, respectively.
The raw numbers measure by ./tool/bssl speed -filter "P-521"

(x86) EC2 instance with Intel(R) Xeon(R) Platinum 8124M CPU:
Before:
Did 451 ECDH P-521 operations in 1099027us (410.4 ops/sec)
Did 792 ECDSA P-521 signing operations in 1083113us (731.2 ops/sec)
Did 781 ECDSA P-521 verify operations in 1077091us (725.1 ops/sec)

After:
Did 1716 ECDH P-521 operations in 1065345us (1610.7 ops/sec)
Did 4400 ECDSA P-521 signing operations in 1023071us (4300.8 ops/sec)
Did 1903 ECDSA P-521 verify operations in 1076533us (1767.7 ops/sec)

(Arm) EC2 instance with Graviton 2:
Before:
Did 154 ECDH P-521 operations in 1071138us (143.8 ops/sec)
Did 275 ECDSA P-521 signing operations in 1078510us (255.0 ops/sec)
Did 264 ECDSA P-521 verify operations in 1073877us (245.8 ops/sec)

After:
Did 572 ECDH P-521 operations in 1075348us (531.9 ops/sec)
Did 1540 ECDSA P-521 signing operations in 1054963us (1459.8 ops/sec)
Did 627 ECDSA P-521 verify operations in 1090150us (575.2 ops/sec)

Call-outs:

Point out areas that need special attention or support during the review process. Discuss architecture or design changes.

Testing:

How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Signed-off-by: Dusan Kostic <[email protected]>

nebeid

I'm in the middle of make_tables.go. Will continue reviewing tomorrow.

crypto/ecdh_extra/ecdh_test.cc

crypto/fipsmodule/ec/ec_test.cc

nebeid · 2022-02-01T22:44:10Z

crypto/fipsmodule/ec/make_tables.go

+
+	if err := writeP521Table("p521_table.h"); err != nil {
+		fmt.Fprintf(os.Stderr, "Error writing p521_table.h: %s\n", err)
+		os.Exit(1)
+	}


I think there is different indentations in these lines than the ones above.

I think the indentation is fixed in the whole file now.

crypto/fipsmodule/ec/make_tables.go

nebeid · 2022-02-01T23:19:04Z

crypto/fipsmodule/ec/make_tables.go

+	ret := make([]uint64, words)
+	mask := big.NewInt((1 << 58) - 1)
+	tmp := new(big.Int).Set(n)
+	for i := 0; i < words; i++ {


Should the stopping condition be tmp == 0 as an additional check on the correct calculation of words? Alternatively, we can check after the loop that tmp =0.

I think there is no reason for additional checks here, there are no checks in the existing functions and the script is used only for generating the table. Any mistake here and the table would be wrong, which means the unit tests for the curve would fail.

nebeid · 2022-02-01T23:31:05Z

crypto/fipsmodule/ec/make_tables.go

+    mask := big.NewInt((1 << bits) - 1)
+		ret[i] = new(big.Int).And(tmp, mask).Uint64()
+		tmp.Rsh(tmp, bits)
+	}


Similar comment, not sure if there is a value in checking tmp=0 here.
Also the indentation seems off in the function.

see my comment above. Indentation fixed.

nebeid

Still continuing

nebeid · 2022-02-02T14:12:35Z

crypto/fipsmodule/ec/make_tables.go

@@ -414,9 +514,40 @@ func writeU32Mont(w io.Writer, curve elliptic.Curve, n *big.Int, indent int) err
 	})
 }

-type writeBigIntFunc func(w io.Writer, curve elliptic.Curve, n *big.Int, indent int) error
+func writeU64(w io.Writer, curve elliptic.Curve, n *big.Int, indent int, bitSizes []uint) error {


I think bitSizes is not used in this function nor the following one, nor in writeU32. Is there a reason to have it in the arguments?

don't get me started on Go's limitations :) Because of the way all these functions are defined and used I couldn't find a more elegant way than adding the parameter to all the functions even if it's not used everywhere

crypto/fipsmodule/ec/make_tables.go

nebeid · 2022-02-02T15:12:03Z

crypto/fipsmodule/ec/p384.c

      // For each digit |d| in the current group read the corresponding point
      // from the table and add it to |res|. If |d| is negative, negate
      // the point before adding it to |res|.
      int16_t d = rnaf[j];
      // is_neg = (d < 0) ? 1 : 0
-      int16_t is_neg = (d >> 7) & 1;
+      int16_t is_neg = (d >> 15) & 1;


Good catch! Thank you. There is another occurrence at line 772.
Just curious, was it causing errors in P-521?

No, it could cause errors only with window size >7 I think. Fixed 772.

nebeid · 2022-02-02T17:01:17Z

crypto/fipsmodule/ec/p521.c

+  fiat_secp521r1_selectznz(out, !!t, z, nz);
+}
+
+static void p521_from_generic(p521_felem out, const EC_FELEM *in) {


I keep coming back to remember this: It may be worth it to add as a comment here (and in p384.c and p256.c) and before _to_generic that the input and output are in little-endian representation.

nebeid · 2022-02-02T22:55:30Z

crypto/fipsmodule/ec/p521.c

+  out[P521_MUL_NWINDOWS - 1] = window;
+}
+
+// fiat_p521_select_point selects the |idx|-th projective point from the given


This function and the following one kept the fiat_ prefix which was removed in p384 ones.

thanks! changed

crypto/fipsmodule/ec/p384.c

nebeid · 2022-02-02T23:12:59Z

crypto/fipsmodule/ec/p521.c

+  p521_to_generic(&r->X, res[0]);
+  p521_to_generic(&r->Y, res[1]);
+  p521_to_generic(&r->Z, res[2]);


Are these lines needed? I don't think r is used afterwards until they're repeated again at l. 675.

good catch, leftover from debugging.

nebeid · 2022-02-03T18:07:01Z

crypto/fipsmodule/ec/p521.c

+    p521_felem acc, t128, t16, t2, t256, t32, t4;
+    p521_felem t512, t516, t518, t519, t64, t8;


Nit: we can rearrange the variable names in their order of appearance, which I think is their ascending order, unless you'd like to keep it true to where it's copied from.

…s#376) * Workaround to mark the DEFAULT SecureRandom algorithm thread-safe This is a workaround to ensure that the `nextBytes` calls to SecureRandom are not synchronized when specifying the alias algorithm `DEFAULT` of ACCP SecureRandom provider. Without this workaround, the `nextBytes` calls are synchronized and the multi-threaded performance of ACCP SecureRandom drops: concretely, with x threads concurrently generating randomness, average time per thread gets multiplied by x. In more details, OpenJDK/Corretto requires thread safe implementations of SecureRandom to specify the attribute ThreadSafe (https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/security/SecureRandom.html). If this attribute is not set to true, the nextBytes calls are synchronized. ACCP sets this attribute when registering SecureRandom. It also registers an alias `DEFAULT` to the algorithm `LibCryptoRng` that it registers. Attributes of services are converted to attributes at the provider level, using the naming convention: `Service.Algorithm Attribute`. Thus, setting the `ThreadSafe` attribute leads to setting `SecureRandom.LibCryptoRng ThreadSafe=true`, but not `SecureRandom.DEFAULT ThreadSafe=true`. Unfortunately, the JDK does not follow aliases and thus the DEFAULT algorithm is not considered thread safe. See, e.g., https://github.com/corretto/corretto-17/blob/e0d038c50d5c61683a9270218b32c5aed2b0b466/src/java.base/share/classes/java/security/SecureRandom.java#L229-L236 FAQ: > Who is affected by the performance drop if this workaround is not applied? Anyone instantiating explicitly the `DEFAULT` algorithm of ACCP SecureRandom (i.e., `SecureRandom.getInstance("DEFAULT", "AmazonCorrettoCryptoProvider")`) and using it concurrently in multiple threads. This includes the benchmarks in the subfolder `benchmarks/` of this repo. > I am using `new SecureRandom`, am I affected? No. `new SecureRandom()` uses the algorithm `LibCryptoRng` and not the alias `DEFAULT` (assuming ACCP is set as first cryptographic provider). > I did not register ACCP SecureRandom, am I affected? No. > Can this workaround impact other crypto providers like BouncyCastle (which also provides a `DEFAULT` algorithm for SecureRandom)? No, attributes are stored at the provider level. This change will not affect other providers. * Add unit tests and checks when setting DEFAULT to be ThreadSafe This commit improves the way SecureRandom algorithms (LibCryptoRng and its alias DEFAULT) are registered and adds unit tests. 1. Check that DEFAULT points to LibCryptoRng before setting it to be ThreadSafe. This is to mitigate risks that a future code change uses a different non-ThreadSafe algorithm for the DEFAULT alias. 2. Add unit tests checking that both LibCryptoRng and DEFAULT are ThreadSafe. --------- Co-authored-by: Fabrice Benhamouda <[email protected]>

Following reversal of PR aws#376 in PR aws#388, using the ACCP SecureRandom algorithm `DEFAULT` (which is an alias of `LibCryptoRng`) yields lower performance in multi-threaded settings. See aws#376 for details. This PR is meant to use the algorithm `LibCryptoRng` in the benchmark, instead of its alias `DEFAULT`. This solves the performance drop in multi-threaded settings. This also corresponds to the most common use of ACCP SecureRandom. Indeed, if ACCP is the first security provider and if ACCP SecureRandom is registered, then when instantiating SecureRandom as `new SecureRandom()`, the algorithm `LibCryptoRng` will be selected. Co-authored-by: Fabrice Benhamouda <[email protected]>

Dusan Kostic added 3 commits January 26, 2022 19:50

P-521 implementation with Fiat-crypto field arithmetic

50afa4a

Signed-off-by: Dusan Kostic <[email protected]>

Small fix for to_generic function

7f48d67

Remove code used for debugging

14b1f54

apetcher-amazon mentioned this pull request Jan 28, 2022

Tracking uses of fiat-crypto mit-plv/fiat-crypto#902

Open

DominicDams previously approved these changes Feb 1, 2022

View reviewed changes

nebeid reviewed Feb 1, 2022

View reviewed changes

nebeid reviewed Feb 2, 2022

View reviewed changes

nebeid reviewed Feb 3, 2022

View reviewed changes

Addressing CR 1

cf96622

dkostic dismissed DominicDams’s stale review via cf96622 February 4, 2022 17:35

nebeid approved these changes Feb 4, 2022

View reviewed changes

Merge branch 'main' into p521-fiat-crypto

0a3a8c5

torben-hansen approved these changes Feb 7, 2022

View reviewed changes

dkostic merged commit afdddac into aws:main Feb 7, 2022

dkostic deleted the p521-fiat-crypto branch February 7, 2022 16:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

P-521 implementation with Fiat-crypto field arithmetic #376

P-521 implementation with Fiat-crypto field arithmetic #376

dkostic commented Jan 26, 2022

nebeid left a comment

nebeid Feb 1, 2022

dkostic Feb 4, 2022

nebeid Feb 1, 2022

dkostic Feb 4, 2022

nebeid Feb 1, 2022

dkostic Feb 4, 2022

nebeid left a comment

nebeid Feb 2, 2022

dkostic Feb 4, 2022

nebeid Feb 2, 2022

dkostic Feb 4, 2022

nebeid Feb 2, 2022

dkostic Feb 4, 2022

nebeid Feb 2, 2022

dkostic Feb 4, 2022

nebeid Feb 2, 2022

dkostic Feb 4, 2022

nebeid Feb 3, 2022

dkostic Feb 4, 2022

		p521_felem acc, t128, t16, t2, t256, t32, t4;
		p521_felem t512, t516, t518, t519, t64, t8;

P-521 implementation with Fiat-crypto field arithmetic #376

P-521 implementation with Fiat-crypto field arithmetic #376

Conversation

dkostic commented Jan 26, 2022

Issues:

Description of changes:

Call-outs:

Testing:

nebeid left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nebeid left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment