MONGOCRYPT-483: A 128-bit integer abstraction #510

vector-of-bool · 2022-12-06T00:35:12Z

This changeset adds a new trivial type mlib_int128, which provides a way to perform 128-bit binary arithmetic. This is a prerequisite to MONGOCRYPT-483, which requires 128-bit integers and integer arithmetic.

Some platforms, such as 64-bit GCC, provide an __int128 abstraction, but other platforms do not. MSVC and all 32-bit targets do not have such extensions. For this reason, this PR implements 128-bit arithmetic using only standard C99.

Usage looks like this:

mlib_int128 a typedef of a trivial type that occupies 16 bytes of storage. There is no distinction between signed/unsigned: Operations that depend on sign (for now, just integer comparison) require specifying whether you want a signed compare or an unsigned compare.
MLIB_INT128(N) a macro that creates a 128-bit integer from an integer literal. The literal should not have a suffix. Negation within the macro "just works" (but may generate a false-positive compiler warning).
MLIB_INT128_CAST(N) cast an arbitrary integral value N to a 128-bit integer.
MLIB_INT128_FROM_PARTS(Low64, High64) the actual "constructor" for mlib_int128: Glues together the low bits and the high bits from two 64-bit integers.
MLIB_INT128_SMAX, MLIB_INT128_SMIN, MLIB_INT128_UMAX: The signed-maximum, signed-minimum, and unsigned-maximum values for 128-bit integers.

The following functions all have an mlib_int128_ prefix:

ucmp(L, R) -> int and scmp(L, R) -> int Compare two integers as either unsigned or signed, respectively. Returns n < 0, n > 0, or n = 0 for less, greater, and equal, respectively.
eq(L, R) -> bool - Determine whether L and R are equal.
add(L, R) -> int128 - Return L + R. Overflow wraps.
sub(L, R) -> int128 - Returns L - R. Overflow wraps.
negate(N) -> int128 - Treat N as a signed integer. Return -N.
lshift(N, c) -> int128 - Return N << c
rshift(N, c) -> int128 - Return N >> c
bitor(L, R) -> int128 - Return L | R
mul(L, R) -> int128 - Return L * R. Overflow wraps.
div(L, R) -> int128 - Treats L and R as unsigned: Return L / R (round toward zero).
mod(L, R) -> int128 - Treats L and R as unsigned: Return L % R (remainder after integer division).
divmod(L, R)-> {quotient, remainder} - Obtain both the quotient and remainder of an integer division.
pow10(N) -> int128 - Obtain the Nth power of ten.
pow2(N) -> int128 - Obtain the Nth power of two. (No general pow() is defined.)
from_string(S) -> int128 - Parse S as a string of decimal digits.
to_u64(N) -> uint64_t - Return the low 64 bits of N.
format(N) - Render N as a string of decimal digits. Returns a struct containing the result array.
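As an illustration of what format(N) has to do, here is a minimal sketch of decimal rendering by repeated division by ten. It is not the PR's code: the names sketch_div10, sketch_format_u128, and sketch_format_demo are hypothetical, the value is passed as two 64-bit halves, and the caller supplies the buffer, whereas the PR's function returns a struct holding the array. It treats the value as unsigned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Divide a 128-bit value, held as four base-2^32 digits (most significant
 * first), by ten in place. Returns the remainder, 0 through 9. */
static unsigned
sketch_div10 (uint32_t digits[4])
{
   uint64_t rem = 0;
   for (int i = 0; i < 4; ++i) {
      /* rem < 10, so (rem << 32) | digit always fits in 64 bits. */
      const uint64_t cur = (rem << 32) | digits[i];
      digits[i] = (uint32_t) (cur / 10);
      rem = cur % 10;
   }
   return (unsigned) rem;
}

/* Render the unsigned value (hi * 2^64 + lo) as decimal digits.
 * 2^128 - 1 has 39 digits, so 40 bytes always suffice.
 * Returns a pointer to the first digit, somewhere inside buf. */
char *
sketch_format_u128 (uint64_t hi, uint64_t lo, char buf[40])
{
   uint32_t digits[4] = {(uint32_t) (hi >> 32),
                         (uint32_t) hi,
                         (uint32_t) (lo >> 32),
                         (uint32_t) lo};
   char *out = buf + 39;
   *out = '\0';
   /* Peel off the least significant decimal digit until nothing is left. */
   do {
      *--out = (char) ('0' + sketch_div10 (digits));
   } while (digits[0] | digits[1] | digits[2] | digits[3]);
   return out;
}

/* Hypothetical usage check. */
void
sketch_format_demo (void)
{
   char buf[40];
   assert (strcmp (sketch_format_u128 (0, 0, buf), "0") == 0);
   assert (strcmp (sketch_format_u128 (1, 0, buf), "18446744073709551616") == 0);
   assert (strcmp (sketch_format_u128 (UINT64_MAX, UINT64_MAX, buf),
                   "340282366920938463463374607431768211455") == 0);
}
```

The digits are produced least significant first, which is why the sketch fills the buffer from the end and returns a pointer into it.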

The implementations of add, sub, the bitshifts, to/from string, compare, and equality are all fairly straightforward extensions of 64-bit operations (with carry/borrow).
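To make the carry/borrow idea concrete, here is a minimal sketch in standard C99. It assumes a hypothetical two-word struct u128_sketch and hypothetical sketch_ function names; it is not the PR's mlib_int128 and may differ from the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value; not the PR's mlib_int128. */
typedef struct {
   uint64_t lo;
   uint64_t hi;
} u128_sketch;

/* L + R, wrapping modulo 2^128. */
u128_sketch
sketch_add (u128_sketch l, u128_sketch r)
{
   u128_sketch out;
   out.lo = l.lo + r.lo;
   /* Unsigned addition wraps, so the low sum is smaller than an operand
    * exactly when a carry out of the low word occurred. */
   const uint64_t carry = out.lo < l.lo;
   out.hi = l.hi + r.hi + carry;
   return out;
}

/* L - R, wrapping modulo 2^128. */
u128_sketch
sketch_sub (u128_sketch l, u128_sketch r)
{
   u128_sketch out;
   out.lo = l.lo - r.lo;
   /* A borrow is needed exactly when the low word of L is too small. */
   const uint64_t borrow = l.lo < r.lo;
   out.hi = l.hi - r.hi - borrow;
   return out;
}

/* Unsigned three-way compare: high words decide first, then low words. */
int
sketch_ucmp (u128_sketch l, u128_sketch r)
{
   if (l.hi != r.hi) {
      return l.hi < r.hi ? -1 : 1;
   }
   if (l.lo != r.lo) {
      return l.lo < r.lo ? -1 : 1;
   }
   return 0;
}

/* Signed three-way compare: flipping the sign bit of both operands maps
 * two's-complement order onto unsigned order. */
int
sketch_scmp (u128_sketch l, u128_sketch r)
{
   const uint64_t sign = UINT64_C (1) << 63;
   l.hi ^= sign;
   r.hi ^= sign;
   return sketch_ucmp (l, r);
}

/* Hypothetical usage check. */
void
sketch_add_demo (void)
{
   const u128_sketch zero = {0, 0};
   const u128_sketch one = {1, 0};
   const u128_sketch low_max = {UINT64_MAX, 0};

   /* The carry propagates into the high word. */
   const u128_sketch sum = sketch_add (low_max, one);
   assert (sum.lo == 0 && sum.hi == 1);

   /* 0 - 1 wraps to all-ones: -1 when signed, the maximum when unsigned. */
   const u128_sketch minus_one = sketch_sub (zero, one);
   assert (minus_one.lo == UINT64_MAX && minus_one.hi == UINT64_MAX);
   assert (sketch_ucmp (minus_one, one) > 0);
   assert (sketch_scmp (minus_one, one) < 0);
}
```

The last two assertions show why the type itself carries no signedness: the same bit pattern orders differently depending on which compare the caller asks for.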

Multiplication here is implemented using Knuth's 4.3.1M multiplication algorithm (from The Art of Computer Programming). Here 4.3.1M is used to multiply the two low 64-bit words to obtain the 128-bit product. The high word of the result is then adjusted by two more regular 64-bit multiplications.
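The shape described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: u128_sketch and the sketch_ function names are hypothetical, and the 64x64 step is a plain schoolbook product over 32-bit digits in the spirit of Algorithm M rather than a transcription of it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value; not the PR's mlib_int128. */
typedef struct {
   uint64_t lo;
   uint64_t hi;
} u128_sketch;

/* Full 64x64 -> 128-bit unsigned product, using base-2^32 digits. */
u128_sketch
sketch_umul64 (uint64_t a, uint64_t b)
{
   const uint64_t a0 = (uint32_t) a, a1 = a >> 32;
   const uint64_t b0 = (uint32_t) b, b1 = b >> 32;

   /* Four partial products; each fits in 64 bits. */
   const uint64_t p00 = a0 * b0;
   const uint64_t p01 = a0 * b1;
   const uint64_t p10 = a1 * b0;
   const uint64_t p11 = a1 * b1;

   /* Everything that lands in bits 32..63. Each term is below 2^32, so
    * the sum cannot overflow; its upper half is the carry into bit 64. */
   const uint64_t mid = (p00 >> 32) + (uint32_t) p01 + (uint32_t) p10;

   u128_sketch out;
   out.lo = (mid << 32) | (uint32_t) p00;
   out.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
   return out;
}

/* 128x128 -> 128-bit product, wrapping modulo 2^128. */
u128_sketch
sketch_mul128 (u128_sketch l, u128_sketch r)
{
   u128_sketch out = sketch_umul64 (l.lo, r.lo);
   /* The cross terms are scaled by 2^64, so only their low 64 bits
    * survive. The hi*hi term is scaled by 2^128 and vanishes entirely. */
   out.hi += l.lo * r.hi + l.hi * r.lo;
   return out;
}

/* Hypothetical usage check. */
void
sketch_mul_demo (void)
{
   /* (2^64 - 1)^2 = 2^128 - 2^65 + 1 */
   const u128_sketch sq = sketch_umul64 (UINT64_MAX, UINT64_MAX);
   assert (sq.lo == 1 && sq.hi == UINT64_MAX - 1);

   /* (-7) * (-7) == 49 in two's complement, with no sign handling. */
   const u128_sketch minus7 = {(uint64_t) -7, UINT64_MAX};
   const u128_sketch p = sketch_mul128 (minus7, minus7);
   assert (p.lo == 49 && p.hi == 0);
}
```

Because the result wraps modulo 2^128, the same routine serves signed and unsigned operands, which is consistent with mul being listed without a signedness qualifier.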

Division is the most complicated operation and took a lot of testing to get right. It uses Knuth's 4.3.1D algorithm for arbitrary-precision unsigned division (defined in _mlibKnuth431D). Certain optimizations and shortcuts were learned from the existing STL constexpr implementation. Optimizations that rely on intrinsics or hardware support are not attempted here, since we just need these 128-bit integers and want platform-equivalence.
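Since division is the operation most likely to hide bugs, a slow but obviously correct reference is useful to compare against. The sketch below is deliberately not Knuth's Algorithm D: it is plain bit-at-a-time (shift-and-subtract) long division, the kind of oracle one might check a faster implementation against. The u128_sketch and divmod_sketch types and the function names are hypothetical and do not appear in the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the PR's types. */
typedef struct {
   uint64_t lo;
   uint64_t hi;
} u128_sketch;

typedef struct {
   u128_sketch quotient;
   u128_sketch remainder;
} divmod_sketch;

/* Unsigned a >= b. */
static int
sketch_uge (u128_sketch a, u128_sketch b)
{
   return a.hi != b.hi ? a.hi > b.hi : a.lo >= b.lo;
}

/* Unsigned binary long division. The divisor must be nonzero. */
divmod_sketch
sketch_udivmod (u128_sketch num, u128_sketch den)
{
   divmod_sketch r = {{0, 0}, {0, 0}};
   for (int bit = 127; bit >= 0; --bit) {
      /* remainder = (remainder << 1) | next numerator bit. At most 127
       * bits have been consumed before this shift, so the remainder is
       * below 2^127 and the shift cannot lose a bit. */
      r.remainder.hi = (r.remainder.hi << 1) | (r.remainder.lo >> 63);
      r.remainder.lo <<= 1;
      const uint64_t word = bit >= 64 ? num.hi : num.lo;
      r.remainder.lo |= (word >> (bit % 64)) & 1;

      if (sketch_uge (r.remainder, den)) {
         /* Subtract the divisor (with borrow) and set the quotient bit. */
         const uint64_t borrow = r.remainder.lo < den.lo;
         r.remainder.lo -= den.lo;
         r.remainder.hi -= den.hi + borrow;
         if (bit >= 64) {
            r.quotient.hi |= UINT64_C (1) << (bit - 64);
         } else {
            r.quotient.lo |= UINT64_C (1) << bit;
         }
      }
   }
   return r;
}

/* Hypothetical usage check. */
void
sketch_div_demo (void)
{
   const u128_sketch hundred = {100, 0};
   const u128_sketch seven = {7, 0};
   const divmod_sketch a = sketch_udivmod (hundred, seven);
   assert (a.quotient.lo == 14 && a.quotient.hi == 0);
   assert (a.remainder.lo == 2 && a.remainder.hi == 0);

   /* (2^128 - 1) / 10: the quotient is 0x1999...9 and the remainder is 5. */
   const u128_sketch umax = {UINT64_MAX, UINT64_MAX};
   const u128_sketch ten = {10, 0};
   const divmod_sketch b = sketch_udivmod (umax, ten);
   assert (b.quotient.hi == UINT64_C (0x1999999999999999));
   assert (b.quotient.lo == UINT64_C (0x9999999999999999));
   assert (b.remainder.lo == 5 && b.remainder.hi == 0);
}
```

This takes 128 iterations per division, so it is only suitable as a test reference; Algorithm D works a whole 32-bit digit at a time.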

Most of the code has been marked as constexpr, and a lot of functionality is tested via static_asserts so that CI fails as fast as possible and IDEs give immediate feedback.

The PR tests are implemented in C++ and define a test macro CHECK, since I miss Catch2 🥲.

kkloberdanz · 2022-12-07T15:44:33Z

src/mlib/int128.h

+ * @brief Bitwise-or two 128-bit integers
+ */
+static mlib_constexpr_fn mlib_int128
+mlib_int128_bitor (mlib_int128 l, mlib_int128 r)


I can't seem to find a test for this one. Perhaps we could have a test for it?

kkloberdanz · 2022-12-07T16:04:14Z

src/mlib/int128.h

+ * @brief Multiply two mlib_int128s together. Overflow will wrap.
+ */
+static mlib_constexpr_fn mlib_int128
+mlib_int128_mul (mlib_int128 l, mlib_int128 r)


I did not notice a test for this function. Perhaps we could add a test?

kkloberdanz · 2022-12-07T17:04:02Z

src/mlib/int128.h

+static mlib_constexpr_fn mlib_int128
+_mlibUnsignedMult128 (uint64_t left, uint64_t right)
+{
+   // Perform a Knuth 4.2.1M multiplication


Perhaps a minor typo? In the PDF of TAOCP that I have, I see this algorithm is listed as 4.3.1M.

kkloberdanz · 2022-12-07T17:14:04Z

src/mlib/int128.h

+/// Implementation of Knuth's algorithm 4.3.1 D for unsigned integer division
+static mlib_constexpr_fn void
+_mlibKnuth431D (uint32_t *const u,


I love the explicit citations here! D1, D2, etc

kkloberdanz

Amazing work! Looks good to me. I really appreciate that you cited which algorithms you used. Good job!

kkloberdanz · 2022-12-07T17:24:16Z

As an aside, GNU GMP (https://gmplib.org/) has highly optimized implementations for arbitrary precision integer arithmetic. For simple use cases, I see the appeal of not needing to pull in a new dependency on GMP, but if more complex use cases come up for high precision integer arithmetic, it may be good to evaluate GMP for such a use case.

kevinAlbs

Impressive work.

I suggest more test cases for multiplication and division. Consider using code coverage or fuzzing to ensure branches are exercised. An incorrect result may lead to data corruption.

eramongodb

Some minor suggestions and questions remaining; otherwise, LGTM.

eramongodb · 2022-12-16T16:13:39Z

src/mlib/int128.test.cpp

+   CHECK (mlib_int128_mul (MLIB_INT128_CAST (-7), MLIB_INT128_CAST (-7)) ==
+          49_i128);
+
+   // It's useful it specify bit patterns directly


Suggested change

// It's useful it specify bit patterns directly

// It's useful to specify bit patterns directly.

eramongodb · 2022-12-16T16:22:57Z

src/mlib/int128.test.cpp

+      // division checks. It doesn't need to be rigorous or optimal, it only
+      // needs to "just work."
+      std::vector<std::thread> threads;
+      threads.resize (15);


Suggest the following pattern instead:

std::vector<std::thread> threads;
for (...) {
   threads.emplace_back([nbits, dbits, random] () mutable { ... });
}
for (auto& t : threads) {
   t.join();
}

eramongodb · 2022-12-16T16:29:39Z

src/mlib/int128.test.cpp

+
+#include <iostream>
+#include <random>
+#include <string>


Suggested change

#include <string>

#include <string>

#include <vector>

eramongodb · 2022-12-16T16:35:31Z

src/mlib/int128.h

+mlib_int128_format (mlib_int128 i)
+{
+   mlib_int128_charbuf into = {0};
+   char *out = into.str + sizeof into - 1;


Suggested change

char *out = into.str + sizeof into - 1;

char *out = into.str + sizeof(into) - 1u;

kevinAlbs

Great work. LGTM

kevinAlbs · 2022-12-18T14:27:13Z

src/mlib/int128.h

+      const u64 n0 = (u32) (numer.r.lo);
+      const u64 n1 = (u32) (numer.r.lo >> 32);
+
+      // We don't need to split n2 and n3. (n3,n2) will be the first parital


Suggested change

// We don't need to split n2 and n3. (n3,n2) will be the first parital

// We don't need to split n2 and n3. (n3,n2) will be the first partial

kevinAlbs · 2022-12-18T14:27:22Z

src/mlib/int128.h

+      }
+   }
+
+   // Denomralization (D8) is done by caller.


Suggested change

// Denomralization (D8) is done by caller.

// Denormalization (D8) is done by caller.

kevinAlbs · 2022-12-18T14:50:11Z

src/mlib/int128.h

+                const int vlen,
+                uint32_t *quotient)
+{
+   // Part D1 (normalization) is done by caller, normalized in u and v (b is 32)


Suggested change

// Part D1 (normalization) is done by caller, normalized in u and v (b is 32)

// Part D1 (normalization) is done by caller, normalized in u and v (b is 2^32)

b is the radix. The digits are 32 bit unsigned integers.

vector-of-bool added 4 commits December 2, 2022 20:33

Compat macros

5ced496

128-bit integers

289e9b4

Cleanup, C++ version fixes, MSVC fixes, constexpr compat

0add18f

Remove debug print

d8032ef

addaleax reviewed Dec 6, 2022

Spell

36108c1

vector-of-bool marked this pull request as ready for review December 6, 2022 19:19

vector-of-bool requested review from eramongodb and kevinAlbs December 6, 2022 19:19

kevinAlbs requested a review from kkloberdanz December 6, 2022 19:22

kkloberdanz reviewed Dec 7, 2022

kkloberdanz approved these changes Dec 7, 2022

eramongodb reviewed Dec 8, 2022

eramongodb reviewed Dec 9, 2022

eramongodb reviewed Dec 9, 2022

kevinAlbs reviewed Dec 12, 2022

vector-of-bool added 9 commits December 12, 2022 21:16

PR tweaks and more tests

50b64e3

- Check can compile as C
- Better branch coverage by hitting each bit pattern that has an effect on division.

Support a base prefix in from_string

d942b99

Fix u128/u64 remainder

4aea827

Comment tweak

e433e29

Better explain the 128/32 division

8bcb3c6

Additional mult tests

20918e2

Spelling and comments

683d5ec

VS2017 compat

985549e

Fix unary-minus warning

af83687

vector-of-bool requested a review from eramongodb December 15, 2022 23:55

vector-of-bool requested a review from kevinAlbs December 15, 2022 23:55

eramongodb approved these changes Dec 16, 2022

kevinAlbs approved these changes Dec 18, 2022

PR comments, spelling, etc.

92ddb11

vector-of-bool merged commit c8d5ffe into mongodb:master Dec 19, 2022

vector-of-bool mentioned this pull request Dec 20, 2022

[MONGOCRYPT-483]: Support Decimal128 in range-based Queryable Encryption #522

Merged
