-
Notifications
You must be signed in to change notification settings - Fork 99
MONGOCRYPT-483: A 128-bit integer abstraction #510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * @brief Bitwise-or two 128-bit integers | ||
| */ | ||
| static mlib_constexpr_fn mlib_int128 | ||
| mlib_int128_bitor (mlib_int128 l, mlib_int128 r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't seem to find a test for this one. Perhaps we could have a test for it?
| * @brief Multiply two mlib_int128s together. Overflow will wrap. | ||
| */ | ||
| static mlib_constexpr_fn mlib_int128 | ||
| mlib_int128_mul (mlib_int128 l, mlib_int128 r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not notice a test for this function. Perhaps we could add a test?
src/mlib/int128.h
Outdated
| static mlib_constexpr_fn mlib_int128 | ||
| _mlibUnsignedMult128 (uint64_t left, uint64_t right) | ||
| { | ||
| // Perform a Knuth 4.2.1M multiplication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a minor typo? In the PDF of TAOCP that I have, I see this algorithm is listed as 4.3.1M.
| /// Implementation of Knuth's algorithm 4.3.1 D for unsigned integer division | ||
| static mlib_constexpr_fn void | ||
| _mlibKnuth431D (uint32_t *const u, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the explicit citations here! D1, D2, etc
kkloberdanz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work! Looks good to me. I really appreciate that you cited which algorithms you used. Good job!
|
As an aside, GNU GMP (https://gmplib.org/) has highly optimized implementations for arbitrary precision integer arithmetic. For simple use cases, I see the appeal of not needing to pull in a new dependency on GMP, but if more complex use cases come up for high precision integer arithmetic, it may be good to evaluate GMP for such a use case. |
kevinAlbs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive work.
I suggest more test cases for multiplication and division. Consider using code coverage or fuzzing to ensure branches are exercised. An incorrect result may lead to data corruption.
- Check can compile as C - Better branch coverage by hitting each bit battern that has an effect on division.
eramongodb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor suggestions and questions remaining; otherwise, LGTM.
src/mlib/int128.test.cpp
Outdated
| CHECK (mlib_int128_mul (MLIB_INT128_CAST (-7), MLIB_INT128_CAST (-7)) == | ||
| 49_i128); | ||
|
|
||
| // It's useful it specify bit patterns directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // It's useful it specify bit patterns directly | |
| // It's useful to specify bit patterns directly. |
src/mlib/int128.test.cpp
Outdated
| // division checks. It doesn't need to be rigorous or optimal, it only | ||
| // needs to "just work." | ||
| std::vector<std::thread> threads; | ||
| threads.resize (15); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest the following pattern instead:
std::vector<std::thread> threads;
for (...) {
threads.emplace_back([nbits, dbits, random] () mutable { ... });
}
for (auto& t : threads) {
t.join();
}|
|
||
| #include <iostream> | ||
| #include <random> | ||
| #include <string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #include <string> | |
| #include <string> | |
| #include <vector> |
src/mlib/int128.h
Outdated
| mlib_int128_format (mlib_int128 i) | ||
| { | ||
| mlib_int128_charbuf into = {0}; | ||
| char *out = into.str + sizeof into - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| char *out = into.str + sizeof into - 1; | |
| char *out = into.str + sizeof(into) - 1u; |
kevinAlbs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. LGTM
src/mlib/int128.h
Outdated
| const u64 n0 = (u32) (numer.r.lo); | ||
| const u64 n1 = (u32) (numer.r.lo >> 32); | ||
|
|
||
| // We don't need to split n2 and n3. (n3,n2) will be the first parital |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // We don't need to split n2 and n3. (n3,n2) will be the first parital | |
| // We don't need to split n2 and n3. (n3,n2) will be the first partial |
src/mlib/int128.h
Outdated
| } | ||
| } | ||
|
|
||
| // Denomralization (D8) is done by caller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Denomralization (D8) is done by caller. | |
| // Denormalization (D8) is done by caller. |
src/mlib/int128.h
Outdated
| const int vlen, | ||
| uint32_t *quotient) | ||
| { | ||
| // Part D1 (normalization) is done by caller, normalized in u and v (b is 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Part D1 (normalization) is done by caller, normalized in u and v (b is 32) | |
| // Part D1 (normalization) is done by caller, normalized in u and v (b is 2^32) |
b is the radix. The digits are 32 bit unsigned integers.
This changeset adds a new trivial type
mlib_int128, which presents a method of performing 128-bit binary arithmetic. This is a prerequesite to MONGOCRYPT-483, which requires 128-bit integers and integer arithmetic.Some platforms, such as 64-bit GCC, provide an
__int128abstraction, but other platforms do not. MSVC and all 32-bit targets do not have such extensions. For this reason, this PR implements 128-bit arithmetic using only standard C99.Usage looks like this:
mlib_int128a typedef of a trivial type that occupies 16 bytes of storage. There is no distinction between signed/unsigned: Operations that depend on sign (for now, just integer comparison) require specifying whether you want a signed compare or an unsigned compare.MLIB_INT128(N)a macro that creates a 128-bit integer from an integer literal. The literal should not have a suffix. Negation within the macro "just works" (but may generate a false-positive compiler warning).MLIB_INT128_CAST(N)cast an arbitrary integral valueNto a 128-bit integer.MLIB_INT128_FROM_PARTS(Low64, High64)the actual "constructor" formlib_in128: Glues together the low bits and the high bits from two 64-bit integers.MLIB_INT128_SMAX,MLIB_INT128_SMIN,MLIB_INT128_UMAX: The signed-maximum, signed-minimum, and unsigned-maximum values for 128-bit integers.The following functions all have an
mlib_int128_prefix:ucmp(L, R) -> intandscmp(L, R) -> intCompare two integers as either unsigned or signed, respectively. Returnsn < 0,n > 0, orn = 0for less, greater, and equal, respectively.eq(L, R) -> bool- Determine whetherLandRare equal.add(L, R) -> int128- ReturnL + R. Overflow wraps.sub(L, R) -> int128- ReturnsL - R. Overflow wraps.negate(N) -> int128- TreatNas a signed integer. Return-N.lshift(N, c) -> int128- ReturnN << crshift(N, c) -> int128- ReturnN >> cbitor(L, R) -> int128- ReturnL | Rmul(L, R) -> int128- ReturnL * R. Overflow wraps.div(L, R) -> int128- TreatsLandRas unsigned: ReturnL / R(round toward zero).mod(L, R) -> int128- TreatsLandRas unsigned: ReturnL % R(remainder after integer division).divmod(L, R)-> {quotient, remainder}- Obtain both the quotient and remainder of an integer division.pow10(N) -> int128- Obtain theNth power of ten.pow2(N) -> int128- Obtain theNth power of two. (No generalpow()is defined.)from_string(S) -> int128- ParseSas a string of decimal digits.to_u64(N) -> uint64_t- Return the low 64 bits ofN.format(N)- Render theNas a string of decimal digits. Returns a struct containing the result array.The implementations of
add,sub, the bitshifts, to/from string, compare, and equality are all fairly straightforward extensions of 64-bit operations (with carry/borrow).Multiplication here is implemented using Knuth's 4.3.1M multiplication algorithm (from The Art of Computer Programming). Here 4.3.1M is used to multiply the two low 64-bit words to obtain the 128-bit product. The high word of the result is then adjusted by two more regular 64-bit multiplications.
Division is the most complicated operation and took a lot of testing to ensure correct. It uses Knuth's 4.3.1D algorithm for arbitrary precision unsigned division (defined in
_mlibKnuth431D). Certain optimizations and shortcuts were learned from the existing STLconstexprimplementation. Future optimization to use actual intrinsics and hardware are not done here, since we just need these 128-bit integers and want platform-equivalence.Most of the code has been marked as
constexpr, and a lot of functionality is tested viastatic_assertsfor failing CI as-fast-as-possible and immediate IDE feedback.The PR tests are implemented in C++ and defined a test macro
CHECK, since I miss Catch2 🥲.