@vector-of-bool vector-of-bool commented Dec 6, 2022

This changeset adds a new trivial type, mlib_int128, which provides 128-bit binary arithmetic. This is a prerequisite to MONGOCRYPT-483, which requires 128-bit integers and integer arithmetic.

Some toolchains, such as GCC on 64-bit targets, provide an __int128 extension, but others do not: MSVC and all 32-bit targets lack it. For this reason, this PR implements 128-bit arithmetic using only standard C99.

Usage looks like this:

  • mlib_int128 - A typedef of a trivial type that occupies 16 bytes of storage. There is no distinction between signed and unsigned: operations that depend on sign (for now, just integer comparison) require specifying whether you want a signed or an unsigned compare.
  • MLIB_INT128(N) - A macro that creates a 128-bit integer from an integer literal. The literal should not have a suffix. Negation within the macro "just works" (but may generate a false-positive compiler warning).
  • MLIB_INT128_CAST(N) - Cast an arbitrary integral value N to a 128-bit integer.
  • MLIB_INT128_FROM_PARTS(Low64, High64) - The actual "constructor" for mlib_int128: glues together the low bits and the high bits from two 64-bit integers.
  • MLIB_INT128_SMAX, MLIB_INT128_SMIN, MLIB_INT128_UMAX - The signed-maximum, signed-minimum, and unsigned-maximum values for 128-bit integers.
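
For readers who want a picture of what sits behind these macros, here is a minimal sketch of a two-word representation. Everything named sketch_* (the struct, its lo/hi fields, the helper functions, and the constants) is an assumption made for illustration and is not taken from the PR's headers.

```cpp
#include <cstdint>

// Hypothetical stand-in for mlib_int128: two 64-bit words, 16 bytes total.
struct sketch_int128 {
   uint64_t lo; // low 64 bits
   uint64_t hi; // high 64 bits
};

static_assert (sizeof (sketch_int128) == 16, "two 64-bit words");

// Analogue of MLIB_INT128_FROM_PARTS(Low64, High64).
constexpr sketch_int128
sketch_from_parts (uint64_t lo, uint64_t hi)
{
   return sketch_int128{lo, hi};
}

// Analogue of MLIB_INT128_CAST(N) for a signed 64-bit input: the high word
// is filled with copies of the sign bit so that negative values keep their
// two's-complement meaning.
constexpr sketch_int128
sketch_cast_i64 (int64_t n)
{
   return sketch_int128{(uint64_t) n, n < 0 ? UINT64_MAX : uint64_t (0)};
}

// Analogues of MLIB_INT128_UMAX, MLIB_INT128_SMAX, and MLIB_INT128_SMIN.
constexpr sketch_int128 sketch_umax = {UINT64_MAX, UINT64_MAX};
constexpr sketch_int128 sketch_smax = {UINT64_MAX, UINT64_MAX >> 1};
constexpr sketch_int128 sketch_smin = {0, uint64_t (1) << 63};
```

Because the type itself carries no signedness, the same bit pattern produced by sketch_cast_i64(-1) is also the unsigned maximum; only sign-aware operations tell the two readings apart.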

The following functions all have an mlib_int128_ prefix:

  • ucmp(L, R) -> int and scmp(L, R) -> int - Compare two integers as unsigned or signed, respectively. Returns n < 0, n > 0, or n = 0 for less, greater, and equal, respectively.
  • eq(L, R) -> bool - Determine whether L and R are equal.
  • add(L, R) -> int128 - Return L + R. Overflow wraps.
  • sub(L, R) -> int128 - Return L - R. Overflow wraps.
  • negate(N) -> int128 - Treat N as a signed integer. Return -N.
  • lshift(N, c) -> int128 - Return N << c
  • rshift(N, c) -> int128 - Return N >> c
  • bitor(L, R) -> int128 - Return L | R
  • mul(L, R) -> int128 - Return L * R. Overflow wraps.
  • div(L, R) -> int128 - Treats L and R as unsigned: Return L / R (round toward zero).
  • mod(L, R) -> int128 - Treats L and R as unsigned: Return L % R (remainder after integer division).
  • divmod(L, R) -> {quotient, remainder} - Obtain both the quotient and remainder of an integer division.
  • pow10(N) -> int128 - Obtain the Nth power of ten.
  • pow2(N) -> int128 - Obtain the Nth power of two. (No general pow() is defined.)
  • from_string(S) -> int128 - Parse S as a string of decimal digits.
  • to_u64(N) -> uint64_t - Return the low 64 bits of N.
  • format(N) - Render N as a string of decimal digits. Returns a struct containing the result array.

The implementations of add, sub, the bitshifts, to/from string, compare, and equality are all fairly straightforward extensions of 64-bit operations (with carry/borrow).
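
To make "straightforward extensions of 64-bit operations" concrete, here is a rough sketch of carry, borrow, and the two compare flavors on a two-word value. The sketch_* names and the lo/hi layout are assumptions for illustration, not the PR's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word layout; the real mlib_int128 may differ.
struct sketch_int128 {
   uint64_t lo;
   uint64_t hi;
};

// L + R with wraparound: add the low words, then carry into the high words.
inline sketch_int128
sketch_add (sketch_int128 l, sketch_int128 r)
{
   const uint64_t lo = l.lo + r.lo;
   const uint64_t carry = lo < l.lo; // unsigned wrap means a carry occurred
   return sketch_int128{lo, l.hi + r.hi + carry};
}

// L - R with wraparound: a borrow is needed when the low word underflows.
inline sketch_int128
sketch_sub (sketch_int128 l, sketch_int128 r)
{
   const uint64_t borrow = l.lo < r.lo;
   return sketch_int128{l.lo - r.lo, l.hi - r.hi - borrow};
}

// Unsigned compare: high words decide first, low words break ties.
inline int
sketch_ucmp (sketch_int128 l, sketch_int128 r)
{
   if (l.hi != r.hi) return l.hi < r.hi ? -1 : 1;
   if (l.lo != r.lo) return l.lo < r.lo ? -1 : 1;
   return 0;
}

// Signed compare: only the high word carries the sign, so it alone is
// reinterpreted as signed (assumes two's-complement conversion).
inline int
sketch_scmp (sketch_int128 l, sketch_int128 r)
{
   const int64_t lh = (int64_t) l.hi;
   const int64_t rh = (int64_t) r.hi;
   if (lh != rh) return lh < rh ? -1 : 1;
   if (l.lo != r.lo) return l.lo < r.lo ? -1 : 1;
   return 0;
}

// Usage: a carry out of the low word, then a borrow back.
inline void
sketch_add_sub_examples ()
{
   const sketch_int128 max_lo = {UINT64_MAX, 0};
   const sketch_int128 one = {1, 0};
   const sketch_int128 sum = sketch_add (max_lo, one);
   assert (sum.lo == 0 && sum.hi == 1);
   const sketch_int128 back = sketch_sub (sum, one);
   assert (back.lo == UINT64_MAX && back.hi == 0);
}
```

The all-ones high word is the interesting case for comparison: it is the largest value under sketch_ucmp and a negative value under sketch_scmp.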

Multiplication here is implemented using Knuth's 4.3.1M multiplication algorithm (from The Art of Computer Programming). Here 4.3.1M is used to multiply the two low 64-bit words to obtain the 128-bit product. The high word of the result is then adjusted by two more regular 64-bit multiplications.
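
The two steps described above can be sketched roughly as follows: a schoolbook 64x64 -> 128 product over base-2^32 digits (in the spirit of Algorithm M), followed by the two cross-term multiplications that only affect the high word. The sketch_* names are assumptions for illustration; the PR's helper is _mlibUnsignedMult128, and its internals may differ from this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word layout; the real mlib_int128 may differ.
struct sketch_int128 {
   uint64_t lo;
   uint64_t hi;
};

// Full 128-bit product of two 64-bit words, using 32-bit digits so that
// every partial product fits in a uint64_t.
inline sketch_int128
sketch_umul64 (uint64_t a, uint64_t b)
{
   const uint64_t a0 = (uint32_t) a, a1 = a >> 32;
   const uint64_t b0 = (uint32_t) b, b1 = b >> 32;

   const uint64_t p00 = a0 * b0; // contributes at bit 0
   const uint64_t p01 = a0 * b1; // contributes at bit 32
   const uint64_t p10 = a1 * b0; // contributes at bit 32
   const uint64_t p11 = a1 * b1; // contributes at bit 64

   // Column for bits 32..63. Three values below 2^32 cannot overflow 64 bits.
   const uint64_t mid = (p00 >> 32) + (uint32_t) p01 + (uint32_t) p10;

   sketch_int128 out;
   out.lo = (mid << 32) | (uint32_t) p00;
   out.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
   return out;
}

// L * R with wraparound. The hi*hi term lands entirely above bit 127, so
// only the two cross terms adjust the high word.
inline sketch_int128
sketch_mul (sketch_int128 l, sketch_int128 r)
{
   sketch_int128 p = sketch_umul64 (l.lo, r.lo);
   p.hi += l.lo * r.hi + l.hi * r.lo;
   return p;
}

// Usage: (-7) * (-7) == 49 when both operands are sign-extended.
inline void
sketch_mul_example ()
{
   const sketch_int128 minus7 = {static_cast<uint64_t> (-7), UINT64_MAX};
   const sketch_int128 p = sketch_mul (minus7, minus7);
   assert (p.lo == 49 && p.hi == 0);
}
```

Since overflow wraps, the same routine serves signed and unsigned operands, which is why no separate signed multiply appears in the function list.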

Division is the most complicated operation and took a lot of testing to get right. It uses Knuth's 4.3.1D algorithm for arbitrary-precision unsigned division (defined in _mlibKnuth431D). Certain optimizations and shortcuts were learned from the existing STL constexpr implementation. Optimizations that use compiler intrinsics or hardware support are left for the future, since we just need these 128-bit integers and want platform-equivalence.
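
When testing a digit-based divider like this, it can help to have a deliberately naive reference to compare against. The sketch below is not Knuth's Algorithm D and is not from the PR: it is plain bit-at-a-time shift-and-subtract long division, slow but easy to audit. All sketch_* names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word layout; the real mlib_int128 may differ.
struct sketch_int128 {
   uint64_t lo;
   uint64_t hi;
};

struct sketch_divmod_result {
   sketch_int128 quotient;
   sketch_int128 remainder;
};

// Unsigned a < b.
inline bool
sketch_uless (sketch_int128 a, sketch_int128 b)
{
   return a.hi != b.hi ? a.hi < b.hi : a.lo < b.lo;
}

// Unsigned n / d and n % d, one bit of the numerator per iteration.
inline sketch_divmod_result
sketch_divmod_slow (sketch_int128 n, sketch_int128 d)
{
   assert (d.lo != 0 || d.hi != 0); // division by zero is not handled here
   sketch_int128 q = {0, 0};
   sketch_int128 r = {0, 0};
   for (int i = 127; i >= 0; --i) {
      // Bring down bit i of n: r = (r << 1) | bit. Before the shift, r is
      // the remainder of at most 127 leading bits, so nothing is lost.
      const uint64_t bit =
         i >= 64 ? (n.hi >> (i - 64)) & 1 : (n.lo >> i) & 1;
      r.hi = (r.hi << 1) | (r.lo >> 63);
      r.lo = (r.lo << 1) | bit;
      if (!sketch_uless (r, d)) {
         // r >= d: subtract d (with borrow) and set bit i of the quotient.
         const uint64_t borrow = r.lo < d.lo;
         r.lo -= d.lo;
         r.hi -= d.hi + borrow;
         if (i >= 64) {
            q.hi |= uint64_t (1) << (i - 64);
         } else {
            q.lo |= uint64_t (1) << i;
         }
      }
   }
   return {q, r};
}
```

A randomized test could feed the same operands to both implementations and require identical quotient and remainder, plus the identity quotient * d + remainder == n.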

Most of the code has been marked as constexpr, and a lot of functionality is tested via static_asserts so that CI fails as fast as possible and IDEs give immediate feedback.
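
As an illustration of that testing style, here is a minimal sketch of a function that is constexpr when compiled as C++ and checked with static_assert. The macro sketch_constexpr_fn is a guess at how something like mlib_constexpr_fn could be defined, and the other sketch_* names are likewise assumptions, not the PR's definitions.

```cpp
#include <cstdint>

// Assumed shape of a "constexpr where available" macro: constexpr in C++,
// plain inline when the header is consumed as C.
#if defined(__cplusplus)
#define sketch_constexpr_fn constexpr
#else
#define sketch_constexpr_fn inline
#endif

// Hypothetical two-word layout; the real mlib_int128 may differ.
struct sketch_int128 {
   uint64_t lo;
   uint64_t hi;
};

// Bitwise-or of two 128-bit values, word by word.
static sketch_constexpr_fn sketch_int128
sketch_bitor (sketch_int128 l, sketch_int128 r)
{
   return sketch_int128{l.lo | r.lo, l.hi | r.hi};
}

// Equality: both words must match.
static sketch_constexpr_fn bool
sketch_eq (sketch_int128 l, sketch_int128 r)
{
   return l.lo == r.lo && l.hi == r.hi;
}

// Evaluated by the compiler: a wrong result is a build error, not a
// runtime test failure.
static_assert (sketch_eq (sketch_bitor (sketch_int128{1, 0},
                                        sketch_int128{2, 4}),
                          sketch_int128{3, 4}),
               "bitor combines both words");
```

The brace-initialized temporaries are C++ syntax, so a check like this lives in the C++ test file rather than in the C header itself.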


The PR tests are implemented in C++ and define a test macro, CHECK, since I miss Catch2 🥲.

@vector-of-bool vector-of-bool marked this pull request as ready for review December 6, 2022 19:19
@kevinAlbs kevinAlbs requested a review from kkloberdanz December 6, 2022 19:22
* @brief Bitwise-or two 128-bit integers
*/
static mlib_constexpr_fn mlib_int128
mlib_int128_bitor (mlib_int128 l, mlib_int128 r)

I can't seem to find a test for this one. Perhaps we could have a test for it?

* @brief Multiply two mlib_int128s together. Overflow will wrap.
*/
static mlib_constexpr_fn mlib_int128
mlib_int128_mul (mlib_int128 l, mlib_int128 r)

I did not notice a test for this function. Perhaps we could add a test?

static mlib_constexpr_fn mlib_int128
_mlibUnsignedMult128 (uint64_t left, uint64_t right)
{
// Perform a Knuth 4.2.1M multiplication

Perhaps a minor typo? In the PDF of TAOCP that I have, I see this algorithm is listed as 4.3.1M.

Comment on lines +277 to +279
/// Implementation of Knuth's algorithm 4.3.1 D for unsigned integer division
static mlib_constexpr_fn void
_mlibKnuth431D (uint32_t *const u,

I love the explicit citations here! D1, D2, etc

@kkloberdanz kkloberdanz left a comment

Amazing work! Looks good to me. I really appreciate that you cited which algorithms you used. Good job!

@kkloberdanz

As an aside, GNU GMP (https://gmplib.org/) has highly optimized implementations for arbitrary precision integer arithmetic. For simple use cases, I see the appeal of not needing to pull in a new dependency on GMP, but if more complex use cases come up for high precision integer arithmetic, it may be good to evaluate GMP for such a use case.

@kevinAlbs kevinAlbs left a comment

Impressive work.

I suggest more test cases for multiplication and division. Consider using code coverage or fuzzing to ensure branches are exercised. An incorrect result may lead to data corruption.

@eramongodb eramongodb left a comment

Some minor suggestions and questions remaining; otherwise, LGTM.

CHECK (mlib_int128_mul (MLIB_INT128_CAST (-7), MLIB_INT128_CAST (-7)) ==
49_i128);

// It's useful it specify bit patterns directly

Suggested change
- // It's useful it specify bit patterns directly
+ // It's useful to specify bit patterns directly.

// division checks. It doesn't need to be rigorous or optimal, it only
// needs to "just work."
std::vector<std::thread> threads;
threads.resize (15);

Suggest the following pattern instead:

std::vector<std::thread> threads;
for (...) {
  threads.emplace_back([nbits, dbits, random] () mutable { ... });
}
for (auto& t : threads) {
  t.join();
}


#include <iostream>
#include <random>
#include <string>

Suggested change
- #include <string>
+ #include <string>
+ #include <vector>

mlib_int128_format (mlib_int128 i)
{
mlib_int128_charbuf into = {0};
char *out = into.str + sizeof into - 1;

Suggested change
- char *out = into.str + sizeof into - 1;
+ char *out = into.str + sizeof(into) - 1u;
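
For context on the pointer arithmetic in this snippet, here is a rough sketch of rendering decimal digits backward from the end of a fixed buffer. The buffer size, the final memmove to the front of the array, the divide-by-ten helper, and all sketch_* names are assumptions for illustration; the PR's mlib_int128_format and mlib_int128_charbuf may work differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical two-word layout; the real mlib_int128 may differ.
struct sketch_int128 {
   uint64_t lo;
   uint64_t hi;
};

// 2^128 - 1 has 39 decimal digits; one more byte holds the terminator.
struct sketch_charbuf {
   char str[40];
};

// Divide *n by ten in place and return the remainder, using long division
// over base-2^32 digits so each step fits in 64 bits.
inline unsigned
sketch_div10 (sketch_int128 *n)
{
   uint32_t digits[4] = {(uint32_t) (n->hi >> 32),
                         (uint32_t) n->hi,
                         (uint32_t) (n->lo >> 32),
                         (uint32_t) n->lo}; // most significant first
   uint64_t rem = 0;
   for (int i = 0; i < 4; ++i) {
      const uint64_t cur = (rem << 32) | digits[i];
      digits[i] = (uint32_t) (cur / 10); // fits: cur < 10 * 2^32
      rem = cur % 10;
   }
   n->hi = ((uint64_t) digits[0] << 32) | digits[1];
   n->lo = ((uint64_t) digits[2] << 32) | digits[3];
   return (unsigned) rem;
}

// Render n as unsigned decimal, writing digits from the back of the buffer.
inline sketch_charbuf
sketch_format (sketch_int128 n)
{
   sketch_charbuf into = {{0}};
   char *out = into.str + sizeof (into) - 1u; // points at the final NUL
   do {
      *--out = (char) ('0' + sketch_div10 (&n));
   } while (n.lo != 0 || n.hi != 0);
   // Slide the digits (and the terminator) to the front of the array.
   const std::size_t len =
      static_cast<std::size_t> (into.str + sizeof (into) - out);
   std::memmove (into.str, out, len);
   return into;
}
```

Returning the buffer by value inside a struct avoids heap allocation and keeps the function usable from C, at the cost of copying 40 bytes.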

@kevinAlbs kevinAlbs left a comment

Great work. LGTM

const u64 n0 = (u32) (numer.r.lo);
const u64 n1 = (u32) (numer.r.lo >> 32);

// We don't need to split n2 and n3. (n3,n2) will be the first parital

Suggested change
- // We don't need to split n2 and n3. (n3,n2) will be the first parital
+ // We don't need to split n2 and n3. (n3,n2) will be the first partial

}
}

// Denomralization (D8) is done by caller.

Suggested change
- // Denomralization (D8) is done by caller.
+ // Denormalization (D8) is done by caller.

const int vlen,
uint32_t *quotient)
{
// Part D1 (normalization) is done by caller, normalized in u and v (b is 32)

Suggested change
- // Part D1 (normalization) is done by caller, normalized in u and v (b is 32)
+ // Part D1 (normalization) is done by caller, normalized in u and v (b is 2^32)

b is the radix. The digits are 32-bit unsigned integers.
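
To spell out what radix 2^32 means here, the following sketch splits a 128-bit value into four 32-bit digits and computes a normalization shift of the kind step D1 calls for (shifting until the divisor's top digit has its high bit set). The least-significant-first digit order and all sketch_* names are assumptions for illustration; the PR's caller may lay this out differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word layout; the real mlib_int128 may differ.
struct sketch_int128 {
   uint64_t lo;
   uint64_t hi;
};

// Split n into base-2^32 digits, least significant digit first.
inline void
sketch_to_digits (sketch_int128 n, uint32_t out[4])
{
   out[0] = (uint32_t) n.lo;
   out[1] = (uint32_t) (n.lo >> 32);
   out[2] = (uint32_t) n.hi;
   out[3] = (uint32_t) (n.hi >> 32);
}

// Number of left shifts needed so that the most significant nonzero digit
// of the divisor has its top bit set. Applying the same shift to both the
// numerator and the divisor leaves the quotient unchanged.
inline int
sketch_normalize_shift (uint32_t top_digit)
{
   assert (top_digit != 0); // caller passes the leading nonzero digit
   int shift = 0;
   while ((top_digit & 0x80000000u) == 0) {
      top_digit <<= 1;
      ++shift;
   }
   return shift;
}
```

With digits this size, every digit-by-digit product and every two-digit partial dividend fits in a uint64_t, which is what lets the algorithm run on plain C99 integer types.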

5 participants