[MONGOCRYPT-483]: Support Decimal128 in range-based Queryable Encryption #522

vector-of-bool · 2022-12-20T02:13:29Z

The long-awaited PR is here! This one took a lot more than expected, especially because it involves the manipulation and handling of data types that the language doesn't support.

The commits have been roughly arranged in a "somewhat sensible" narrative order, but it wasn't completely untangle-able. For that reason, here I'll summarize the main aspects of the changes, in the order than they will make the most sense.

(The changeset is complete, but this write-up is still WIP)

Part 0: 128-bit Integer Arithmetic

This changeset originally included adding support for 128-bit integer arithmetic, which is required by some operations. This work was significant enough that it was split into a separate PR: #510

Part 1: IntelDFP: A Decimal128 Library

First we needed a way to perform operations on 128-bit IEEE754 decimal floating point numbers. While libbson containts a bson_decimal128_t, it's just a bundle of 16 bytes, and can't be used for any arithmetic without additional support.

The choice was made to pull an appropriate Decimal128 software implementation. The MongoDB server uses Intel's DFP (decimal floating point) library, and as such we had experience with this library before. The server build uses SCons to construct and link the IDFP library. We use CMake, so there was some amount of "translation" necessary.

The IDFP library is pulled by cmake/IntelDFP.cmake. This module uses FetchContent to download and extract an archive of the library (by default, the same one used in the server). The URL of the archive can be tweaked as a configuration option, which can be used to support the "airgapped build" requirements of distribution packagers.
The extracted archive needs to be tweaked to support s390x! Fortunately, the server repository contains a commit that modifies the IDFP library sufficient to get it working on s390x. The relevant changes from the server repository are saved in a patch file at etc/mongo-inteldfp-s390x.patch. The FetchContent operation uses Git to apply this patch to the extracted archive.
The IDFP library contains symbol definitions that are commonly wanted by other runtimes and libraries. This could cause conflict if libmongocrypt contains these symbols and they are also being defined by a language runtime or external instance of IDFP. To "hide" these symbols, IDFP provides a very blunt hammer that allows the user to forcibly rename all of the external-linkage entities in IDFP. The CMake configuration uses a regex-find-replace to qualify all IDFP entities with a libmongocrypt-specific name.
The IntelDFP build process is an absolute traveshamockery, and there is no attempt to wrangle it in the server's SCons build, so this changeset follows suit:
1. A select subset of the IDFP source files are selected and added to a CMake OBJECT library.
2. The IDFP build requires a lot of odd preprocessor macros to be defined. The conditionality of each is duplicated directly from the server's SCons build.
3. The translation units from the object library are attached to an INTERFACE library that provides a level of indirection between library consumers.
4. Within the libmongocrypt build, users of the interface library will have the translation units from IDFP injected directly into their own link/ar output.
5. When the interface library is installed, the object file dependencies are stripped away, and the interface library only supplies the link options required to use the IDFP translation units that are already inside of libmongocrypt. Because the TUs from IDFP are inside of libmongocrypt, we do not need to install the IDFP build results.
The IDFP entities are not exposed via #include. Instead, the bare minimum entities are declared as extern in the immediate scope where they are used within the Decimal128 wrapper.

Part 2: A Decimal128 Wrapper

The IDFP library is very low-level, and we don't need a lot of the details therein. Additionally, we need functionality that is not exposed by the IDFP library. Every problem can be solved with an additional level of indirection, so that's what we have here.

mc-dec128.h defines a new trivial datatype mc_dec128, which is the sixteen byte encoding a decimal number. This type is representation-identical and layout-identical to the types used in IDFP, so it is passed directly to/from the IDFP library.

The server contains its own Decimal128 wrapper, and defines several extensions that are required for the implementation of the Decimal128 range encoding. Equivalent constants and extensions are available in mc-dec128.h, including:

MC_DEC128_LARGEST_NEGATIVE
MC_DEC128_LARGEST_POSITIVE
MC_DEC128_SMALLEST_NEGATIVE
MC_DEC128_SMALLEST_POSITIVE,
MC_DEC128_NORMALIZED_ZERO
MC_DEC128_POSITIVE_INFINITY
MC_DEC128_NEGATIVE_INFINITY
MC_DEC128_POSITIVE_NAN
MC_DEC128_NEGATIVE_NAN

These constants are constructed using a high/low pair of 64-bit words. The bit patterns within IEEE754 128-bit binary-integer-decimal are (high-to-low): 1 sign bit, 16 "combination" bits, and 111 "coefficient/mantissa/significant/etc." bits (herein referred to as the "coefficient" $\it Coeff$). The coefficient is an unsigned 111 bit integer. The interpretation of the "combination" bits varies depending on the leading bit pattern. For most of this changeset, we do not care about NaN and Infinity, so we are only concerned with "canonical" combinations, wherein the biased exponent $\it Exp$ of the number is encoded in the combination.

$\it Exp$ is a 13-bit unsigned integral value. Importantly, it does not encode the actual exponent, but a "biased" exponent value. If $\it Exp$ is $0$, this does not mean the actual exponent is $0$, but rather negative bias value. In this case, the bias is $6176$, so the $0$ biased $\it Exp$ corresponds to a real exponent of $-6176$. (This gets more confusing since some calculations must take into account the 34 significant digits encoded in $\it Coeff$ Don't think about it too hard, it's a bit confusing and mostly irrelevant here.)

All together, the true value of the a Decimal128 is computed as ${\it Coeff} \times 10^{{\it Exp} - 6176}$.

Part 3: OST Encoding

The purpose of the OST encoding is to map a value in a totally ordered type into a string of bits that is also totally ordered, wherein the mapping maintains the total ordering. The OST mapping of Decimal128 is similar to the mapping used for double-precision 64-bit floating point values. NaN and Infinity values are not supported in the mapping. All Decimal128 representations that correspond to real zero are considered equivalent (sign and exponent of zero are ignored). The encoding for Decimal128 is implemented in mc_getTypeInfoDecimal128. It accepts parameters $\it min$, $\it max$, and $\it precision$, and encodes a Decmal128 value $V$ as a 128-bit unsigned integer.

When encoding a value $V$:

Let $Q_{max} \coloneqq 2^{128}-1$
If $\it min$, $\it max$, and $\it precision$ are all set for the encoding parameters:
- If ${\it min} \leq V < {\it max}$ is not true, generate an error.
- Let ${\it bounds} \coloneqq ({\it max} - {\it min}) + 1$.
- Let ${\it bounds'} \coloneqq {\it bounds} \times 10^{\it precision}$.
- Let ${\it bits\_range} \coloneqq \log_2{\it bounds'}$ (This is the number of bits required to uniquely encode any Decimal128 within in the range $[{\it min}, {\it max}]$ with the requested precision.)
- Let ${\it use\_precision\_mode} \coloneqq {\it bits\_range} < 128$ (This indicates whether we can represent $V$ with reduced precision with a direct mapping of the values over the range $[{\it min}, {\it max}]$ into the domain of 128-bit unsigned integers.)
Otherwise, let ${\it use\_precision\_mode} \coloneqq \it false$
Let $C$ be the coefficient of $V$.
If $\it use\_precision\_mode$ is $\it true$:
- Let $V' \coloneqq \lfloor V \times 10^{\it precision} \rfloor \div 10^{\it precision}$ (This operation truncates the number of significant digits based on the requested precision.)
- Let $V'' \coloneqq (V' - {\it min}) \times 10 ^{\it precision}$
- Let $O_{max} \coloneqq 2^{\it bits\_range} - 1$
- If ${\it bits\_range} < 64$:
  - We need even fewer bits to form the mapped value. Attempt to convert $V''$ to a 64-bit unsigned integer $L$
  - If $L$ was computed successfully without any overflow or loss of precision:
    - The return value of mc_getTypeInfoDecimal128 is $\{{\tt .value} \coloneqq L, {\tt .min} \coloneqq 0, {\tt .max} = O_{max}\}$
  - Otherwise, fall through:
- Let $E_u$ be the unbiased exponent of $V$.
- Let $Q \coloneqq C \times 10^{E_u}$ (this may divide if $E_u$ is negative)
- The return value of mc_getTypeInfoDecimal128 is $\{{\tt .value} \coloneqq Q, {\tt .min} \coloneqq 0, {\tt .max} = O_{max}\}$
Otherwise (i.e. $\it use\_precision\_mode$ is $\it false$):
If $C = 0$, the return value of mc_getTypeInfoDecimal128 is $\{{\tt .value} \coloneqq 2^{127}, {\tt .min} \coloneqq 0, {\tt .max} = Q_{max}\}$
Let $C_{max} \coloneqq 10^{34} - 1$ (the maximum coefficient value, as $\it Coeff$ encodes 34 decimal digits.
Let $E_b$ be the biased exponent of $D$. (This will always be a non-negative integer.)
Let $\rho$ be the greatest integer such that $C \times 10^\rho \leq C_{max}$ ( $\rho$ will never be negative.)
If $\rho \leq E_b$: Let $Q' \coloneqq C \times 10^\rho + C_{max} \times (E_b - \rho)$
Otherwise: Let $Q' \coloneqq C \times 10^{E_b}$
Let $Q = (Q' + 2^{127}) \times {\rm sign}(V)$
The return value of mc_getTypeInfoDecimal128 is $\{{\tt .value} \coloneqq Q, {\tt .min} \coloneqq 0, {\tt .max} = Q_{max}\}$

Part 4: Endian handling

A new header was added: <mlib/endian.h>, which only provides bare-bones compile-time endian detection (no confgure-time support needed).

Code is written to be endian-neutral as often as possible. Within this PR, the only point it is relevant is when selecting the hi/lo 64-bit words of the Decimal128 representation. The _mcDec128ConstFromParts(CoeffLow, HighWord) macro permutes the two arguments when setting the word values of a Decimal128 literal.

Part 5: Test Vector Translation

The test vector data is pulled directly from the server C++ codebase and translated into a C-compatible representation using a Python script in test/data/range-edge-generation/make_includes.py. The script implements an extremely bare-bones C++ expression parser that can consume the test vectors from the server and emit equivalent code for use in the libmongocrypt tests. The script itself is only invoked manually when one wishes to update the test vector data.

Part 6: Generalizing the Mincover Templates

The file mc-range-mincover-generator.template.h is #included by a C source file to define the MinCover implementation for a particular data type. The prior version only supported specifying the number of bits, and relied on language builtins for all other operations. Because we need to handle int128, and the language does not have built-in support for this type, the template needed to be generalized further to allow control of the following types and operations:

UINT_T defines the type of the integer for MinCover.
UINT_C(X) must expand to a literal of UINT_T with the given equivalent value of X.
UINT_FMT_ARG(X) (optional) must accept a value of type UINT_T and return a printf-formattable object F. If undefined, simply returns X.
UINT_FMT_S must expand to a string literal that will be spliced in for the printf format specifier for UINT_FMT_ARG(X) for some X of type UINT_T.
DECORATE_NAME(X) must expand to an new C identifier based on X.
UINT_LESSTHAN(A, B) (optional) for UINT_T values A and B must expand to a boolean-convertible expression that is true if A is strictly less-than B. The default expands to ((A) < (B)). All other comparisons are implemented in terms of less-than.
MC_UINT_MAX (optional) must expand to the an expression equivalent to maximum value of UINT_T. Default is ~ UINT_C(0)
UINT_ADD(A, B) (optional) must expand to the addition result of A + B. Default is ((A) + (B))
UINT_SUB(A, B) (optional) must expand to the subtraction of A - B. Default is ((A) - (B)).
UINT_LSHIFT(A, B) (optional) for UINT_T A and int B must expand to the value of A that has been left-shifted by B bits. If B is negative, should apply a right-shift of -B. Default uses built-in bit-shifting.
UINT_BITOR(A, B) (optional) for UINT_T A, B must expand to the a UINT_T value which is bitwise-or of A with B. Defalut is ((A) | (B))

The values of these macros is straight-forward for built-in uint32_t and uint64_t. For mlib_int128, the definitions are:

UINT_T = mlib_int128
UINT_C = MLIB_INT128
UINT_FMT_S = "s"
UINT_FMT_ARG(X) = (mlib_int128_format(X).str) (This expands to a pointer-to-char[], thus formatted with "%s")
DECORATE_NAME(N) = N##_u128
UINT_LESSTHAN(L, R) = (mlib_int128_ucmp(L, R) < 0)
UINT_ADD = mlib_int128_add
UINT_SUB = mlib_int128_sub
UINT_LSHIFT = mlib_int128_lshift
MC_UINT_MAX = MLIB_INT128_UMAX
UINT_BITOR = mlib_int128_bitor

The CHECK_BOUNDS macro in mc-range-mincover.c was similarly modified to accept a FormatArg and LessThan parameter, which have the same purpose as UINT_FMT_ARG and UINT_LESSTHAN, respectively.

This removes the `find_cmake.sh` script and usage thereof within the Node build. It now uses the same CMake scripts (and CMake version) as the rest of the build.

- Download and build IntelDFP as part of CMake configuration. - Modify the CHECK macro to be defined in a separate file to be reusable between tests. Clean up DFP build, and patch for s390x

- Tweak detection of Decimal128 intrin - Fix some conversion/signedness quirks - Suppress DFP warnings. It generates many, and we don't want them.

rcsanchez97 · 2023-01-12T02:06:04Z

@rcsanchez97 adding you to this because of the addition of the Intel DFP library. At the moment, it downloads the source archive at configure-time and then builds it in-tree with our flags. If we want to support an airgapped build for package source distribution, we'll likely need an alternative design. It may be as simple as including the archive in the repository, but I'll as your thoughts on this.

So, the main concern I have is with the need to patch DFP. It appears that DFP is available in Debian and Ubuntu, so if there is a way to link to an already built DFP (like we might do with libssl) then that would be ideal. However, the DFP packages in Debian and Ubuntu lack the s390x patch that your PR requires. In any event, I have only read your narrative description of the changes, so I will need to look at the way you implemented the build system changes.

Also, to make sure I am not causing confusion, note that I am referring to concerns that I have for official packages which are uploaded directly to Debian and Ubuntu. The official Fedora packages will certainly have a significant problem as well. For our own Evergreen-built packages which we publish in the PPA, we are probably OK with your approach.

Related-ish: The current Debian package build fails due to some issue with extracting/patching the IntelDFP sources (the files don't appear/appear in the wrong location?). I'm unable to reproduce this issue locally (it "just works" for me), so I'm curious if you might have a guess as to what might be different in the EVG environment. If not, I can connect to a host and poke at it manually.

Can you provide a link to an example EVG failure? (DM via slack is fine.) I have an idea of what might be the issue, but I'd want to confirm by looking at the build output.

cmake/IntelDFP.cmake

src/mc-range-encoding.c

CMakeLists.txt

kevinAlbs · 2023-01-17T16:49:30Z

cmake/IntelDFP.cmake

+include (FetchContent)
+find_program (GIT_EXECUTABLE git)
+
+set (_default_url "https://netlib.org/misc/intel/IntelRDFPMathLib20U2.tar.gz")


A build failure on the C driver was observed related to downloading this tarball:

https://spruce.mongodb.com/task/mongo_c_driver_arm_ubuntu1804_debug_compile_sasl_openssl_cse_patch_a8560d7488778e480b7c61a0f8931ddbcc310c7d_63c6c96fd6d80a4e2436bcd1_23_01_17_16_14_40/logs?execution=1

[2023/01/17 16:25:00.937] -- Using src='https://netlib.org/misc/intel/IntelRDFPMathLib20U2.tar.gz'; [2023/01/17 16:25:00.937] CMake Error at intel_dfp-subbuild/intel_dfp-populate-prefix/src/intel_dfp-populate-stamp/download-intel_dfp-populate.cmake:170 (message): [2023/01/17 16:25:00.937] Each download failed! [2023/01/17 16:25:00.937] error: downloading 'https://netlib.org/misc/intel/IntelRDFPMathLib20U2.tar.gz'; failed [2023/01/17 16:25:00.937] status_code: 28 [2023/01/17 16:25:00.937] status_string: "Timeout was reached" [2023/01/17 16:25:00.937] log: [2023/01/17 16:25:00.937] --- LOG BEGIN --- [2023/01/17 16:25:00.937] Trying 160.36.131.221:443... [2023/01/17 16:25:00.937] connect to 160.36.131.221 port 443 failed: Connection timed out [2023/01/17 16:25:00.937] Failed to connect to netlib.org port 443 after 129521 ms: Couldn't connect [2023/01/17 16:25:00.937] to server [2023/01/17 16:25:00.937] Closing connection 0 [2023/01/17 16:25:00.937] --- LOG END --- [2023/01/17 16:25:00.937] CMakeFiles/intel_dfp-populate.dir/build.make:99: recipe for target 'intel_dfp-populate-prefix/src/intel_dfp-populate-stamp/intel_dfp-populate-download' failed [2023/01/17 16:25:00.937] make[2]: *** [intel_dfp-populate-prefix/src/intel_dfp-populate-stamp/intel_dfp-populate-download] Error 1 [2023/01/17 16:25:00.937] CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/intel_dfp-populate.dir/all' failed [2023/01/17 16:25:00.937] make[1]: *** [CMakeFiles/intel_dfp-populate.dir/all] Error 2 [2023/01/17 16:25:00.937] Makefile:90: recipe for target 'all' failed [2023/01/17 16:25:00.937] make: *** [all] Error 2 [2023/01/17 16:25:00.937] CMake Error at /home/ubuntu/.cache/libmongocrypt/build.1/cmake/3.25.1/share/cmake-3.25/Modules/FetchContent.cmake:1616 (message): [2023/01/17 16:25:00.937] Build step for intel_dfp failed: 2 [2023/01/17 16:25:00.937] Call Stack (most recent call first): [2023/01/17 16:25:00.937] /home/ubuntu/.cache/libmongocrypt/build.1/cmake/3.25.1/share/cmake-3.25/Modules/FetchContent.cmake:1756:EVAL:2 (__FetchContent_directPopulate) [2023/01/17 16:25:00.937] /home/ubuntu/.cache/libmongocrypt/build.1/cmake/3.25.1/share/cmake-3.25/Modules/FetchContent.cmake:1756 (cmake_language) [2023/01/17 16:25:00.937] cmake/IntelDFP.cmake:46 (FetchContent_Populate) [2023/01/17 16:25:00.937] CMakeLists.txt:28 (include)

A subsequent run on that variant passed. If the download may be unreliable, can the source be copied instead of downloaded?

test/test-mc-range-mincover.c

eramongodb · 2023-01-17T18:28:46Z

test/test-mongocrypt-ctx-encrypt.c

+   _test_encrypt_fle2_encryption_placeholder (
+      tester, "fle2-insert-range/decimal128-precision", &source);


I observe the following UBSAN runtime error during execution of _test_encrypt_fle2_insert_range_payload_decimal128_precision:

./local-build/_deps/intel_dfp-src/LIBRARY/float128/dpml_ux_ops.c:770:21: runtime error: left shift of negative value -1 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ./local-build/_deps/intel_dfp-src/LIBRARY/float128/dpml_ux_ops.c:770:21 in

with stack trace:

#0 __dpml_bid_pack__ (unpacked_result=0x7ffffff8cd40, packed_result=0x7ffffff8d040, underflow_error=0, overflow_error=0, __dpml_bid_ux_excptn_info__=0x7ffffff8cea0) at _deps/intel_dfp-src/LIBRARY/float128/dpml_ux_ops.c:770 #1 0x0000000000847093 in __dpml_bid_C_ux_log__ (packed_argument=0x7ffffff8d020, class_to_action_map=0x1335578 <__dpml_bid_log_x_table+24>, scale=0x0, packed_result=0x7ffffff8d040, __dpml_bid_ux_excptn_info__=0x7ffffff8cea0) at _deps/intel_dfp-src/LIBRARY/float128/dpml_ux_log.c:170 #2 0x0000000000847297 in bid_f128_log2 (packed_result=0x7ffffff8d040, packed_argument=0x7ffffff8d020) at _deps/intel_dfp-src/LIBRARY/float128/dpml_ux_log.c:205 #3 0x0000000000c9699b in __mongocrypt_bid128_log2 (bid_x=..., rnd_mode=0, pfpsf=0x7ffffff8d980) at _deps/intel_dfp-src/LIBRARY/src/bid128_log2.c:123 #4 0x00000000008f25a3 in mc_dec128_log2_ex (operand=..., flags=0x0) at ./src/mc-dec128.h:281 #5 0x00000000008eb8ed in mc_dec128_log2 (operand=...) at ./src/mc-dec128.h:288 #6 0x00000000008e5db2 in mc_getTypeInfoDecimal128 (args=..., out=0x7ffffff8f550, status=0x60300010ca50) at ./src/mc-range-encoding.c:434 #7 0x00000000008bf8c2 in mc_getEdgesDecimal128 (args=..., status=0x60300010ca50) at ./src/mc-range-edge-generation.c:221 #8 0x00000000009f5ec5 in get_edges (insertSpec=0x7ffffff903f0, sparsity=1, status=0x60300010ca50) at ./src/mongocrypt-marking.c:717 #9 0x00000000009e9914 in _mongocrypt_fle2_placeholder_to_insert_update_ciphertextForRange (kb=0x61c000067098, marking=0x7ffffff91420, ciphertext=0x7ffffff910c0, status=0x60300010ca50) at ./src/mongocrypt-marking.c:856 #10 0x00000000009e6105 in _mongocrypt_marking_to_ciphertext (ctx=0x61c000067098, marking=0x7ffffff91420, ciphertext=0x7ffffff910c0, status=0x60300010ca50) at ./src/mongocrypt-marking.c:1579 #11 0x00000000009745d9 in _marking_to_bson_value (ctx=0x61c000067098, marking=0x7ffffff91420, out=0x7ffffff91690, status=0x60300010ca50) at ./src/mongocrypt-ctx-encrypt.c:1118 #12 0x0000000000974104 in _replace_marking_with_ciphertext (ctx=0x61c000067098, in=0x7ffffff91640, out=0x7ffffff91690, status=0x60300010ca50) at ./src/mongocrypt-ctx-encrypt.c:1178 #13 0x0000000000a01081 in _recurse (state=0x7ffffff91fa0) at ./src/mongocrypt-traverse-util.c:82 #14 0x0000000000a0247a in _recurse (state=0x7ffffff925d0) at ./src/mongocrypt-traverse-util.c:154 #15 0x0000000000a01bbd in _recurse (state=0x7ffffff92cc0) at ./src/mongocrypt-traverse-util.c:123 #16 0x0000000000a004e5 in _mongocrypt_transform_binary_in_bson (cb=0x973de0 <_replace_marking_with_ciphertext>, ctx=0x61c000067098, match=TRAVERSE_MATCH_MARKING, iter=0x7ffffff93220, out=0x7ffffff92fa0, status=0x60300010ca50) at ./src/mongocrypt-traverse-util.c:199 #17 0x000000000096f72a in _fle2_finalize (ctx=0x61c000067080, out=0x6020000ce8f0) at ./src/mongocrypt-ctx-encrypt.c:1607 #18 0x000000000095fa88 in _finalize (ctx=0x61c000067080, out=0x6020000ce8f0) at ./src/mongocrypt-ctx-encrypt.c:1958 #19 0x000000000098f7dd in mongocrypt_ctx_finalize (ctx=0x61c000067080, out=0x6020000ce8f0) at ./src/mongocrypt-ctx.c:699 #20 0x00000000006fba7f in _test_encrypt_fle2_encryption_placeholder (tester=0x7ffffff957e0, data_path=0x130bc60 <str> "fle2-insert-range/decimal128-precision", rng_source=0x7ffffff953b0) at ./test/test-mongocrypt-ctx-encrypt.c:2508 #21 0x00000000006f3ff2 in _test_encrypt_fle2_insert_range_payload_decimal128_precision (tester=0x7ffffff957e0) at ./test/test-mongocrypt-ctx-encrypt.c:2641 #22 0x0000000000830665 in main (argc=1, argv=0x7fffffffdac8) at ./test/test-mongocrypt.c:1026

and locals:

tmp = {sign = 0, exponent = 1079033856, fraction = {56, 8}} tmp_rec = {func_error_code = 140737487882864, context = 0x7ffffff8ca68, platform_specific_err_code = 140737487882864, environment = 140737487882856, ret_val_ptr = 0x7ffffff8ca60, name = 0x10100002b2f4541 <error: Cannot access memory at address 0x10100002b2f4541>, data_type = 6 '\006', dpml_error = 0 '\000', mode = 0 '\000', ret_val = {w = 140737487881828, f = -nan(0x78c664), d = 6.953355784440996e-310, ld = <invalid float value>}, args = {{w = 140737487883592, f = -nan(0x78cd48), d = 6.9533557845281492e-310, ld = <invalid float value>}, {w = 140737488355327, f = -nan(0x7fffff), d = 6.9533558078349549e-310, ld = <invalid float value>}, {w = 140737487882848, f = -nan(0x78ca60), d = 6.9533557844913907e-310, ld = <invalid float value>}, {w = 140737487883588, f = -nan(0x78cd44), d = 6.9533557845279515e-310, ld = <invalid float value>}}} shift = 0 error_code = 140737487882816 x_ptr = 0x7ffffff8ca40 exponent = 7 incr = 17594333436232 tmp_digit = 140737487883520 next_digit = 140737487883524 current_digit = 130910785088 error_val_ptr = 0x7ffffff8cd00

corresponding to the following test input (args for mc_getEdgesDecimal128):

{ value = {_words = {456000000000000, 3468897612982124544}}, sparsity = 1, min = {set = true, value = {_words = {0, 3476778912330022912}}}, max = {set = true, value = {_words = {1234567890123456789, 3476778912330022912}}}, precision = {set = true, value = 2} }

IDFP conatins this exacte line:

#define UX_ZERO_EXPONENT (- (UX_EXPONENT_TYPE) 1 << (F_EXP_WIDTH + 2))

The DFP library quality is... something.

I don't see the same runtime error on my system. It seems like my compiler sees this, rolls its eyes and constant-folds/ignores it.

I am still concerned about the UBSAN errors. Can we manually address these instances of left-shift UB by including it in the set of patches applied to IntelDFP?

What platform are you using? I'll try to repro the error and find a suitable fix.

Ubuntu 20.04 (in WSL2):

$ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.5 LTS Release: 20.04 Codename: focal $ clang --version clang version 10.0.0-4ubuntu1 Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin

The only special configuration flags required should be -fsanitize=undefined.

test/test-mc-range-mincover.c

test/test-mc-range-edge-generation.c

src/mc-dec128.h

src/mc-optional-private.h

test/test-mc-range-encoding.c

src/mc-range-edge-generation-private.h

src/mc-dec128.test.cpp

Add missing _ex variants for dec128 functions. Remove flag definitions, since we don't care about those at the moment. No use for opt_int128. Other comments/spelling.

kevinAlbs

Code changes LGTM. Fantastic work.

I suggest including the IntelDFP source, rather than downloading. But I am OK with addressing that in a subsequent PR to expedite merging.

kkloberdanz

LGTM!

eramongodb

One outstanding concern; otherwise, LGTM.

eramongodb · 2023-01-18T15:21:56Z

test/test-mongocrypt-ctx-encrypt.c

+   _test_encrypt_fle2_encryption_placeholder (
+      tester, "fle2-insert-range/decimal128-precision", &source);


I am still concerned about the UBSAN errors. Can we manually address these instances of left-shift UB by including it in the set of patches applied to IntelDFP?

vector-of-bool and others added 17 commits December 20, 2022 21:30

Tweak Node scripts to use top-level scripts

4244864

This removes the `find_cmake.sh` script and usage thereof within the Node build. It now uses the same CMake scripts (and CMake version) as the rest of the build.

Basic IntelDFP wrapper, and test tweaks:

8c82925

- Download and build IntelDFP as part of CMake configuration. - Modify the CHECK macro to be defined in a separate file to be reusable between tests. Clean up DFP build, and patch for s390x

More decimal constants, operations, comments

b7cfe11

Define the OST encoding for Decimal128

9a15b80

Fix warnings, tweaks, compiler compat.

2c95f31

- Tweak detection of Decimal128 intrin - Fix some conversion/signedness quirks - Suppress DFP warnings. It generates many, and we don't want them.

Fix big-endian support.

11edf84

Tweak bitstring generation and inline clz operations

747b793

Bit operations for u128

ea6c8f2

Edge generation for Decimal128

de2e9ab

Modify mincover template to support non-builting types

8ffe378

mincover test vector data

e7cd533

New mincover impl for Decimal128

4aa3849

add decimal128 tests

5764068

Implement range-insert for decimal128, fix test data

47e4b78

Implement find-range, more test data fixes

976897f

Fix new conversion warning-errors

e63daff

Fix new conversion warnings and brace warnings

9665927

vector-of-bool force-pushed the decimal128 branch from b244efc to 9665927 Compare December 20, 2022 21:30

vector-of-bool added 5 commits December 20, 2022 21:47

spelling error in IDFP download

f9a0406

Clang-tidy warnings

d866db2

A safer strncpy ensures we write a null terminator

41e5660

Dead/dup code in range encoding

bb8b702

Python cleanup

16374c4

vector-of-bool marked this pull request as ready for review January 5, 2023 04:19

vector-of-bool requested review from addaleax, baileympearson, dariakp, durran and nbbeeken as code owners January 5, 2023 04:19

vector-of-bool removed the request for review from durran January 5, 2023 04:19

kevinAlbs mentioned this pull request Jan 12, 2023

DRIVERS-2286 add tests for Decimal mongodb/specifications#1364

Merged

3 tasks

kevinAlbs reviewed Jan 17, 2023

View reviewed changes

cmake/IntelDFP.cmake Show resolved Hide resolved

kevinAlbs reviewed Jan 17, 2023

View reviewed changes

src/mc-range-encoding.c Outdated Show resolved Hide resolved

kevinAlbs reviewed Jan 17, 2023

View reviewed changes

CMakeLists.txt Show resolved Hide resolved

kevinAlbs reviewed Jan 17, 2023

View reviewed changes

eramongodb reviewed Jan 17, 2023

View reviewed changes

test/test-mc-range-edge-generation.c Show resolved Hide resolved

vector-of-bool added 4 commits January 17, 2023 20:07

Tautalogical compare

2524124

Tweak and comment on array size limits

812dde1

Bump CMake min-required

ec7663a

null-safe checks in string building

8c7fa49

vector-of-bool requested review from eramongodb, kevinAlbs and kkloberdanz January 17, 2023 21:00

eramongodb reviewed Jan 17, 2023

View reviewed changes

vector-of-bool added 3 commits January 17, 2023 22:07

Merge branch 'master' into decimal128

5126d16

PR comments/cleanup:

f4da51d

Add missing _ex variants for dec128 functions. Remove flag definitions, since we don't care about those at the moment. No use for opt_int128. Other comments/spelling.

Fix formatting arg

619d64b

kevinAlbs approved these changes Jan 18, 2023

View reviewed changes

kevinAlbs mentioned this pull request Jan 18, 2023

MONGOCRYPT-483 make CMAKE_BUILD_TYPE configurable. #532

Merged

kkloberdanz approved these changes Jan 18, 2023

View reviewed changes

eramongodb approved these changes Jan 18, 2023

View reviewed changes

Fix UB in IntellRDFP lib

261934a

vector-of-bool merged commit b704947 into mongodb:master Jan 18, 2023

This was referenced Jan 25, 2023

CDRIVER-3620 Audit compile and test scripts mongodb/mongo-c-driver#1187

Merged

CDRIVER-4394 add decimal128 Range Index tests mongodb/mongo-c-driver#1192

Closed

CDRIVER-4394 add decimal128 Range Index tests mongodb/mongo-c-driver#1198

Merged

kevinAlbs mentioned this pull request Jul 9, 2024

MONGOCRYPT-702 regenerate edge and minCover test data #856

Merged

eramongodb mentioned this pull request Aug 12, 2025

MONGOCRYPT-839 set CMake minimum required version to 3.15...4.0 #1052

Merged

		_test_encrypt_fle2_encryption_placeholder (
		tester, "fle2-insert-range/decimal128-precision", &source);

[MONGOCRYPT-483]: Support Decimal128 in range-based Queryable Encryption #522

[MONGOCRYPT-483]: Support Decimal128 in range-based Queryable Encryption #522

Uh oh!

Conversation

vector-of-bool commented Dec 20, 2022 • edited by kevinAlbs Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Part 0: 128-bit Integer Arithmetic

Part 1: IntelDFP: A Decimal128 Library

Part 2: A Decimal128 Wrapper

Part 3: OST Encoding

Part 4: Endian handling

Part 5: Test Vector Translation

Part 6: Generalizing the Mincover Templates

Uh oh!

rcsanchez97 commented Jan 12, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kevinAlbs Jan 17, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eramongodb Jan 17, 2023

Choose a reason for hiding this comment

Uh oh!

vector-of-bool Jan 17, 2023

Choose a reason for hiding this comment

Uh oh!

eramongodb Jan 18, 2023

Choose a reason for hiding this comment

Uh oh!

vector-of-bool Jan 18, 2023

Choose a reason for hiding this comment

Uh oh!

eramongodb Jan 18, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kevinAlbs left a comment

Choose a reason for hiding this comment

Uh oh!

kkloberdanz left a comment

Choose a reason for hiding this comment

Uh oh!

eramongodb left a comment

Choose a reason for hiding this comment

Uh oh!

eramongodb Jan 18, 2023

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

vector-of-bool commented Dec 20, 2022 •

edited by kevinAlbs

Loading