Handle Decimal-128 Multiplication For Newer Spark Versions #1623
razajafri merged 10 commits into NVIDIA:branch-24.02
Conversation
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Related to NVIDIA/spark-rapids#9859
@tgravescs I think I have answered your previous comment. Do you have any other comments?
build |
hyperbolic2346 left a comment:
Some minor nits, looks good overall and I like the comments added.
src/main/cpp/src/decimal_utils.cu
```cpp
std::unique_ptr<cudf::table> multiply_decimal128(cudf::column_view const& a,
                                                 cudf::column_view const& b,
                                                 int32_t product_scale,
                                                 bool const& cast_interim_result,
```
Suggested change:
```diff
-                                                 bool const& cast_interim_result,
+                                                 bool const cast_interim_result,
```
No real advantage to passing it by const reference (effectively a pointer) if it is const and smaller than a pointer.
build

build
CI is failing but passes locally, even after doing a clean build.
build |
```diff
-  public static Table multiply128(ColumnView a, ColumnView b, int productScale) {
-    return new Table(multiply128(a.getNativeView(), b.getNativeView(), productScale));
+  public static Table multiply128(ColumnView a, ColumnView b, int productScale, boolean interimCast) {
+    return new Table(multiply128(a.getNativeView(), b.getNativeView(), productScale, interimCast));
```
If the interimCast is applied to Spark versions 3.2.4, 3.3.3, 3.4.1, 3.5.0, and 4.0.0, then please add that clarification to the docs of this function.
I thought the docs were pretty clear; if there is anything else you want me to add, please share and I will add it.
src/test/java/com/nvidia/spark/rapids/jni/DecimalUtilsTest.java
build |
```java
 * WARNING: With interimCast set to true, this method has a bug which we match with Spark versions before 3.4.2,
 * 3.5.1, and 4.0.0. Consider the following example using Decimal with a precision of 38 and scale of 10:
 * -8533444864753048107770677711.1312637916 * -12.0000000000 = 102401338377036577293248132533.575166
 * while the actual answer based on Java BigDecimal is 102401338377036577293248132533.575165
 *
 * @param a factor input, must match row count of the other factor input
 * @param b factor input, must match row count of the other factor input
 * @param productScale scale to use for the product type
 * @param interimCast whether to cast the result of the multiplication to 38 precision before casting it again to the
 *                    final precision
```
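For reference, the double rounding described in the warning can be reproduced with plain java.math.BigDecimal. This is a sketch, assuming Spark's adjusted result type for this multiplication is Decimal(38, 6), which leaves room for only 8 fraction digits at the interim 38-digit cast:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class InterimCastDemo {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("-8533444864753048107770677711.1312637916"); // Decimal(38, 10)
    BigDecimal b = new BigDecimal("-12.0000000000");                           // Decimal(38, 10)

    // Exact product, scale 20: 102401338377036577293248132533.57516549920000000000
    BigDecimal exact = a.multiply(b);

    // Correct path: round once, straight to the final scale of 6.
    BigDecimal correct = exact.setScale(6, RoundingMode.HALF_UP);
    // -> 102401338377036577293248132533.575165

    // Interim-cast path: first squeeze the product into 38 digits of precision.
    // With 30 integer digits, only 8 fraction digits survive, and that rounding
    // bumps the 8th digit up; rounding again to scale 6 then bumps the result.
    BigDecimal interim = exact.setScale(8, RoundingMode.HALF_UP); // ...533.57516550
    BigDecimal buggy = interim.setScale(6, RoundingMode.HALF_UP);
    // -> 102401338377036577293248132533.575166

    System.out.println("single rounding: " + correct);
    System.out.println("double rounding: " + buggy);
  }
}
```

Matching the old answer bit-for-bit is the whole point of the interimCast flag: the plugin must reproduce whichever rounding the running Spark version performs.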
Nit: Sorry, I think these lines are longer than usual, so we may need to rewrite them a little. As a convention, a line typically should not exceed 100 characters; the lines above are up to 120.
Yeah, good point. I see the file has other instances where we are doing this. To keep this PR focused, I don't want to make changes in the other places, and since CI has passed I would really appreciate it if we could do that as a follow-on.
Thanks @ttnghia, I will file a follow-on for the line length! Appreciate your help.
This PR adds a new method for multiplying two 128-bit decimal numbers without casting the interim result before casting to the scale the final answer is expected to be in.
When multiplying two decimal numbers with a precision of 38, earlier versions of Spark cap the intermediate result at precision 38, which essentially casts the answer twice: once right after the multiplication and once before reporting the final answer. This double rounding makes the answer inaccurate.
For more details, please see the linked Spark issue, which explains the problem in greater detail.
Changes Made:
- multiply128 method that takes in an interimCast boolean.

Tests: