Handle Decimal-128 Multiplication For Newer Spark Versions #1623
razajafri merged 10 commits into NVIDIA:branch-24.02
Conversation
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Related to NVIDIA/spark-rapids#9859
@tgravescs I think I have answered your previous comment. Do you have any other comments?
build |
hyperbolic2346 left a comment:
Some minor nits, looks good overall and I like the comments added.
src/main/cpp/src/decimal_utils.cu
```cpp
std::unique_ptr<cudf::table> multiply_decimal128(cudf::column_view const& a,
                                                 cudf::column_view const& b,
                                                 int32_t product_scale,
                                                 bool const& cast_interim_result,
```
Suggested change:
```diff
-                                                 bool const& cast_interim_result,
+                                                 bool const cast_interim_result,
```
No real advantage to passing it by const reference (effectively a pointer) if it is const and smaller than a pointer.
build

build
CI is failing but passes locally, even after doing a clean build.
build |
```diff
-  public static Table multiply128(ColumnView a, ColumnView b, int productScale) {
-    return new Table(multiply128(a.getNativeView(), b.getNativeView(), productScale));
+  public static Table multiply128(ColumnView a, ColumnView b, int productScale, boolean interimCast) {
+    return new Table(multiply128(a.getNativeView(), b.getNativeView(), productScale, interimCast));
```
If the interimCast is applied to Spark versions 3.2.4, 3.3.3, 3.4.1, 3.5.0, and 4.0.0, then please add that clarification to the docs of this function.
I thought the docs were pretty clear; if there is anything else you want me to add, please share and I will add it.
src/test/java/com/nvidia/spark/rapids/jni/DecimalUtilsTest.java
build |
```java
 * WARNING: With interimCast set to true, this method has a bug which we match with Spark versions before 3.4.2,
 * 3.5.1, and 4.0.0. Consider the following example using Decimal with a precision of 38 and scale of 10:
 * -8533444864753048107770677711.1312637916 * -12.0000000000 = 102401338377036577293248132533.575166
 * while the actual answer based on Java BigDecimal is 102401338377036577293248132533.575165
 *
 * @param a factor input, must match row count of the other factor input
 * @param b factor input, must match row count of the other factor input
 * @param productScale scale to use for the product type
 * @param interimCast whether to cast the result of the multiplication to 38 precision before casting it again to the
 *                    final precision
```
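For reference, the double rounding described in the warning can be reproduced with plain java.math.BigDecimal. This is a sketch, assuming Spark's adjusted result type for this multiplication is Decimal(38, 6), which leaves room for only 8 fraction digits at the interim 38-digit cast:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class InterimCastDemo {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("-8533444864753048107770677711.1312637916"); // Decimal(38, 10)
    BigDecimal b = new BigDecimal("-12.0000000000");                           // Decimal(38, 10)

    // Exact product, scale 20: 102401338377036577293248132533.57516549920000000000
    BigDecimal exact = a.multiply(b);

    // Correct path: round once, straight to the final scale of 6.
    BigDecimal correct = exact.setScale(6, RoundingMode.HALF_UP);
    // -> 102401338377036577293248132533.575165

    // Interim-cast path: first squeeze the product into 38 digits of precision.
    // With 30 integer digits, only 8 fraction digits survive, and that rounding
    // bumps the 8th digit up; rounding again to scale 6 then bumps the result.
    BigDecimal interim = exact.setScale(8, RoundingMode.HALF_UP); // ...533.57516550
    BigDecimal buggy = interim.setScale(6, RoundingMode.HALF_UP);
    // -> 102401338377036577293248132533.575166

    System.out.println("single rounding: " + correct);
    System.out.println("double rounding: " + buggy);
  }
}
```

Matching the old answer bit-for-bit is the whole point of the interimCast flag: the plugin must reproduce whichever rounding the running Spark version performs.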
Nit: Sorry, I think these lines are longer than usual, so we may need to rewrite them a little. As a convention, a line typically should not exceed 100 characters; the lines above are up to 120.
Yeah, good point. I see the file has other instances where we are doing this. To keep this PR focused, I don't want to make changes in the other places, and since CI has passed I would really appreciate it if we could do that as a follow-on.
Thanks @ttnghia, I will file a follow-on for the line length! Appreciate your help.
This PR adds a new method for multiplying two 128-bit decimal numbers without casting the interim result before casting to the scale the final answer is expected to be in.
When multiplying two decimal numbers with a precision of 38, earlier versions of Spark cap the intermediate result at precision 38, which essentially casts the answer twice: once right after the multiplication and once before reporting the final answer. This double rounding makes the answer inaccurate.
For more details, please see the linked Spark issue, which explains the problem in greater detail.
Changes Made:
- multiply128 method that takes in an interimCast boolean.

Tests: