[SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence#25279
srowen wants to merge 6 commits into apache:master
…htly more stable atanh formula
""",
arguments = """
Arguments:
* expr - hyperbolic angle
This description is wrong; the argument is not an angle. For consistency with other inverse functions, I just removed the description.
Oops. Thanks for the fix.
|
Test build #108274 has finished for PR 25279 at commit
|
|
Test build #108283 has finished for PR 25279 at commit
|
|
I quickly checked performance numbers on my local machine; as you said in the mail, there is some overhead on x86_64. |
|
Updated above. |
|
Do we need to update |
|
See https://issues.apache.org/jira/browse/SPARK-28519?focusedCommentId=16895279&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16895279 for an interesting bit of analysis. The StrictMath answer does appear correct, and the difference comes from an alternate assembly code implementation in the x86 JVM. Whatever happens there, we may wish to sidestep the problem with a change like this anyway. Yes, I can update the migration guide with a note that the returned value may be very slightly different. The perf overhead is trivial here, so that's no issue. Now let me investigate the further test failures. |
|
Test build #108337 has finished for PR 25279 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
|
Test build #108402 has finished for PR 25279 at commit
|
|
Test build #108430 has finished for PR 25279 at commit
|
|
Test build #108466 has finished for PR 25279 at commit
|
|
@srowen Thanks a lot. And good news :) all tests passed on x86_64 and aarch64 based on your PR (for only two tests we had to increase the timeout for now; later we will test on a larger server), see: https://logs.openlabtesting.org/logs/6/6/947ddad683ad7a2e0a0cc4c2310e352ace21a86f/check/spark-build-arm64/8e39061/ |
srowen left a comment
OK I feel pretty good about the change and the reasoning. I'll leave it open a little longer for comments.
|
Merged to master |
|
@srowen Could you show the perf benchmark? The performance regression is expected, right? |
|
I did not benchmark this as I think it's a correctness issue that would be worth a perf hit. I also expect it makes almost no difference - computing a function in SQL is dominated by so much more than the math here. Let me assess that though with some microbenchmarks. |
|
@gatorsmile here's a quick benchmark for Previous: Current: Looks like it's slower by about 2 nanoseconds, but it's not even statistically significant. |
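The benchmark code and its numbers are not preserved in the comment above. As a rough illustration only, a crude timing loop for comparing `Math.log` with `StrictMath.log` could look like the sketch below. This is not the benchmark that was run for the PR; the class name `LogTimingDemo` and its helpers are hypothetical, and a harness such as JMH would be the better choice for any number meant to be trusted.

```java
final class LogTimingDemo {
    // Sums log(i) for i in 1..n so the JIT cannot discard the calls as dead code.
    static double sumLog(int n, boolean strict) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += strict ? StrictMath.log(i) : Math.log(i);
        }
        return sum;
    }

    // Average wall-clock nanoseconds per call over n calls (crude: includes loop overhead).
    static double nanosPerCall(int n, boolean strict) {
        long start = System.nanoTime();
        double sink = sumLog(n, strict);
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(sink)) {
            throw new IllegalStateException("unexpected NaN");
        }
        return (double) elapsed / n;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        // Warm up both paths so they are likely JIT-compiled before timing.
        for (int round = 0; round < 5; round++) {
            sumLog(n, false);
            sumLog(n, true);
        }
        System.out.printf("Math.log:       %.2f ns/call%n", nanosPerCall(n, false));
        System.out.printf("StrictMath.log: %.2f ns/call%n", nanosPerCall(n, true));
    }
}
```

Results from a loop like this vary with JVM version, CPU, and warm-up, so they are only a sanity check on the order of magnitude.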
|
Thanks! @srowen cc @rednaxelafx |
|
Thanks @srowen! Basically we just want to make sure if we anticipate there are performance implications, some due diligence of benchmarking should be done. Thanks for adding that information in this PR for future reference! BTW, folks that are not intimately familiar with the inner workings of the HotSpot JVM may find this interesting: In current OpenJDK8u,
Now, taking one specific example of
    public static double log(double a) {
        return StrictMath.log(a); // default impl. delegates to StrictMath
    }
goes to
    public static native double log(double a);
goes to native code:
    JNIEXPORT jdouble JNICALL
    Java_java_lang_StrictMath_log(JNIEnv *env, jclass unused, jdouble d)
    {
        return (jdouble) jlog((double)d);
    }
and then
    src/share/native/java/lang/fdlibm/include/jfdlibm.h:44:#define log jlog
So then
    do_intrinsic(_dlog, java_lang_Math, log_name, double_double_signature, F_S) \
In all cases in (1) to (3) above, the So while |
|
Yeah, this is great info. I understand that java.lang.Math is allowed to return slightly different results from what some IEEE standard dictates, but there is a most-correct answer in double precision, and these assembly code implementations in the x86 JVM seem to be clearly less correct for non-corner cases like log(3). That surprises me. There's a thread forwarded to dev@ with some additional info. Anyway, I think it's a good move to make the actual log() function in SQL return values that are consistent cross-platform and consistent with StrictMath, even if that doesn't mean we need to use StrictMath for every log call internally (for many usages in MLlib it won't matter at all), especially since there is basically no meaningful performance change. |
|
I'd like to ask a post-hoc question on this PR: this one focuses on
    case class Acosh(child: Expression)
      extends UnaryMathExpression((x: Double) => StrictMath.log(x + math.sqrt(x * x - 1.0)), "ACOSH") {
left in mathExpressions.scala, which feels pretty inconsistent. Are we planning to route |
|
That could well be reasonable. I didn't do so simply because I wasn't aware of cases where the Oracle JDK returns different answers for sqrt(). At least, when running tests on ARM it evidently doesn't produce different answers for any of the sqrt-related tests, but that doesn't mean such cases don't exist. |
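On the sqrt() question: the Java API documentation specifies that `Math.sqrt` returns the correctly rounded result, so it should agree with `StrictMath.sqrt` bit for bit on a conforming JVM. The sketch below spot-checks that on a few inputs and shows an acosh variant that uses `StrictMath` throughout, mirroring the `Acosh` expression quoted above. The class name `SqrtCheckDemo` and its helpers are hypothetical, and a handful of sample inputs is of course not a proof.

```java
final class SqrtCheckDemo {
    // True when Math.sqrt and StrictMath.sqrt return identical bits for x.
    static boolean sqrtAgrees(double x) {
        return Double.doubleToLongBits(Math.sqrt(x))
            == Double.doubleToLongBits(StrictMath.sqrt(x));
    }

    // acosh(x) = ln(x + sqrt(x^2 - 1)), computed with StrictMath for both log and sqrt.
    static double acoshStrict(double x) {
        return StrictMath.log(x + StrictMath.sqrt(x * x - 1.0));
    }

    public static void main(String[] args) {
        double[] inputs = {2.0, 3.0, 0.1, 1e-300, 1e300, 12345.6789};
        for (double x : inputs) {
            System.out.println("sqrt(" + x + ") agrees: " + sqrtAgrees(x));
        }
        System.out.println("acoshStrict(2.0) = " + acoshStrict(2.0));
    }
}
```

If the two sqrt implementations agree everywhere, routing sqrt through `StrictMath` would be a consistency choice with no effect on results.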
What changes were proposed in this pull request?
See discussion on the JIRA (and dev@). At heart, we find that math.log and math.pow can actually return slightly different results across platforms because of hardware optimizations. For the actual SQL log and pow functions, I propose that we use StrictMath instead to ensure the answers are always the same. (This should have the benefit of helping tests pass on aarch64.)
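A minimal Java sketch of the comparison described here. `Math.log` and `Math.pow` may be replaced by platform-specific intrinsics, while the `StrictMath` versions are specified to give the same bits on every platform. Whether the two actually differ for a given input depends on the JVM and CPU, so the sketch reports the gap rather than assuming one. The class name `StrictLogDemo` and the helper `ulpsApart` are illustrative and not part of the PR.

```java
final class StrictLogDemo {
    // Distance between two finite doubles, measured in units of ulp(b).
    static double ulpsApart(double a, double b) {
        return Math.abs(a - b) / Math.ulp(b);
    }

    public static void main(String[] args) {
        double[] inputs = {3.0, 10.0, 0.1, 1e10};
        for (double x : inputs) {
            double fast = Math.log(x);          // may use a CPU-specific intrinsic
            double strict = StrictMath.log(x);  // fdlibm semantics, platform independent
            System.out.println("log(" + x + "): Math=" + fast
                + " StrictMath=" + strict
                + " ulps apart=" + ulpsApart(fast, strict));
        }
        double fastPow = Math.pow(1.1, 17.3);
        double strictPow = StrictMath.pow(1.1, 17.3);
        System.out.println("pow(1.1, 17.3): Math=" + fastPow
            + " StrictMath=" + strictPow
            + " ulps apart=" + ulpsApart(fastPow, strictPow));
    }
}
```

On a platform where the intrinsic and fdlibm agree, every line reports a gap of 0.0; a nonzero gap on some input is the kind of difference the PR sidesteps.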
Further, the atanh function (which is not part of java.lang.Math) can be implemented in a slightly different and more accurate way.
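To illustrate why the choice of formula matters for atanh, the sketch below compares the textbook form 0.5 * ln((1 + x) / (1 - x)) with a form built on `log1p`. Whether the `log1p` form is exactly what this PR adopted is an assumption; it is shown as one well-known way to keep precision for small |x|. The class name `AtanhDemo` and its methods are hypothetical.

```java
final class AtanhDemo {
    // Textbook form. For tiny |x|, 1 + x and 1 - x both round to 1.0 and the input is lost.
    static double atanhNaive(double x) {
        return 0.5 * StrictMath.log((1.0 + x) / (1.0 - x));
    }

    // log1p form: 0.5 * (ln(1 + x) - ln(1 - x)), which keeps the low-order bits of x.
    static double atanhLog1p(double x) {
        return 0.5 * (StrictMath.log1p(x) - StrictMath.log1p(-x));
    }

    public static void main(String[] args) {
        double[] inputs = {1e-17, 1e-9, 0.5, 0.999};
        for (double x : inputs) {
            System.out.println("x=" + x
                + " naive=" + atanhNaive(x)
                + " log1p=" + atanhLog1p(x));
        }
    }
}
```

For x = 1e-17 the textbook form collapses to 0.0 because both 1 + x and 1 - x round to 1.0, while the `log1p` form returns a value close to x, which is what atanh should give for such a small argument.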
How was this patch tested?
Existing tests (which will need to be changed).
Some manual testing locally to understand the numeric issues.