[SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence by srowen · Pull Request #25279 · apache/spark

srowen · 2019-07-28T13:49:09Z

What changes were proposed in this pull request?

See discussion on the JIRA (and dev@). At heart, we find that math.log and math.pow can actually return slightly different results across platforms because of hardware optimizations. For the actual SQL log and pow functions, I propose that we should use StrictMath instead to ensure the answers are already the same. (This should have the benefit of helping tests pass on aarch64.)

Further, the atanh function (which is not part of java.lang.Math) can be implemented in a slightly different and more accurate way.

How was this patch tested?

Existing tests (which will need to be changed).
Some manual testing locally to understand the numeric issues.

…htly more stable atanh formula

srowen · 2019-07-28T13:50:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

  """,
-  arguments = """
-    Arguments:
-      * expr - hyperbolic angle


This description is wrong; the argument is not an angle. For consistency with other inverse functions, I just removed the description.

Oops. Thanks for the fix.

sql/core/src/test/resources/sql-tests/results/pgSQL/float8.sql.out

SparkQA · 2019-07-28T15:26:16Z

Test build #108274 has finished for PR 25279 at commit 37301dd.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class Log(child: Expression) extends UnaryLogExpression(StrictMath.log, \"LOG\")
case class Log10(child: Expression) extends UnaryLogExpression(StrictMath.log10, \"LOG10\")
case class Log1p(child: Expression) extends UnaryLogExpression(StrictMath.log1p, \"LOG1P\")

SparkQA · 2019-07-28T23:03:10Z

Test build #108283 has finished for PR 25279 at commit ddb2f0f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-07-29T03:42:22Z

I quickly checked performance numbers on my local;

// on Mac (2 GHz Intel Core i5)
// java version "1.8.0_181"
// Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
// Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
scala> spark.range(30000000L).selectExpr("acosh(id)").write.format("noop").save()

w/Math
Elapsed time: 0.5992446991111111s

w/StrictMath
Elapsed time: 1.1835621365555555s

As you said in the mail, there are some overheads on x86/64.
Could you check them on power, @kiszk ?

kiszk · 2019-07-29T03:46:52Z

@maropu Thank you, I found an entry for StrictMath optimization in OpenJDK.
Could you please add the version of OpenJDK that you used, too?

maropu · 2019-07-29T03:59:20Z

Updated above.

kiszk · 2019-07-29T05:46:54Z

Do we need to update sql-migration-guide.md?

srowen · 2019-07-29T14:02:55Z

See https://issues.apache.org/jira/browse/SPARK-28519?focusedCommentId=16895279&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16895279 for an interesting bit of analysis. The StrictMath answer does appear correct and the difference comes from some alternate assembly code implementation in the x86 JVM. Whatever happens there we may wish to sidestep the problem with a change like this anyway.

Yes I can update the migration guide with a note that the returned value may be very slightly different.

The perf overhead is trivial here, so that's no issue.

Now let me investigate the further test failures.

SparkQA · 2019-07-29T19:41:40Z

Test build #108337 has finished for PR 25279 at commit 3550d06.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

SparkQA · 2019-07-30T16:15:15Z

Test build #108402 has finished for PR 25279 at commit 3971b21.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class Exp(child: Expression) extends UnaryMathExpression(StrictMath.exp, \"EXP\")
case class Expm1(child: Expression) extends UnaryMathExpression(StrictMath.expm1, \"EXPM1\")

SparkQA · 2019-07-31T04:22:26Z

Test build #108430 has finished for PR 25279 at commit c5ca3ea.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class Exp(child: Expression) extends UnaryMathExpression(StrictMath.exp, \"EXP\")
case class Expm1(child: Expression) extends UnaryMathExpression(StrictMath.expm1, \"EXPM1\")

SparkQA · 2019-07-31T17:50:38Z

Test build #108466 has finished for PR 25279 at commit 7a0eed6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

huangtianhua · 2019-08-01T01:26:32Z

@srowen Thanks a lot. And a good news:) all tests passed on x86_64 and aarch64 based on your pr(only two tests now we increase the timeout, later we will tests on larger server), see: https://logs.openlabtesting.org/logs/6/6/947ddad683ad7a2e0a0cc4c2310e352ace21a86f/check/spark-build-arm64/8e39061/

srowen

OK I feel pretty good about the change and the reasoning. I'll leave it open a little longer for comments.

srowen · 2019-08-02T15:55:50Z

Merged to master

gatorsmile · 2019-08-07T02:33:46Z

@srowen Could you show the perf benchmark? The performance regression is expected, right?

srowen · 2019-08-07T02:45:34Z

I did not benchmark this as I think it's a correctness issue that would be worth a perf hit. I also expect it makes almost no difference - computing a function in SQL is dominated by so much more than the math here. Let me assess that though with some microbenchmarks.

srowen · 2019-08-07T17:31:34Z

@gatorsmile here's a quick benchmark for log():

Running benchmark: log
  Running case: log
  Stopped after 20 iterations, 6376 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14.6
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
log:                                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
log                                                 297            319          17          6.7         148.3       1.0X

Current:

Running benchmark: log
  Running case: log
  Stopped after 20 iterations, 6507 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14.6
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
log:                                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
log                                                 301            325          15          6.7         150.3       1.0X

Looks like it's slower by about 2 nanoseconds, but it's not even statistically significant.
Even StrictMath is native code. I think we could say there is no perf hit.

gatorsmile · 2019-08-08T22:56:20Z

Thanks! @srowen

cc @rednaxelafx

rednaxelafx · 2019-08-08T23:35:53Z

Thanks @srowen ! Basically we just want to make sure if we anticipate there are performance implications, some due diligence of benchmarking should be done. Thanks for adding that information in this PR for future reference!

BTW, folks that are not intimately familiar with the inner workings of the HotSpot JVM may find this interesting:
While both java.lang.Math and java.lang.StrictMath are partially implemented with native functions, they're really different under the hood.

In current OpenJDK8u,

java.lang.StrictMath:
- Java side: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/89c8bfe10659/src/share/classes/java/lang/StrictMath.java
- native side: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/89c8bfe10659/src/share/native/java/lang/StrictMath.c
java.lang.Math:
- Java side: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/89c8bfe10659/src/share/classes/java/lang/Math.java

Now, taking one specific example of Math.log():

    public static double log(double a) {
        return StrictMath.log(a); // default impl. delegates to StrictMath
    }

goes to StrictMath.log():

    public static native double log(double a);

goes to native code:

JNIEXPORT jdouble JNICALL
Java_java_lang_StrictMath_log(JNIEnv *env, jclass unused, jdouble d)
{
    return (jdouble) jlog((double)d);
}

and then jlog is actually the log function from fdlibm via:

src/share/native/java/lang/fdlibm/include/jfdlibm.h:44:#define log     jlog

So then Math.log and StrictMath.log should be the exact same thing? Well not so fast.
In the HotSpot JVM, it treats some Java methods as being "special", aka "intrinsics", which means it knows exactly what semantics that method needs to implement, and choose an alternative internal implementation for it.
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c7a3e57fdf4a/src/share/vm/classfile/vmSymbols.hpp#l681

  do_intrinsic(_dlog,                     java_lang_Math,         log_name,   double_double_signature,           F_S)   \

Math.log() happens to be one of the intrinsified Java methods. It is implemented with special assembler snippets in all of:

In all cases in (1) to (3) above, the Math.log() method call is actually implemented by this x86 instruction sequence on x86_64 in HotSpot:

fldln2   # load constant loge 2
fxch     # exchange floating point registers
fyl2x    # compute y * log_2(x)

So while StrictMath.log() is always implemented by fdlibm in OpenJDK8u, Math.log() can be intrinsified into different implementations inside the OpenJDK HotSpot JVM that use platform-specific assembler, which can result in different results cross platform.

srowen · 2019-08-08T23:48:01Z

Yeah this is great info. I understand that java.lang.Math is allowed to return slightly different results from what some IEEE standard dictates but there is a most-correct answer in double precision, and these assembly code implementations in the x86 JVM seem to be clearly less correct for non-corner cases like log(3). That surprises me. There's a thread forwarded to dev@ with some additional info.

Anyway, I think it's a good move to make the actual log() function in SQL return values that are consistent cross-platform and consistent with StrictMath, even if that doesn't mean we need to use StrictMath for every log call internally (for many usages in MLlib it won't matter at all). And because there is basically no meaningful performance change.

rednaxelafx · 2019-08-19T21:21:36Z

I'd like to ask a post-hoc question on this PR: this one focuses on log() / exp() / pow() family of math functions. What about sqrt()? You can see that after this PR, there are still code like:

case class Acosh(child: Expression)
  extends UnaryMathExpression((x: Double) => StrictMath.log(x + math.sqrt(x * x - 1.0)), "ACOSH") {

left in mathExpressions.scala, which feels pretty inconsistent.

Are we planning to route sqrt() to StrictMath as well?

srowen · 2019-08-19T21:29:14Z

That could well be reasonable. I didn't do so simply because I wasn't aware of cases where the Oracle JDK returns different answers for sqrt(). At least, when running tests on ARM it evidently doesn't produce different answers for any of the sqrt-related tests, but that doesn't mean it doesn't exist.

Use StrictMath log, pow functions for platform independence. Use slig…

37301dd

…htly more stable atanh formula

srowen self-assigned this Jul 28, 2019

srowen commented Jul 28, 2019

View reviewed changes

dongjoon-hyun added the SQL label Jul 28, 2019

Fix tests

ddb2f0f

More test fixes

3550d06

huangtianhua reviewed Jul 30, 2019

View reviewed changes

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala Show resolved Hide resolved

Addition changes for exp, expm1

3971b21

Fix codegen for exp, expm1

c5ca3ea

Another fix to test value

7a0eed6

srowen changed the title ~~[WIP][SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence~~ [SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence Aug 1, 2019

srowen commented Aug 1, 2019

View reviewed changes

srowen closed this in b148bd5 Aug 2, 2019

srowen deleted the SPARK-28519 branch August 9, 2019 16:06

zhengruifeng mentioned this pull request Nov 29, 2019

[SPARK-30057][DOCS]Add a statement of platforms Spark runs on #26690

Closed

Comments

Conversation

srowen commented Jul 28, 2019

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

srowen Jul 28, 2019

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun Jul 28, 2019

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SparkQA commented Jul 28, 2019

Uh oh!

SparkQA commented Jul 28, 2019

Uh oh!

maropu commented Jul 29, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiszk commented Jul 29, 2019

Uh oh!

maropu commented Jul 29, 2019

Uh oh!

kiszk commented Jul 29, 2019

Uh oh!

srowen commented Jul 29, 2019

Uh oh!

SparkQA commented Jul 29, 2019

Uh oh!

Uh oh!

SparkQA commented Jul 30, 2019

Uh oh!

SparkQA commented Jul 31, 2019

Uh oh!

SparkQA commented Jul 31, 2019

Uh oh!

huangtianhua commented Aug 1, 2019

Uh oh!

srowen left a comment

Choose a reason for hiding this comment

Uh oh!

srowen commented Aug 2, 2019

Uh oh!

gatorsmile commented Aug 7, 2019

Uh oh!

srowen commented Aug 7, 2019

Uh oh!

srowen commented Aug 7, 2019

Uh oh!

gatorsmile commented Aug 8, 2019

Uh oh!

rednaxelafx commented Aug 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srowen commented Aug 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rednaxelafx commented Aug 19, 2019

Uh oh!

srowen commented Aug 19, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

maropu commented Jul 29, 2019 •

edited

Loading

rednaxelafx commented Aug 8, 2019 •

edited

Loading

srowen commented Aug 8, 2019 •

edited

Loading