[SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter #28643

MaxGekk · 2020-05-26T11:00:44Z

What changes were proposed in this pull request?

Currently, the legacy fractional formatter is based on the Spark 2.4 implementation, which formats the input timestamp twice:

    val timestampString = ts.toString
    val formatted = legacyFormatter.format(ts)

in order to strip trailing zeros from the seconds fraction. This PR proposes to avoid the first formatting (ts.toString) by building the seconds fraction directly.

Why are the changes needed?

It makes the legacy fractional formatter faster.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By the existing test "format fraction of second" in TimestampFormatterSuite, plus a new test for timestamps before 1970-01-01 00:00:00Z.

SparkQA · 2020-05-26T11:47:16Z

Test build #123120 has finished for PR 28643 at commit 046ea8a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-26T23:23:02Z

Test build #123135 has finished for PR 28643 at commit 3e0b9a4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MaxGekk · 2020-05-27T13:22:50Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala

+    if (nanos == 0) {
      formatted
+    } else {
+      // Formats non-zero seconds fraction w/o trailing zeros. For example:


I borrowed the implementation from JDK 11 java.sql.Timestamp.toString:
https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.sql/share/classes/java/sql/Timestamp.java#L266-L313

cloud-fan · 2020-05-27

Does performance matter that much here? It would be much simpler to write:

    while (nanos % 10 == 0) { nanos /= 10 }
    formatted + "." + nanos.toString

MaxGekk · 2020-05-27

This code is wrong. For example, take nanos = 000001000. The loop removes the trailing zeros and leaves just 1, and converting that number to a string also loses the leading zeros. The result would be .1, which is wrong. It should be .000001.
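
To make the point above concrete, here is a minimal sketch in Scala. It is not the code merged in this PR (that one follows JDK 11's java.sql.Timestamp.toString); the object FractionFormat and the helpers naive and appendFraction are hypothetical names used only for illustration. The sketch assumes nanos is the nanosecond part of the timestamp in the range 0 to 999999999, and that formatted already holds the date and time up to whole seconds.

```scala
// Hypothetical illustration only; names are not from the Spark code base.
object FractionFormat {

  // The simpler loop from the review comment. It treats the fraction as a
  // number, so leading zeros are lost once trailing zeros are divided away.
  def naive(formatted: String, nanos: Int): String = {
    require(nanos > 0 && nanos <= 999999999, "nanos must be in 1..999999999")
    var n = nanos
    while (n % 10 == 0) n /= 10
    formatted + "." + n.toString
  }

  // Treats the fraction as a fixed-width, 9-digit string instead:
  // pad on the left with zeros first, then drop trailing zeros only.
  def appendFraction(formatted: String, nanos: Int): String = {
    require(nanos >= 0 && nanos <= 999999999, "nanos must be in 0..999999999")
    if (nanos == 0) {
      formatted
    } else {
      val digits = f"$nanos%09d" // 1000 becomes "000001000"
      val trimmed = digits.reverse.dropWhile(_ == '0').reverse
      formatted + "." + trimmed
    }
  }
}
```

With nanos = 1000, FractionFormat.naive("2020-05-27 15:55:30", 1000) yields "2020-05-27 15:55:30.1", while FractionFormat.appendFraction("2020-05-27 15:55:30", 1000) yields "2020-05-27 15:55:30.000001". The sketch favors readability over speed; it allocates intermediate strings that a character-buffer approach would avoid.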

MaxGekk · 2020-05-27T13:25:49Z

@cloud-fan @HyukjinKwon Please, review this PR.

SparkQA · 2020-05-27T18:16:41Z

Test build #123189 has finished for PR 28643 at commit eeff55a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-27T18:41:00Z

Test build #123191 has finished for PR 28643 at commit 292311a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-05-27T18:44:51Z

Thanks, merging to master!

It's not a perf regression, so I didn't backport it to 3.0.

MaxGekk added 2 commits May 26, 2020 13:52

Add tests

6ecc159

Optimize

046ea8a

probot-autolabeler bot added the SQL label May 26, 2020

MaxGekk changed the title ~~[SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter~~ [WIP][SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter May 26, 2020

MaxGekk added 3 commits May 26, 2020 19:43

Another implementation

c06d9fb

Optimized

0b28909

Simpler impl

3e0b9a4

MaxGekk changed the title ~~[WIP][SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter~~ [SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter May 26, 2020

JDK 11 impl

eeff55a

MaxGekk commented May 27, 2020

Adjust comment

292311a

cloud-fan approved these changes May 27, 2020

cloud-fan closed this in b5eb093 May 27, 2020

MaxGekk deleted the optimize-legacy-fract-format branch June 5, 2020 19:44
