Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented May 26, 2020

What changes were proposed in this pull request?

Currently, the legacy fractional formatter is based on the implementation from Spark 2.4 which formats the input timestamp twice:

    val timestampString = ts.toString
    val formatted = legacyFormatter.format(ts)

to strip trailing zeros. This PR proposes to avoid the first formatting by forming the second fraction directly.

Why are the changes needed?

It makes legacy fractional formatter faster.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By existing test "format fraction of second" in TimestampFormatterSuite + added test for timestamps before 1970-01-01 00:00:00Z

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123120 has finished for PR 28643 at commit 046ea8a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title [SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter [WIP][SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter May 26, 2020
@MaxGekk MaxGekk changed the title [WIP][SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter [SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter May 26, 2020
@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123135 has finished for PR 28643 at commit 3e0b9a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (nanos == 0) {
formatted
} else {
// Formats non-zero seconds fraction w/o trailing zeros. For example:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does perf matter so much here? It's much simpler to write

while (nanos % 10 == 0) {
  nanos /= 10
}
formatted + "." + nanos.toString

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is wrong. For example, if nanos = 000001000. You removed trailing zeros in the loop, and get just 1. So, the result will be .1 which is wrong. It should be .000001.

@MaxGekk
Copy link
Member Author

MaxGekk commented May 27, 2020

@cloud-fan @HyukjinKwon Please, review this PR.

@SparkQA
Copy link

SparkQA commented May 27, 2020

Test build #123189 has finished for PR 28643 at commit eeff55a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 27, 2020

Test build #123191 has finished for PR 28643 at commit 292311a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in b5eb093 May 27, 2020
@cloud-fan
Copy link
Contributor

thanks, merging to master!

It's not a perf regression so I didn't backport it to 3.0

@MaxGekk MaxGekk deleted the optimize-legacy-fract-format branch June 5, 2020 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants