[SPARK-31762][SQL][FOLLOWUP] Avoid double formatting in legacy fractional formatter #28643
Test build #123120 has finished for PR 28643 at commit
Test build #123135 has finished for PR 28643 at commit
if (nanos == 0) {
  formatted
} else {
  // Formats non-zero seconds fraction w/o trailing zeros. For example:
I borrowed the implementation from JDK 11 java.sql.Timestamp.toString:
https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.sql/share/classes/java/sql/Timestamp.java#L266-L313
Does perf matter so much here? It's much simpler to write
    while (nanos % 10 == 0) {
      nanos /= 10
    }
    formatted + "." + nanos.toString
This code is wrong. For example, if nanos = 000001000, the loop removes the trailing zeros and leaves just 1, so the result is .1, which is wrong. It should be .000001.
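The point about leading zeros can be illustrated with a minimal Scala sketch. The object and method names here (`FractionFormat`, `formatFraction`, `naiveFraction`) are hypothetical and are not taken from the PR; the sketch only assumes that `nanos` is a nanosecond-of-second value in the range 1 to 999999999. It pads the value to nine digits before stripping trailing zeros, and contrasts that with the loop proposed in the review.

```scala
object FractionFormat {
  // Hypothetical helper, not Spark's actual code: renders a non-zero
  // nanosecond fraction without trailing zeros while keeping leading zeros.
  def formatFraction(nanos: Int): String = {
    require(nanos > 0 && nanos < 1000000000, "nanos must be in 1..999999999")
    // Pad to 9 digits first so leading zeros survive,
    // then drop the trailing zeros from the padded string.
    val padded = f"$nanos%09d"
    padded.reverse.dropWhile(_ == '0').reverse
  }

  // The loop suggested in the review, kept for comparison:
  // dividing the number itself discards the leading zeros.
  def naiveFraction(nanos: Int): String = {
    var n = nanos
    while (n % 10 == 0) n /= 10
    n.toString
  }
}
```

With `nanos = 1000`, `FractionFormat.formatFraction` yields `000001` while `FractionFormat.naiveFraction` yields `1`, which matches the discrepancy described above. The string-based stripping is chosen for clarity rather than speed; the JDK-style approach referenced earlier in the thread works on a character buffer instead.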
@cloud-fan @HyukjinKwon Please, review this PR.
Test build #123189 has finished for PR 28643 at commit
Test build #123191 has finished for PR 28643 at commit
thanks, merging to master! It's not a perf regression so I didn't backport it to 3.0
What changes were proposed in this pull request?
Currently, the legacy fractional formatter is based on the implementation from Spark 2.4, which formats the input timestamp twice in order to strip trailing zeros. This PR proposes to avoid the first formatting by forming the second fraction directly.
Why are the changes needed?
It makes the legacy fractional formatter faster.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By the existing test "format fraction of second" in TimestampFormatterSuite, plus an added test for timestamps before 1970-01-01 00:00:00Z
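The PR does not say why timestamps before the epoch get their own test. One plausible reason, stated here as an assumption, is that a negative count of microseconds needs floor-based arithmetic to produce a non-negative fraction of a second. The sketch below uses a hypothetical helper (`PreEpochFraction.nanosOfSecond`, not a Spark API) to show the idea with `Math.floorMod`.

```scala
object PreEpochFraction {
  private val MicrosPerSecond = 1000000L

  // Hypothetical helper, not Spark's actual code: nanosecond-of-second for a
  // count of microseconds since 1970-01-01 00:00:00Z (may be negative).
  // Math.floorMod keeps the remainder in 0..999999 even for negative input,
  // whereas the % operator would return a negative remainder.
  def nanosOfSecond(micros: Long): Int =
    (Math.floorMod(micros, MicrosPerSecond) * 1000L).toInt
}
```

For example, one microsecond before the epoch (`micros = -1`) maps to a fraction of 999999000 nanoseconds within the preceding second, rather than to a negative value.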