[SPARK-31959][SQL][3.0] Fix Gregorian-Julian micros rebasing while switching standard time zone offset #28809
…while switching standard time zone offset

### What changes were proposed in this pull request?
Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of `rebaseGregorianToJulianMicros` which performs rebasing via local timestamps. In the case of overlapping:
1. Check whether the original instant is the earlier or the later instant of the overlapped local timestamp.
2. If it is the earlier instant, take zone and DST offsets from the previous day; otherwise
3. Set time zone offsets to Julian timestamp from the next day.

Note: The fix assumes that transitions cannot happen more often than once per 2 days.

### Why are the changes needed?
The current implementation handles timestamp overlapping only during daylight saving time, but overlapping can also happen during a transition from one standard time zone to another one. For example, in the case of `Asia/Hong_Kong`, the time zone switched from `Japan Standard Time` (UTC+9) to `Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The changes allow handling this special case as well.

### Does this PR introduce _any_ user-facing change?
It might affect micros rebasing before the common era when the non-optimised version of `rebaseGregorianToJulianMicros()` is used directly.

### How was this patch tested?
1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`.
2. Added a new test to `RebaseDateTimeSuite`.
3. Regenerated `gregorian-julian-rebase-micros.json` with the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week, and the spike in diffs/switch points during the 1 hour of timestamp overlapping wasn't detected.

Closes apache#28787 from MaxGekk/HongKong-tz-1945.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit c259844)
Signed-off-by: Max Gekk <[email protected]>
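To make the overlap concrete, here is a minimal sketch in Python (not the Scala code from the PR, and it does no Julian rebasing). It uses only the two fixed offsets named above and assumes the switch takes effect at 02:00 local JST, so the wall clock falls back to 01:00 HKT. The helper names `to_local`, `is_earlier_instant` and `from_local` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets named in the PR description.
JST = timezone(timedelta(hours=9), "JST")   # Japan Standard Time, UTC+9
HKT = timezone(timedelta(hours=8), "HKT")   # Hong Kong Time, UTC+8

# ASSUMPTION: the switch takes effect at 02:00 local JST on 1945-11-18,
# i.e. 1945-11-17 17:00 UTC, and the wall clock falls back to 01:00 HKT.
SWITCH_UTC = datetime(1945, 11, 17, 17, 0, tzinfo=timezone.utc)
OVERLAP_START = datetime(1945, 11, 18, 1, 0)   # local, inclusive
OVERLAP_END = datetime(1945, 11, 18, 2, 0)     # local, exclusive


def to_local(instant_utc):
    """Map an instant to (naive local timestamp, offset) under the assumed rule."""
    tz = JST if instant_utc < SWITCH_UTC else HKT
    return instant_utc.astimezone(tz).replace(tzinfo=None), tz


def is_earlier_instant(instant_utc):
    """True if the instant is the earlier of the two sharing its local timestamp."""
    local, _ = to_local(instant_utc)
    if not (OVERLAP_START <= local < OVERLAP_END):
        raise ValueError("local timestamp is not overlapped")
    return instant_utc < SWITCH_UTC


def from_local(local, earlier):
    """Resolve an overlapped local timestamp back to an instant."""
    tz = JST if earlier else HKT
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


first = datetime(1945, 11, 17, 16, 30, tzinfo=timezone.utc)   # 01:30 JST
second = first + timedelta(hours=1)                            # 01:30 HKT
```

Both `first` and `second` map to the same local timestamp, 1945-11-18 01:30, which is why a conversion that goes through local timestamps has to remember which of the two instants it started from.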
@cloud-fan Please, take a look at the backport of #28787 to branch-3.0

do you know what caused the conflicts?

As usual -

Test build #123906 has finished for PR 28809 at commit

Jenkins, retest this, please

We need to split such a long Jenkins job into multiple shorter Jenkins jobs

Test build #123925 has finished for PR 28809 at commit
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
After some investigation, I found that the failures seem to be reported consistently on research-jenkins-worker-09. It might be an AMPLab Jenkins host issue (Java version or environment). We had better fix the host or make this PR more robust to those problems before merging this PR to branch-3.0.

It uses a JDK w/ an outdated time zone database (not clear from the log which version): other Jenkins machines have: If we are not able to upgrade JDK 1.8 to a recent version, can we at least have the same JDK on all Jenkins machines?

I am going to skip the test checks if the JDK tzdb is outdated and Asia/Hong_Kong doesn't have timestamps overlapping in 1945 at all.

I will cherry-pick #28832 here once it is merged to master.

It's merged now. Thanks!

…-> HKT at Asia/Hong_Kong in 1945" to outdated tzdb

### What changes were proposed in this pull request?
Old JDK can have an outdated time zone database in which `Asia/Hong_Kong` doesn't have timestamp overlapping in 1945 at all. This PR changes the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" in `RebaseDateTimeSuite`, and makes it tolerant to the case.

### Why are the changes needed?
To fix the test failures on old JDK w/ outdated tzdb like on Jenkins machine `research-jenkins-worker-09`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test on old JDK

Closes apache#28832 from MaxGekk/HongKong-tz-1945-followup.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit eae1747)
Signed-off-by: Max Gekk <[email protected]>
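The idea of tolerating an outdated tzdb can be sketched as follows. This is not the Scala test from `RebaseDateTimeSuite`. It is a Python analogue in which `zoneinfo` stands in for the JDK time zone database, and the helper names `has_overlap`, `load_zone` and `hong_kong_overlap_length` are hypothetical. The probe timestamp 01:30 on 18 November 1945 is an assumption about where the overlap falls in a recent tzdb.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

try:
    from zoneinfo import ZoneInfo  # Python 3.9+
except ImportError:  # older interpreters have no zoneinfo module
    ZoneInfo = None


def has_overlap(local: datetime, tz: tzinfo) -> bool:
    """True if the naive local timestamp occurs twice in tz.

    In an overlap the offset before the switch (fold=0) is larger than the
    offset after it (fold=1); in a gap the relation is reversed.
    """
    before = local.replace(tzinfo=tz, fold=0).utcoffset()
    after = local.replace(tzinfo=tz, fold=1).utcoffset()
    return before > after


def load_zone(name: str) -> Optional[tzinfo]:
    """Return the zone, or None when no time zone database is available."""
    if ZoneInfo is None:
        return None
    try:
        return ZoneInfo(name)
    except Exception:  # e.g. the zone is missing from the installed tzdb
        return None


def hong_kong_overlap_length() -> Optional[timedelta]:
    """Distance between the two instants at the probe timestamp.

    Returns None (check skipped) when the tzdb is missing or has no overlap there.
    """
    zone = load_zone("Asia/Hong_Kong")
    local = datetime(1945, 11, 18, 1, 30)  # ASSUMED probe inside the overlap
    if zone is None or not has_overlap(local, zone):
        return None
    earlier = local.replace(tzinfo=zone, fold=0).astimezone(timezone.utc)
    later = local.replace(tzinfo=zone, fold=1).astimezone(timezone.utc)
    return later - earlier
```

A caller would run the overlap assertions only when `hong_kong_overlap_length()` is not `None`, and otherwise treat the check as not applicable on that machine.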
Test build #124060 has finished for PR 28809 at commit

thanks, merging to 3.0!

…itching standard time zone offset

### What changes were proposed in this pull request?
Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of `rebaseGregorianToJulianMicros` which performs rebasing via local timestamps. In the case of overlapping:
1. Check whether the original instant is the earlier or the later instant of the overlapped local timestamp.
2. If it is the earlier instant, take zone and DST offsets from the previous day; otherwise
3. Set time zone offsets to Julian timestamp from the next day.

Note: The fix assumes that transitions cannot happen more often than once per 2 days.

Adapt the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" to outdated tzdb. Old JDK can have an outdated time zone database in which Asia/Hong_Kong doesn't have timestamp overlapping in 1945 at all.

### Why are the changes needed?
1. The current implementation handles timestamp overlapping only during daylight saving time, but overlapping can also happen during a transition from one standard time zone to another one. For example, in the case of `Asia/Hong_Kong`, the time zone switched from `Japan Standard Time` (UTC+9) to `Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The changes allow handling this special case as well.
2. To fix the test failures on old JDK w/ outdated tzdb like on Jenkins machine `research-jenkins-worker-09`.

### Does this PR introduce _any_ user-facing change?
It might affect micros rebasing before the common era when the non-optimised version of `rebaseGregorianToJulianMicros()` is used directly.

### How was this patch tested?
1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`.
2. Added a new test to `RebaseDateTimeSuite`.
3. Regenerated `gregorian-julian-rebase-micros.json` with the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week, and the spike in diffs/switch points during the 1 hour of timestamp overlapping wasn't detected.

Authored-by: Max Gekk <max.gekkgmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>
(cherry picked from commit c259844)
Signed-off-by: Dongjoon Hyun <dongjoonapache.org>
(cherry picked from commit eae1747)
Signed-off-by: Max Gekk <max.gekkgmail.com>

Closes #28809 from MaxGekk/HongKong-tz-1945-3.0.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
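The remark about the generation step can be illustrated with a small sketch. It is not the generator used for `gregorian-julian-rebase-micros.json`. The scan range, the one-hour window and the function name `samples_in_window` are assumptions chosen only to show why a one-week step can pass over a one-hour window while a 30-minute step cannot.

```python
from datetime import datetime, timedelta

# ASSUMPTION: a one-hour window standing in for the overlapped local timestamps.
WINDOW_START = datetime(1945, 11, 18, 1, 0)   # inclusive
WINDOW_END = datetime(1945, 11, 18, 2, 0)     # exclusive


def samples_in_window(scan_start, scan_end, step):
    """Count the scan points that fall inside the one-hour window."""
    hits = 0
    t = scan_start
    while t < scan_end:
        if WINDOW_START <= t < WINDOW_END:
            hits += 1
        t += step
    return hits


scan_start = datetime(1945, 1, 1)   # hypothetical scan range
scan_end = datetime(1946, 1, 1)
weekly = samples_in_window(scan_start, scan_end, timedelta(weeks=1))
half_hourly = samples_in_window(scan_start, scan_end, timedelta(minutes=30))
```

With this scan range every weekly sample lands at midnight, so none of them falls inside the window, whereas the 30-minute scan samples it at 01:00 and 01:30.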