Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jun 12, 2020

What changes were proposed in this pull request?

Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of rebaseGregorianToJulianMicros which performs rebasing via local timestamps. In the case of overlapping:

  1. Check that the original instant belongs to earlier or later instant of overlapped local timestamp.
  2. If it is an earlier instant, take zone and DST offsets from the previous day otherwise
  3. Set time zone offsets to Julian timestamp from the next day.

Note: The fix assumes that transitions cannot happen more often than once per 2 days.

Adopt the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" to outdated tzdb. Old JDK can have outdated time zone database in which Asia/Hong_Kong doesn't have timestamp overlapping in 1945 at all.

Why are the changes needed?

  1. Current implementation handles timestamps overlapping only during daylight saving time but overlapping can happen also during transition from one standard time zone to another one. For example in the case of Asia/Hong_Kong, the time zone switched from Japan Standard Time (UTC+9) to Hong Kong Time (UTC+8) on Sunday, 18 November, 1945 01:59:59 AM. The changes allow to handle the special case as well.
  2. To fix the test failures on old JDK w/ outdated tzdb like on Jenkins machine research-jenkins-worker-09.

Does this PR introduce any user-facing change?

It might affect micros rebasing in before common era when not-optimised version of rebaseGregorianToJulianMicros() is used directly.

How was this patch tested?

  1. By existing tests in DateTimeUtilsSuite, RebaseDateTimeSuite, DateFunctionsSuite, DateExpressionsSuite and TimestampFormatterSuite.
  2. Added new test to RebaseDateTimeSuite
  3. Regenerated gregorian-julian-rebase-micros.json with the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week. And the spike in diffs/switch points during 1 hour of timestamp overlapping wasn't detected.

Authored-by: Max Gekk [email protected]
Signed-off-by: Wenchen Fan [email protected]
(cherry picked from commit c259844)
Signed-off-by: Dongjoon Hyun [email protected]
(cherry picked from commit eae1747)
Signed-off-by: Max Gekk [email protected]

…while switching standard time zone offset

Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of `rebaseGregorianToJulianMicros` which performs rebasing via local timestamps. In the case of overlapping:
1. Check that the original instant belongs to earlier or later instant of overlapped local timestamp.
2. If it is an earlier instant, take zone and DST offsets from the previous day otherwise
3. Set time zone offsets to Julian timestamp from the next day.

Note: The fix assumes that transitions cannot happen more often than once per 2 days.

Current implementation handles timestamps overlapping only during daylight saving time but overlapping can happen also during transition from one standard time zone to another one. For example in the case of `Asia/Hong_Kong`, the time zone switched from `Japan Standard Time` (UTC+9) to `Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The changes allow to handle the special case as well.

It might affect micros rebasing in before common era when not-optimised version of `rebaseGregorianToJulianMicros()` is used directly.

1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`.
2. Added new test to `RebaseDateTimeSuite`
3. Regenerated `gregorian-julian-rebase-micros.json` with the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week. And the spike in diffs/switch points during 1 hour of timestamp overlapping wasn't detected.

Closes apache#28787 from MaxGekk/HongKong-tz-1945.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit c259844)
Signed-off-by: Max Gekk <[email protected]>
@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 12, 2020

@cloud-fan Please, take a look at the backport of #28787 to branch-3.0

@cloud-fan
Copy link
Contributor

do you know what caused the conflicts?

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 12, 2020

do you know what caused the conflicts?

As usual - fromMillis()

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123906 has finished for PR 28809 at commit 35245c0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 12, 2020

Jenkins, retest this, please

@gatorsmile
Copy link
Member

We need to split such a long Jenkin job to multiple shorter Jenkins Jobs

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123925 has finished for PR 28809 at commit 35245c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some investigations, I found that the failures seem to reported consistently on research-jenkins-worker-09. It might be Amplap Jenkins host issue (Java version or environment). We had better fix the host or make this PR more robust on those problems before merging this PR to branch-3.0.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 15, 2020

It might be Amplap Jenkins host issue (Java version or environment).

It uses JDK w/ outdated time zone database (not clear from log which version):

JAVA_HOME=/usr/lib/jvm/java-8-oracle

other jenkins machines have:

JAVA_HOME=/usr/java/jdk1.8.0_191

If we are not able to upgrade JDK 1.8 to the recent version, can we have at least the same JDK on all jenkins machines?

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 15, 2020

I am going to skip the test checks if JDK tzdb is outdated and Asia/Hong_Kong doesn't have timestamps overlapping in 1945 at all.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jun 15, 2020

I will cherry-pick #28832 here when it will be merged to master.

@dongjoon-hyun
Copy link
Member

It's merged now. Thanks!

…-> HKT at Asia/Hong_Kong in 1945" to outdated tzdb

Old JDK can have outdated time zone database in which `Asia/Hong_Kong` doesn't have timestamp overlapping in 1946 at all. This PR changes the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" in `RebaseDateTimeSuite`, and makes it tolerant to the case.

To fix the test failures on old JDK w/ outdated tzdb like on Jenkins machine `research-jenkins-worker-09`.

No

By running the test on old JDK

Closes apache#28832 from MaxGekk/HongKong-tz-1945-followup.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit eae1747)
Signed-off-by: Max Gekk <[email protected]>
@SparkQA
Copy link

SparkQA commented Jun 15, 2020

Test build #124060 has finished for PR 28809 at commit 63898cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to 3.0!

cloud-fan pushed a commit that referenced this pull request Jun 16, 2020
…itching standard time zone offset

### What changes were proposed in this pull request?
Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of `rebaseGregorianToJulianMicros` which performs rebasing via local timestamps. In the case of overlapping:
1. Check that the original instant belongs to earlier or later instant of overlapped local timestamp.
2. If it is an earlier instant, take zone and DST offsets from the previous day otherwise
3. Set time zone offsets to Julian timestamp from the next day.

Note: The fix assumes that transitions cannot happen more often than once per 2 days.

Adopt the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" to outdated tzdb. Old JDK can have outdated time zone database in which Asia/Hong_Kong doesn't have timestamp overlapping in 1945 at all.

### Why are the changes needed?
1. Current implementation handles timestamps overlapping only during daylight saving time but overlapping can happen also during transition from one standard time zone to another one. For example in the case of `Asia/Hong_Kong`, the time zone switched from `Japan Standard Time` (UTC+9) to `Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The changes allow to handle the special case as well.
2. To fix the test failures on old JDK w/ outdated tzdb like on Jenkins machine `research-jenkins-worker-09`.

### Does this PR introduce _any_ user-facing change?
It might affect micros rebasing in before common era when not-optimised version of `rebaseGregorianToJulianMicros()` is used directly.

### How was this patch tested?
1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`.
2. Added new test to `RebaseDateTimeSuite`
3. Regenerated `gregorian-julian-rebase-micros.json` with the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week. And the spike in diffs/switch points during 1 hour of timestamp overlapping wasn't detected.

Authored-by: Max Gekk <max.gekkgmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>
(cherry picked from commit c259844)
Signed-off-by: Dongjoon Hyun <dongjoonapache.org>
(cherry picked from commit eae1747)
Signed-off-by: Max Gekk <max.gekkgmail.com>

Closes #28809 from MaxGekk/HongKong-tz-1945-3.0.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this Jun 16, 2020
holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020
…itching standard time zone offset

### What changes were proposed in this pull request?
Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of `rebaseGregorianToJulianMicros` which performs rebasing via local timestamps. In the case of overlapping:
1. Check that the original instant belongs to earlier or later instant of overlapped local timestamp.
2. If it is an earlier instant, take zone and DST offsets from the previous day otherwise
3. Set time zone offsets to Julian timestamp from the next day.

Note: The fix assumes that transitions cannot happen more often than once per 2 days.

Adopt the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" to outdated tzdb. Old JDK can have outdated time zone database in which Asia/Hong_Kong doesn't have timestamp overlapping in 1945 at all.

### Why are the changes needed?
1. Current implementation handles timestamps overlapping only during daylight saving time but overlapping can happen also during transition from one standard time zone to another one. For example in the case of `Asia/Hong_Kong`, the time zone switched from `Japan Standard Time` (UTC+9) to `Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The changes allow to handle the special case as well.
2. To fix the test failures on old JDK w/ outdated tzdb like on Jenkins machine `research-jenkins-worker-09`.

### Does this PR introduce _any_ user-facing change?
It might affect micros rebasing in before common era when not-optimised version of `rebaseGregorianToJulianMicros()` is used directly.

### How was this patch tested?
1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`.
2. Added new test to `RebaseDateTimeSuite`
3. Regenerated `gregorian-julian-rebase-micros.json` with the step of 30 minutes, and got the same JSON file. The JSON file isn't affected because previously it was generated with the step of 1 week. And the spike in diffs/switch points during 1 hour of timestamp overlapping wasn't detected.

Authored-by: Max Gekk <max.gekkgmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>
(cherry picked from commit c259844)
Signed-off-by: Dongjoon Hyun <dongjoonapache.org>
(cherry picked from commit eae1747)
Signed-off-by: Max Gekk <max.gekkgmail.com>

Closes apache#28809 from MaxGekk/HongKong-tz-1945-3.0.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@MaxGekk MaxGekk deleted the HongKong-tz-1945-3.0 branch December 11, 2020 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants