Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Feb 19, 2021

What changes were proposed in this pull request?

Modify RandomDataGenerator.forType() to allow generation of dates/timestamps that are valid in both Julian and Proleptic Gregorian calendars. Currently, the function can produce a date (for example 1582-10-06) which is valid in the Proleptic Gregorian calendar. Though it cannot be saved to ORC files AS IS since ORC format (ORC libs in fact) assumes Julian calendar. So, Spark shifts 1582-10-06 to the next valid date 1582-10-15 while saving it to ORC files. And as a consequence of that, the test fails because it compares original date 1582-10-06 and the date 1582-10-15 loaded back from the ORC files.

In this PR, I propose to generate valid dates/timestamps in both calendars for ORC datasource till SPARK-34440 is resolved.

Why are the changes needed?

The changes fix failures of HiveOrcHadoopFsRelationSuite. For instance, the test "test all data types" fails with the seed 610710213676:

== Results ==
!== Correct Answer - 20 ==    == Spark Answer - 20 ==
 struct<index:int,col:date>   struct<index:int,col:date>
...
![9,1582-10-06]               [9,1582-10-15]

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the modified test suite:

$ build/sbt -Phive -Phive-thriftserver "test:testOnly *HiveOrcHadoopFsRelationSuite"

Authored-by: Max Gekk [email protected]
Signed-off-by: HyukjinKwon [email protected]
(cherry picked from commit 0316105)
Signed-off-by: Max Gekk [email protected]

Modify `RandomDataGenerator.forType()` to allow generation of dates/timestamps that are valid in both Julian and Proleptic Gregorian calendars. Currently, the function can produce a date (for example `1582-10-06`) which is valid in the Proleptic Gregorian calendar. Though it cannot be saved to ORC files AS IS since ORC format (ORC libs in fact) assumes Julian calendar. So, Spark shifts `1582-10-06` to the next valid date `1582-10-15` while saving it to ORC files. And as a consequence of that, the test fails because it compares original date `1582-10-06` and the date `1582-10-15` loaded back from the ORC files.

In this PR, I propose to generate valid dates/timestamps in both calendars for ORC datasource till SPARK-34440 is resolved.

The changes fix failures of `HiveOrcHadoopFsRelationSuite`. For instance, the test "test all data types" fails with the seed **610710213676**:
```
== Results ==
!== Correct Answer - 20 ==    == Spark Answer - 20 ==
 struct<index:int,col:date>   struct<index:int,col:date>
...
![9,1582-10-06]               [9,1582-10-15]
```

No

By running the modified test suite:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *HiveOrcHadoopFsRelationSuite"
```

Closes apache#31552 from MaxGekk/fix-HiveOrcHadoopFsRelationSuite.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 0316105)
Signed-off-by: Max Gekk <[email protected]>
@github-actions github-actions bot added the SQL label Feb 19, 2021
@MaxGekk MaxGekk changed the title [SPARK-34424][SQL][TESTS][3.1] Fix failures of HiveOrcHadoopFsRelationSuite [SPARK-34424][SQL][TESTS][3.1][3.0] Fix failures of HiveOrcHadoopFsRelationSuite Feb 19, 2021
@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 19, 2021

@HyukjinKwon I cherry-picked this onto 3.0, and ran the test locally.

@SparkQA
Copy link

SparkQA commented Feb 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39844/

@SparkQA
Copy link

SparkQA commented Feb 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39844/

@SparkQA
Copy link

SparkQA commented Feb 19, 2021

Test build #135264 has finished for PR 31589 at commit 0031ab3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to 3.1/3.0!

cloud-fan pushed a commit that referenced this pull request Feb 19, 2021
…lationSuite

### What changes were proposed in this pull request?
Modify `RandomDataGenerator.forType()` to allow generation of dates/timestamps that are valid in both Julian and Proleptic Gregorian calendars. Currently, the function can produce a date (for example `1582-10-06`) which is valid in the Proleptic Gregorian calendar. Though it cannot be saved to ORC files AS IS since ORC format (ORC libs in fact) assumes Julian calendar. So, Spark shifts `1582-10-06` to the next valid date `1582-10-15` while saving it to ORC files. And as a consequence of that, the test fails because it compares original date `1582-10-06` and the date `1582-10-15` loaded back from the ORC files.

In this PR, I propose to generate valid dates/timestamps in both calendars for ORC datasource till SPARK-34440 is resolved.

### Why are the changes needed?
The changes fix failures of `HiveOrcHadoopFsRelationSuite`. For instance, the test "test all data types" fails with the seed **610710213676**:
```
== Results ==
!== Correct Answer - 20 ==    == Spark Answer - 20 ==
 struct<index:int,col:date>   struct<index:int,col:date>
...
![9,1582-10-06]               [9,1582-10-15]
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *HiveOrcHadoopFsRelationSuite"
```

Authored-by: Max Gekk <max.gekkgmail.com>
Signed-off-by: HyukjinKwon <gurwls223apache.org>
(cherry picked from commit 0316105)
Signed-off-by: Max Gekk <max.gekkgmail.com>

Closes #31589 from MaxGekk/fix-HiveOrcHadoopFsRelationSuite-3.1.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Feb 19, 2021
…lationSuite

### What changes were proposed in this pull request?
Modify `RandomDataGenerator.forType()` to allow generation of dates/timestamps that are valid in both Julian and Proleptic Gregorian calendars. Currently, the function can produce a date (for example `1582-10-06`) which is valid in the Proleptic Gregorian calendar. Though it cannot be saved to ORC files AS IS since ORC format (ORC libs in fact) assumes Julian calendar. So, Spark shifts `1582-10-06` to the next valid date `1582-10-15` while saving it to ORC files. And as a consequence of that, the test fails because it compares original date `1582-10-06` and the date `1582-10-15` loaded back from the ORC files.

In this PR, I propose to generate valid dates/timestamps in both calendars for ORC datasource till SPARK-34440 is resolved.

### Why are the changes needed?
The changes fix failures of `HiveOrcHadoopFsRelationSuite`. For instance, the test "test all data types" fails with the seed **610710213676**:
```
== Results ==
!== Correct Answer - 20 ==    == Spark Answer - 20 ==
 struct<index:int,col:date>   struct<index:int,col:date>
...
![9,1582-10-06]               [9,1582-10-15]
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *HiveOrcHadoopFsRelationSuite"
```

Authored-by: Max Gekk <max.gekkgmail.com>
Signed-off-by: HyukjinKwon <gurwls223apache.org>
(cherry picked from commit 0316105)
Signed-off-by: Max Gekk <max.gekkgmail.com>

Closes #31589 from MaxGekk/fix-HiveOrcHadoopFsRelationSuite-3.1.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 41b22c4)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants