@MaxGekk
Member

@MaxGekk MaxGekk commented Oct 22, 2020

What changes were proposed in this pull request?

  1. Replace the metadata key org.apache.spark.int96NoRebase with org.apache.spark.legacyINT96.
  2. Change the condition under which the new key is saved to Parquet metadata: it is saved when the SQL config spark.sql.legacy.parquet.int96RebaseModeInWrite is set to LEGACY.
  3. Change how the metadata key is handled on read:
    • If the key is absent from the Parquet metadata, take the rebase mode from the SQL config spark.sql.legacy.parquet.int96RebaseModeInRead.
    • If the Parquet files were saved by Spark < 3.1.0, use the LEGACY rebasing mode for the INT96 type.
    • For files written by Spark >= 3.1.0, perform rebasing if org.apache.spark.legacyINT96 is present in the metadata; otherwise don't.
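
The write and read rules listed above can be sketched as plain decision logic. This is an illustrative Python model, not the Spark implementation. It assumes the writer's Spark version is stored under a metadata key named org.apache.spark.version, that the marker key is stored with an empty value, and that the "no key" case in the first read rule means the file carries no Spark version information (otherwise it would overlap with the third rule). The function names, and the use of CORRECTED as the name of the "no rebase" mode, are likewise assumptions made for the sketch.

```python
import re
from typing import Dict, Mapping, Tuple

# Marker key named in the PR description.
LEGACY_INT96_KEY = "org.apache.spark.legacyINT96"
# Assumption: the writer's Spark version is recorded under this key.
SPARK_VERSION_KEY = "org.apache.spark.version"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '3.0.1' or '3.1.0-SNAPSHOT' into a comparable tuple such as (3, 0, 1)."""
    numbers = [int(part) for part in re.findall(r"\d+", version)]
    # Pad so that a short version such as '3.1' compares as (3, 1, 0).
    return tuple((numbers + [0, 0, 0])[:3])


def metadata_for_write(rebase_mode_in_write: str) -> Dict[str, str]:
    """Write side: emit the marker key only when the write mode is LEGACY."""
    # Assumption: the marker carries no payload, so an empty string is stored.
    return {LEGACY_INT96_KEY: ""} if rebase_mode_in_write == "LEGACY" else {}


def int96_rebase_mode(file_meta: Mapping[str, str], mode_from_config: str) -> str:
    """Read side: choose the INT96 rebase mode for one Parquet file."""
    version = file_meta.get(SPARK_VERSION_KEY)
    if version is None:
        # No Spark metadata at all: fall back to the read-side SQL config.
        return mode_from_config
    if version_tuple(version) < (3, 1, 0):
        # Files written by Spark < 3.1.0 always use the LEGACY mode.
        return "LEGACY"
    # Spark >= 3.1.0: rebase only if the marker key is present.
    return "LEGACY" if LEGACY_INT96_KEY in file_meta else "CORRECTED"
```

For example, under these assumptions a file whose metadata contains only a version of 3.0.1 resolves to LEGACY regardless of the read config, while a file with no Spark metadata resolves to whatever spark.sql.legacy.parquet.int96RebaseModeInRead specifies.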

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

Modified test in ParquetIOSuite

@MaxGekk
Member Author

MaxGekk commented Oct 22, 2020

@HyukjinKwon @cloud-fan @tomvanbussel @ala Please review this PR.

@SparkQA

SparkQA commented Oct 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34764/

@SparkQA

SparkQA commented Oct 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34764/

@SparkQA

SparkQA commented Oct 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34768/

@SparkQA

SparkQA commented Oct 22, 2020

Test build #130157 has finished for PR 30132 at commit be62b9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34768/

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in a03d77d Oct 22, 2020
@SparkQA

SparkQA commented Oct 22, 2020

Test build #130161 has finished for PR 30132 at commit 158ab4f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk deleted the int96-flip-metadata-rebase-key branch December 11, 2020 20:28