@MaxGekk MaxGekk commented Feb 14, 2021

What changes were proposed in this pull request?

Mention the DS options introduced by #31529 and by #31489 in SparkUpgradeException.

Why are the changes needed?

To improve the user experience with Spark SQL. Before the changes, the error message recommended setting SQL configs, but those configs cannot help in some situations (see the PRs for more details).

Does this PR introduce any user-facing change?

Yes. After the changes, the error message is:

org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set the SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInRead' or the datasource option 'datetimeRebaseMode' to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. To read the datetime values as it is, set the SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInRead' or the datasource option 'datetimeRebaseMode' to 'CORRECTED'.
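
For illustration, the two remedies named in the message could be applied roughly as follows. This is a minimal sketch, not part of the patch: it assumes a Spark build that already includes the datasource option from #31529, Spark SQL on the classpath, and a hypothetical input path `/tmp/legacy_dates.parquet`.

```scala
import org.apache.spark.sql.SparkSession

object RebaseModeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rebase-mode-example")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path to Parquet files written by Spark 2.x or legacy Hive.
    val path = "/tmp/legacy_dates.parquet"

    // Remedy 1: session-wide SQL config. LEGACY rebases the values from the
    // hybrid calendar to the Proleptic Gregorian calendar while reading.
    spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
    val rebased = spark.read.parquet(path)
    rebased.show()

    // Reset the session config so the next read depends only on the option.
    spark.conf.unset("spark.sql.legacy.parquet.datetimeRebaseModeInRead")

    // Remedy 2: per-read datasource option. CORRECTED reads the values as is.
    val asIs = spark.read
      .option("datetimeRebaseMode", "CORRECTED")
      .parquet(path)
    asIs.show()

    spark.stop()
  }
}
```

The per-read option is useful when only some inputs come from legacy writers, since it leaves the session-wide setting untouched.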

How was this patch tested?

  1. By checking coding style: ./dev/scalastyle
  2. By running the related test suite:
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ParquetRebaseDatetimeV1Suite"

@github-actions github-actions bot added the SQL label Feb 14, 2021
@MaxGekk MaxGekk changed the title [SPARK-34434][SQL][DOCS] Mention DS rebase options in SparkUpgradeException [SPARK-34434][SQL] Mention DS rebase options in SparkUpgradeException Feb 14, 2021
SparkQA commented Feb 14, 2021

Test build #135146 has finished for PR 31562 at commit 7531929.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case "Parquet" =>
(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_READ.key, ParquetOptions.DATETIME_REBASE_MODE)
case "Avro" =>
(SQLConf.LEGACY_AVRO_REBASE_MODE_IN_READ.key, "datetimeRebaseMode")

I agree that this is inevitable because AvroOptions lives in an external module.
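
The mapping in the snippet under review can be sketched in standalone form as below. This is only an illustration under stated assumptions: the object name `RebaseHints`, its methods, and the shortened hint text are hypothetical, and the string constants are hard-coded stand-ins for `SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_READ.key`, `SQLConf.LEGACY_AVRO_REBASE_MODE_IN_READ.key`, and `ParquetOptions.DATETIME_REBASE_MODE`. The Avro option name is a string literal in the patch because `AvroOptions` is not visible from the SQL module.

```scala
object RebaseHints {
  // Hard-coded stand-ins for the constants referenced in the patch (assumed values).
  val ParquetRebaseConf = "spark.sql.legacy.parquet.datetimeRebaseModeInRead"
  val AvroRebaseConf = "spark.sql.legacy.avro.datetimeRebaseModeInRead"
  val RebaseOption = "datetimeRebaseMode"

  // Returns (SQL config key, datasource option name) for a file format.
  def keysFor(format: String): (String, String) = format match {
    case "Parquet" => (ParquetRebaseConf, RebaseOption)
    case "Avro" => (AvroRebaseConf, RebaseOption)
    case other => throw new IllegalArgumentException(s"unrecognized format: $other")
  }

  // Builds a shortened version of the hint that names both the config and the option.
  def hint(format: String): String = {
    val (config, option) = keysFor(format)
    s"You can set the SQL config '$config' or the datasource option '$option' to 'LEGACY'."
  }
}
```

Calling `RebaseHints.hint("Avro")` yields a hint that names the Avro config key together with the shared `datetimeRebaseMode` option.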

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @MaxGekk .
Merged to master.
