Describe the bug
It looks like Databricks defaults all of the rebase-mode settings to LEGACY. It appears that Delta writes previously did not write this metadata into the parquet file, but recently changed to honor it. As a result, we cannot read any INT96 timestamp fields with the GPU parquet reader. This particular job had spark.sql.parquet.int96RebaseModeInRead=CORRECTED set, but because the file itself carries the INT96 tag (key = org.apache.spark.legacyINT96), the file-level metadata takes precedence and the read fails.
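For reference, this is a minimal sketch (not the plugin's code) of how to check whether a given file carries that footer tag; the file path is a placeholder and it assumes parquet-mr and Hadoop are on the classpath:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Open the parquet footer and look for the legacy INT96 marker that Delta
// writes are now emitting. If the key is present, Spark ignores
// spark.sql.parquet.int96RebaseModeInRead and treats the file as LEGACY.
val conf = new Configuration()
val file = HadoopInputFile.fromPath(new Path("/path/to/part-00000.parquet"), conf)
val reader = ParquetFileReader.open(file)
try {
  val kv = reader.getFooter.getFileMetaData.getKeyValueMetaData
  val hasLegacyInt96Tag = kv.containsKey("org.apache.spark.legacyINT96")
  println(s"file tagged legacy INT96: $hasLegacyInt96Tag")
} finally {
  reader.close()
}
```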
I think we may be able to check the data to see whether it actually needs to be rebased and only throw if it does. Investigating this approach.
So while Databricks does default to this, the customer issue reported was actually just that they had some timestamps that were too old. We already have some base logic to check whether a date is older than the max switch timestamp of any timezone and to only throw if it is.
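A minimal sketch of that idea (check the values themselves and only throw when a rebase would actually change them); the cutoff constant and helper names here are assumptions standing in for Spark's per-timezone switch timestamps, not the plugin's real implementation:

```scala
import java.time.Instant

// Rebasing only changes timestamps that fall before the Julian-to-Gregorian
// switch; values at or after the cutoff read the same under LEGACY and
// CORRECTED. The extra day of slack approximates "the max timestamp of any
// timezone" mentioned above (assumption).
val rebaseCutoffMicros: Long =
  Instant.parse("1582-10-16T00:00:00Z").getEpochSecond * 1000000L

def needsRebase(micros: Long): Boolean = micros < rebaseCutoffMicros

// Hypothetical check over a column of INT96 values already decoded to
// microseconds since the epoch: throw only if some value is old enough
// that LEGACY rebase semantics would matter.
def checkInt96Column(columnMicros: Array[Long]): Unit = {
  if (columnMicros.exists(needsRebase)) {
    throw new IllegalStateException(
      "INT96 timestamp predates the rebase cutoff; LEGACY rebase is not supported on the GPU")
  }
  // Otherwise the values are unaffected by the rebase mode and can be read as-is.
}
```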