Describe the bug
It looks like Databricks defaults all of the rebase-mode settings to LEGACY. It appears that Delta writes previously did not write this metadata into the parquet file, but recently changed to honor it. As a result, we cannot read any INT96 timestamp fields with the GPU parquet reader. This particular job had spark.sql.parquet.int96RebaseModeInRead=CORRECTED set, but because the file itself carries the INT96 tag (key = org.apache.spark.legacyINT96), the file-level metadata takes precedence and the read fails.
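For reference, this is a minimal sketch (not the plugin's code) of how to check whether a given file carries that footer tag; the file path is a placeholder and it assumes parquet-mr and Hadoop are on the classpath:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Open the parquet footer and look for the legacy INT96 marker that Delta
// writes are now emitting. If the key is present, Spark ignores
// spark.sql.parquet.int96RebaseModeInRead and treats the file as LEGACY.
val conf = new Configuration()
val file = HadoopInputFile.fromPath(new Path("/path/to/part-00000.parquet"), conf)
val reader = ParquetFileReader.open(file)
try {
  val kv = reader.getFooter.getFileMetaData.getKeyValueMetaData
  val hasLegacyInt96Tag = kv.containsKey("org.apache.spark.legacyINT96")
  println(s"file tagged legacy INT96: $hasLegacyInt96Tag")
} finally {
  reader.close()
}
```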
I think we may be able to check the data to see whether it actually needs to be rebased and only throw if it does. Investigating this approach.
So while Databricks does default to this, the customer issue reported was actually just that they had some timestamps that were too old. We already have some base logic to check whether a date is older than the max switch timestamp of any timezone and to only throw if it is.
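A minimal sketch of that idea (check the values themselves and only throw when a rebase would actually change them); the cutoff constant and helper names here are assumptions standing in for Spark's per-timezone switch timestamps, not the plugin's real implementation:

```scala
import java.time.Instant

// Rebasing only changes timestamps that fall before the Julian-to-Gregorian
// switch; values at or after the cutoff read the same under LEGACY and
// CORRECTED. The extra day of slack approximates "the max timestamp of any
// timezone" mentioned above (assumption).
val rebaseCutoffMicros: Long =
  Instant.parse("1582-10-16T00:00:00Z").getEpochSecond * 1000000L

def needsRebase(micros: Long): Boolean = micros < rebaseCutoffMicros

// Hypothetical check over a column of INT96 values already decoded to
// microseconds since the epoch: throw only if some value is old enough
// that LEGACY rebase semantics would matter.
def checkInt96Column(columnMicros: Array[Long]): Unit = {
  if (columnMicros.exists(needsRebase)) {
    throw new IllegalStateException(
      "INT96 timestamp predates the rebase cutoff; LEGACY rebase is not supported on the GPU")
  }
  // Otherwise the values are unaffected by the rebase mode and can be read as-is.
}
```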