Enable retry for Parquet writes [databricks] #8243
Conversation
Signed-off-by: Andy Grove <[email protected]>
Force-pushed from da4cfd6 to ffa2e91
build
Please file a follow-on issue. I don't like the fact that we are creating a ColumnarBatch in GpuParquetFileFormat.transform in a way that would trip assertions if we were not calling the constructor directly. Ideally the assertions would happen in the constructor, not just in the helper methods, and we would have a way to use a Table instead of a ColumnarBatch after the transform.
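To make the concern concrete, here is a minimal sketch, not the plugin's actual code: `BatchBuilder` and `buildValidated` are hypothetical names, and the point is only that checks which live in a helper are skipped when `new ColumnarBatch(...)` is called directly.

```scala
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

object BatchBuilder {
  // Hypothetical helper: the assertions live here, so any caller that invokes
  // the ColumnarBatch constructor directly bypasses them.
  def buildValidated(columns: Array[ColumnVector], numRows: Int): ColumnarBatch = {
    assert(columns.nonEmpty, "batch must have at least one column")
    assert(numRows >= 0, "row count must be non-negative")
    new ColumnarBatch(columns, numRows)
  }
}

// Direct construction (the pattern being questioned in GpuParquetFileFormat.transform)
// skips those checks entirely:
//   val batch = new ColumnarBatch(columns, numRows)
// Moving the assertions into the constructing code path itself, or returning a
// cuDF Table instead of a ColumnarBatch after the transform, would close that gap.
```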
sql-plugin/src/main/scala/com/nvidia/spark/rapids/ColumnarOutputWriter.scala (review comment, outdated, resolved)
build
CI failed: #8235
build
The latest CI failures do not appear to be related to #8235. The same error shows up for a lot of different test failures in ORC, Hive, and others.
Yes, I see this locally too. Root cause is:
build
build
build
sql-plugin/src/main/scala/com/nvidia/spark/rapids/ColumnarOutputWriter.scala (review comment, resolved)
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetFileFormat.scala (review comment, resolved)
sql-plugin/src/main/scala/com/nvidia/spark/rapids/ColumnarOutputWriter.scala (review comment, outdated, resolved)
The blossom build is failing with:
I think that's because of recent link changes in CI. I'll send you a direct message, but here's what I see:
build
Closes #8028
The first commit enables retries for Parquet writes (it just changes a boolean parameter). This caused test regressions because we were sometimes casting timestamp columns to a cuDF type that is not supported in Spark, which in turn caused issues when spilling batches. The casts are now performed after batches are spilled.
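A minimal, self-contained sketch of that ordering change, assuming hypothetical stand-ins (`SpillableBatch`, `withRetry`, `castForWrite`) rather than the plugin's actual spill and retry framework: the batch is made spillable while it is still in a Spark-compatible form, and the write-specific cast happens only after it is re-materialized inside the retry block.

```scala
// Placeholder for a columnar batch; the real code works with Spark's
// ColumnarBatch backed by GPU memory.
final case class Batch(rows: Int)

// Hypothetical spill wrapper: holds a batch in a form that can be spilled and
// re-materialized on demand, so it should only ever see Spark-supported types.
final class SpillableBatch(batch: Batch) {
  def materialize(): Batch = batch
}

object ParquetWriteRetrySketch {
  // Hypothetical retry helper: re-runs the body a few times on failure,
  // standing in for retrying a write attempt on GPU OOM. Assumes attempts >= 1.
  def withRetry[T](attempts: Int)(body: => T): T = {
    var last: Throwable = null
    for (_ <- 1 to attempts) {
      try return body
      catch { case e: RuntimeException => last = e }
    }
    throw last
  }

  // Hypothetical cast to the layout the GPU Parquet writer wants (e.g. a
  // timestamp representation Spark itself does not support).
  def castForWrite(batch: Batch): Batch = batch

  def write(input: Batch): Unit = {
    // Before the fix the cast happened up here, so the spill framework could
    // be handed a batch containing a type it does not understand.
    val spillable = new SpillableBatch(input)
    withRetry(attempts = 3) {
      // Cast only after the spillable wrapping, on the re-materialized batch.
      val casted = castForWrite(spillable.materialize())
      // ... hand `casted` to the Parquet writer ...
      casted
    }
  }
}
```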