Revert optimization to reorder columns in parquet writer by raunaqmorarka · Pull Request #17978 · trinodb/trino

raunaqmorarka · 2023-06-20T18:00:34Z

Description

Some files produced by this optimization were ignored by Apache Spark.
Related to https://trinodb.slack.com/archives/CGB0QHWSW/p1687254718875619
Some versions of Databricks Runtime produce an exception when reading files with re-ordered columns.
Related to https://trinodb.slack.com/archives/CP1MUNEUX/p1685685945379909

Additional context and related issues

Original PR #17404
Files produced after that change were found to be sometimes ignored by Apache Spark
and to cause exceptions on DBR 12.2 LTS.
Apache Hive and the Trino Parquet reader had no problems reading any of the new files.
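
To make the failure mode easier to picture, here is a minimal sketch of how one might flag a row group whose column chunks are laid out in a different order than the schema declares. It rests on an assumption this PR does not spell out: that "re-ordered columns" means the chunks were written to the file in a different physical order than the schema order. `ColumnChunk`, `physical_order`, and `is_reordered` are hypothetical names for illustration, not Trino or Parquet library APIs, and the offsets are made-up sample values.

```python
from typing import NamedTuple


class ColumnChunk(NamedTuple):
    """Hypothetical stand-in for one column chunk entry in a row group footer."""
    name: str
    file_offset: int  # byte offset where the chunk's data starts (sample value)


def physical_order(chunks):
    """Return column names sorted by where their data starts in the file."""
    return [c.name for c in sorted(chunks, key=lambda c: c.file_offset)]


def is_reordered(schema_columns, chunks):
    """True when the on-disk chunk order differs from the schema order."""
    return physical_order(chunks) != list(schema_columns)


# Assumed schema order for the illustration.
schema = ["id", "name", "payload"]

# Chunks written in schema order.
in_order = [
    ColumnChunk("id", 4),
    ColumnChunk("name", 120),
    ColumnChunk("payload", 900),
]

# Same columns, but "payload" was written before "name".
reordered = [
    ColumnChunk("id", 4),
    ColumnChunk("name", 5000),
    ColumnChunk("payload", 120),
]

print(is_reordered(schema, in_order))
print(is_reordered(schema, reordered))
```

A check along these lines could be run against footer metadata extracted from a suspect file before handing it to another engine; it only detects a layout difference and says nothing about why a particular reader rejects it.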

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive, Delta, Hudi, Iceberg
* Fix parquet writer compatibility with Apache Spark and Databricks Runtime. ({issue}`17978`)

findinpath · 2023-06-20T18:53:14Z

Attaching zipped version of the problematic Parquet file retrieved from the original Slack discussion:

20230620_084514_00081_uva2j-f9aec2f4-825d-4f2a-af24-414e1cfdb404.zip

Reproduces the problem fixed by trinodb#17978 by running CTAS on an existing file that reliably triggered the problem, then attempting to read the resulting table through Apache Hive and Spark.

cla-bot bot added the cla-signed label Jun 20, 2023

raunaqmorarka added correctness RELEASE-BLOCKER labels Jun 20, 2023

raunaqmorarka requested review from findepi, findinpath, martint and sopel39 June 20, 2023 18:01

sopel39 approved these changes Jun 20, 2023

github-actions bot added the tests:hive label Jun 20, 2023

Revert optimization to reorder columns in parquet writer

45efa29

Some files produced by this optimization were ignored by Apache Spark. Some versions of Databricks Runtime produce an exception when reading files with re-ordered columns.

raunaqmorarka force-pushed the pqw-revert branch from 071481d to 45efa29 June 21, 2023 03:27

raunaqmorarka merged commit 91a41a8 into trinodb:master Jun 21, 2023

raunaqmorarka deleted the pqw-revert branch June 21, 2023 05:49

github-actions bot added this to the 420 milestone Jun 21, 2023

raunaqmorarka mentioned this pull request Jun 21, 2023

Release notes for 420 #17762

Closed

colebow mentioned this pull request Jun 21, 2023

Add Trino 420 release notes #17997

Merged

raunaqmorarka mentioned this pull request Jun 26, 2023

Add product test for reading parquet files with reordered data #18042

Merged
