Skip to content

[WIP] Add support for parquet_writer_version session property#9700

Closed
svm1 wants to merge 1 commit intofacebookincubator:mainfrom
svm1:svm_parq_version
Closed

[WIP] Add support for parquet_writer_version session property#9700
svm1 wants to merge 1 commit intofacebookincubator:mainfrom
svm1:svm_parq_version

Conversation

@svm1
Copy link
Copy Markdown
Collaborator

@svm1 svm1 commented May 3, 2024

Resolves prestodb/presto#22595.

Implemented setting of version via session property:
set session hive.parquet_writer_version='PARQUET_1_0';

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 3, 2024
@netlify
Copy link
Copy Markdown

netlify bot commented May 3, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit a9b4895
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/663adb69ba69f200080ed98b

@svm1
Copy link
Copy Markdown
Collaborator Author

svm1 commented May 3, 2024

Java seems to only support 1.0 and 2.0, while C++ supports more granular versions.
https://www.javadoc.io/doc/org.apache.parquet/parquet-column/1.10.1/org/apache/parquet/column/ParquetProperties.WriterVersion.html

Copy link
Copy Markdown
Collaborator

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. A couple of comments. Please try to remember to run make format-fix before pushing changes.

@svm1 svm1 changed the title Add support for parquet_writer_version session property [WIP] Add support for parquet_writer_version session property May 3, 2024
@svm1 svm1 force-pushed the svm_parq_version branch from 7a34f12 to a9b4895 Compare May 8, 2024 01:54
@svm1 svm1 changed the title [WIP] Add support for parquet_writer_version session property Add support for parquet_writer_version session property May 8, 2024
@svm1 svm1 marked this pull request as ready for review May 8, 2024 02:36
@svm1 svm1 marked this pull request as draft May 8, 2024 02:36
@svm1 svm1 changed the title Add support for parquet_writer_version session property [WIP] Add support for parquet_writer_version session property May 8, 2024
@majetideepak
Copy link
Copy Markdown
Collaborator

majetideepak commented May 13, 2024

Java seems to only support 1.0 and 2.0, while C++ supports more granular versions.

There are two parquet versions and they are confusing.

  1. The parquet format version: https://github.com/apache/parquet-format/tags
  2. The parquet datapageversion (V1, V2) https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L562

The C++ parquet datapageversion is what the java parquet parquet_writer_version maps to.

@majetideepak
Copy link
Copy Markdown
Collaborator

This confusion was also recently discussed in the Arrow community
https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[native] Native workers not writing Parquet data files for WriterVersion v1 (PARQUET_1_0)

4 participants