Skip to content

Add support for parquet_writer_version session property#10419

Closed
svm1 wants to merge 1 commit intofacebookincubator:mainfrom
svm1:parq_datapage_version
Closed

Add support for parquet_writer_version session property#10419
svm1 wants to merge 1 commit intofacebookincubator:mainfrom
svm1:parq_datapage_version

Conversation

@svm1
Copy link
Copy Markdown
Collaborator

@svm1 svm1 commented Jul 9, 2024

Allow the Presto session property parquet_writer_version, which is currently ignored by Velox, to toggle the parquet writer datapage version (V1 or V2). The value can be set as a session property or can be provided in the Hive config. Defaults to V2.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 9, 2024
@netlify
Copy link
Copy Markdown

netlify bot commented Jul 9, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 4f9f645
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/669827e9e1cd800008f34a4c

@svm1 svm1 changed the title [WIP] Add support for parquet_writer_version session property Add support for parquet_writer_version session property Jul 17, 2024
@svm1 svm1 force-pushed the parq_datapage_version branch 2 times, most recently from c8edd7f to 46f4792 Compare July 17, 2024 20:16
@svm1 svm1 force-pushed the parq_datapage_version branch from d5fe5f8 to 4f9f645 Compare July 17, 2024 20:22
@svm1 svm1 marked this pull request as ready for review July 17, 2024 20:22
if (parquetDataPageVersion == "PARQUET_1_0") {
options->parquetDataPageVersion =
dwio::common::ParquetDataPageVersion::V1;
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if it's "PARQUET_2_0", otherwise throw an error

Copy link
Copy Markdown
Collaborator Author

@svm1 svm1 Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that parquetDataPageVersion value must be either "PARQUET_1_0" or "PARQUET_2_0" if we've reached this point, as I added validation in the HiveConfig::parquetDataPageVersion() function from which this value is fetched:

VELOX_CHECK(
      parquetDataPageVersion == "PARQUET_1_0" ||
          parquetDataPageVersion == "PARQUET_2_0",
      "Invalid Parquet version.");

const date::time_zone* sessionTimezone_{nullptr};
};

enum class ParquetDataPageVersion { V1, V2 };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Parquet specific options are better to be in Parquet writer options. It'll be better to rebase this PR after #10470 is merged.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yingsu00 I rebased and opened a new updated PR for this here - #10573. Would appreciate if you could take a look!

@svm1 svm1 closed this Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants