Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(5797): change default parquet data_page_row_limit to 20k #5957

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

wiedld
Copy link
Contributor

@wiedld wiedld commented Jun 25, 2024

Which issue does this PR close?

Closes #5797

Rationale for this change

We have found evidence that setting this default lower improves the memory performance, and also makes parquet-rs consistent with the default used elsewhere (e.g. parquet-mr, iceberg).

What changes are included in this PR?

Sets a default of 20_000 rows.

Are there any user-facing changes?

There may be performance changes depending upon the user, but we anticipate it being an improvement and bring us into closer alignment.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 25, 2024
@alamb alamb changed the title chore(5797): change default data_page_row_limit to 20k chore(5797): change default parquet data_page_row_limit to 20k Jun 25, 2024
@alamb alamb marked this pull request as ready for review June 25, 2024 23:27
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wiedld -- I think this is a good change

I don't think it is technically an API change, but it is probably good to call out in the release notes

@@ -37,6 +37,8 @@ pub const DEFAULT_COMPRESSION: Compression = Compression::UNCOMPRESSED;
pub const DEFAULT_DICTIONARY_ENABLED: bool = true;
/// Default value for [`WriterProperties::dictionary_page_size_limit`]
pub const DEFAULT_DICTIONARY_PAGE_SIZE_LIMIT: usize = DEFAULT_PAGE_SIZE;
/// Default value for [`WriterProperties::data_page_row_count_limit`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Limit Parquet Page Row Count By Default to reduce writer memory requirements with highly compressable columns
3 participants