PARQUET-1531: Page row count limit causes empty pages to be written from MessageColumnIO #620
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java
parquet-column/src/main/java/org/apache/parquet/io/MessageColumnIO.java
mccheah left a comment:
Think we might want to consider the performance implications before proceeding.
Based on the benchmark, flushing per record leads to a significant 36% performance loss. See https://tinyurl.com/y3lhgx35 for details.
I added a more advanced solution. It is not pretty, but I did not have a better idea. It even shows a 12% performance gain (though I am not sure why): https://tinyurl.com/y2qxodyv. (The failure is caused by the new parquet-tools merge feature, PARQUET-1381. It has conceptual design problems, and since it is not a high priority, I plan to revert it and reopen the JIRA.)
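The trade-off discussed in the two comments above can be illustrated with a toy model. Everything in it (`ToyPageWriter`, `ColumnWriter`, `FlushStrategy`, `isFlushNeeded`) is hypothetical and heavily simplified; it is not the parquet-mr code from this PR. It assumes a record consumer that caches null values and hands them to the column writer only when it flushes, and a writer that closes a page every `pageRowCountLimit` rows. The third strategy, flushing only when the next `endRecord()` would close a page, is an assumed way to avoid a flush per record and is not necessarily what this PR does. The model only counts flushes; the 36% and 12% figures come from the linked benchmarks, not from this code. It needs Java 16+ for the `record`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names; NOT parquet-mr code.
public class ToyPageWriter {

  enum FlushStrategy { LAZY, PER_RECORD, WHEN_NEEDED }

  // Column writer: counts values in the open page and closes it after a fixed row count.
  static final class ColumnWriter {
    private final int pageRowCountLimit;
    final List<Integer> pageValueCounts = new ArrayList<>(); // value count of each closed page
    private int valuesInPage = 0;
    private int rowsInPage = 0;

    ColumnWriter(int pageRowCountLimit) {
      this.pageRowCountLimit = pageRowCountLimit;
    }

    void writeNull() {
      valuesInPage++;
    }

    // True if the next endRecord() call will close the current page.
    boolean isFlushNeeded() {
      return rowsInPage + 1 >= pageRowCountLimit;
    }

    void endRecord() {
      rowsInPage++;
      if (rowsInPage >= pageRowCountLimit) {
        closePage(); // nothing here prevents closing a page that holds 0 values
      }
    }

    private void closePage() {
      pageValueCounts.add(valuesInPage);
      valuesInPage = 0;
      rowsInPage = 0;
    }

    void close() {
      if (rowsInPage > 0 || valuesInPage > 0) {
        closePage();
      }
    }
  }

  record Result(List<Integer> pageValueCounts, int flushes) {
    long emptyPages() {
      return pageValueCounts.stream().filter(c -> c == 0).count();
    }
  }

  // Writes `records` rows whose column is null. The consumer caches each null and
  // passes the cached nulls to the writer only when it flushes.
  static Result simulate(FlushStrategy strategy, int records, int pageRowCountLimit) {
    ColumnWriter writer = new ColumnWriter(pageRowCountLimit);
    int cachedNulls = 0;
    int flushes = 0;
    for (int i = 0; i < records; i++) {
      cachedNulls++; // cached, not yet written
      boolean flush = switch (strategy) {
        case LAZY -> false;
        case PER_RECORD -> true;
        case WHEN_NEEDED -> writer.isFlushNeeded();
      };
      if (flush) {
        while (cachedNulls > 0) { writer.writeNull(); cachedNulls--; }
        flushes++;
      }
      writer.endRecord();
    }
    // Final flush when the writer is closed (e.g. at the end of a row group).
    while (cachedNulls > 0) { writer.writeNull(); cachedNulls--; }
    writer.close();
    return new Result(writer.pageValueCounts, flushes);
  }

  public static void main(String[] args) {
    for (FlushStrategy s : FlushStrategy.values()) {
      Result r = simulate(s, 10, 3);
      System.out.printf("%-11s pages=%s emptyPages=%d flushes=%d%n",
          s, r.pageValueCounts(), r.emptyPages(), r.flushes());
    }
  }
}
```

With 10 all-null records and 3 rows per page, `LAZY` closes three pages with 0 values and then writes all 10 nulls into the last page (10 values for 1 row), while `PER_RECORD` and `WHEN_NEEDED` both produce pages of 3, 3, 3 and 1 values; `WHEN_NEEDED` gets there with 3 flushes instead of 10.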
Any thoughts? I would like to get this merged soon so I can prepare a new RC.
mccheah left a comment:
Looks sane to me. @rdblue, can you add feedback?
I'll have a look today or tomorrow.
@rdblue, could you please take a look at this so I can start a new RC for 1.11.0 (if you find this one correct)?
Ping @rdblue: it would be great to confirm that this approach is how we want to fix this issue.
+1 This looks good to me. Thanks for fixing it, @gszadovszky!
Summary:
Revert "PARQUET-1381: Add merge blocks command to parquet-tools (apache#512)" (apache#621)
This reverts commit 863a081. The design of this feature has conceptional problems and also works incorrectly. See PARQUET-1381 for more details.
PARQUET-1531: Page row count limit causes empty pages to be written from MessageColumnIO (apache#620)
PARQUET-1544: Possible over-shading of modules (apache#628)
Reviewers: pavi
Reviewed By: pavi
Differential Revision: https://code.uberinternal.com/D2769319