Skip to content

Conversation

@gszadovszky
Copy link
Contributor

No description provided.

Copy link

@mccheah mccheah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we might want to consider the performance implications before proceeding.

@gszadovszky
Copy link
Contributor Author

Based on the benchmark the flush-per-record leads to a significant 36% performance loss. See https://tinyurl.com/y3lhgx35 for details.
I'll try to come up with a more advanced solution to save performance.

@gszadovszky
Copy link
Contributor Author

Added a more advanced solution. It is not nice but I did not have a better idea. It even has a 12% performance gain (however I am not sure why): https://tinyurl.com/y2qxodyv.
Any comments/ideas are welcomed.

(The failure is caused by the new parquet-tools merge feature PARQUET-1381. I has conceptional design problems and as it is not a high priority one I am planning to revert it and reopen the JIRA.)

@gszadovszky
Copy link
Contributor Author

Any thoughts? I would like to push it in soon so I can come up with a new RC.

Copy link

@mccheah mccheah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sane to me, @rdblue can you add feedback?

@rdblue
Copy link
Contributor

rdblue commented Feb 26, 2019

I'll have a look today or tomorrow.

@gszadovszky gszadovszky requested a review from rdblue March 4, 2019 08:28
@gszadovszky
Copy link
Contributor Author

@rdblue, could you please have a look on it so I can start a new RC for 1.11.0 (if you find this one correct)?

@mccheah
Copy link

mccheah commented Mar 11, 2019

Ping @rdblue - would be great to confirm that this approach is how we want to go about fixing this issue.

@rdblue
Copy link
Contributor

rdblue commented Mar 12, 2019

+1

This looks good to me. Thanks for fixing it, @gszadovszky!

@gszadovszky gszadovszky merged commit 892dedb into apache:master Mar 13, 2019
shangxinli pushed a commit to shangxinli/parquet-mr that referenced this pull request Mar 1, 2023
Summary:
Revert "PARQUET-1381: Add merge blocks command to parquet-tools (apache#512)" (apache#621)
This reverts commit 863a081.

The design of this feature has conceptional problems and also works incorrectly. See PARQUET-1381 for more details.

PARQUET-1531: Page row count limit causes empty pages to be written from MessageColumnIO (apache#620)

PARQUET-1544: Possible over-shading of modules (apache#628)

Reviewers: pavi

Reviewed By: pavi

Differential Revision: https://code.uberinternal.com/D2769319
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants