-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Use only PositionsAppender buffers in PagePartitioner #15839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,6 +27,15 @@ public interface PositionsAppender | |
| */ | ||
| void appendRle(Block value, int rlePositionCount); | ||
|
|
||
| /** | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we need a new method TBH. It seems you could just optimize either existing Also comment here should probably highly discourage usage of single row
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It feels like the cost of allocating single position arrays for every row might be significant. @lukasz-stec Do you know what most of the overhead comes from?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, I could add an
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't think that would be an overhead. It won't be for batched append. And for row-by-row appends this |
||
| * Appends single position. The implementation must be conceptually equal to | ||
| * {@code append(IntArrayList.wrap(new int[] {position}), source)} but may be optimized. | ||
| * Caller should avoid using this method if {@link #append(IntArrayList, Block)} can be used | ||
| * as appending positions one by one can be significantly slower and may not support features | ||
| * like pushing RLE through the appender. | ||
| */ | ||
| void append(int position, Block source); | ||
|
|
||
| /** | ||
| * Creates the block from the appender data. | ||
| * After this, appender is reset to the initial state, and it is ready to build a new block. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -90,6 +90,13 @@ else if (rleValue != null) { | |
| } | ||
| } | ||
|
|
||
| @Override | ||
| public void append(int position, Block value) | ||
| { | ||
| switchToFlat(); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. seems like it might not be beneficial (e.g. switch-to-flat for 1000 rows to just append single row). This only makes sense if the you know you will call this method a lot of times from now on.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we could handle rle here at some CPU cost but we don't handle it in the blockBuilder case anyway
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but appenders are not block builders nor they should behave like ones. Appenders are optimized to process big batches of positions (including special block types like RLEs and dictionaries). Here we quickly fallback to block-builder behaviour just because we've seen single row append. |
||
| delegate.append(position, value); | ||
| } | ||
|
|
||
| @Override | ||
| public Block build() | ||
| { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems we could just modify
partitionPageas:Seems like this change would be much smaller and doesn't rise questions about potential regressions (especially for FTE where there will be a lot of small partitions potentially).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, the
ifthere raises questions about potential regressions.It can easily produce small pages. For example, if 50% of positions are handled by PostiionsAppenders and the other 50% by BlockBuilders this will cause the average page size to be 50% of the expected page size given enough partitions.
If only 5% of positions go to either one then this will produce a lot of extremely small pages because it will always flush almost empty buffers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
50% is still big enough too, right?
If you have 5% vs 95%, then you have one small page and one large page. It is fine, see Javadoc for
io.trino.operator.project.MergePages