Fix estimated serialized size for BlockEncodingBuffers #14688
Conversation
Force-pushed from 7725277 to 1a8424f
mbasmanova
left a comment
@yingsu00 Would you update the PR description to describe the problem and the fix?
Hi @mbasmanova, I just updated the PR message. Let me know if it answers your questions. Thanks!
mbasmanova
left a comment
@yingsu00 I don't understand this change. I'm seeing a new scale factor being introduced, but it is always 1 (1.0f). Would you share an example that illustrates the problem and show how this change fixes it? Would it be possible to code it into a test to avoid this being broken accidentally by future changes?
@mbasmanova Hi Masha, regarding the new scale factor: I will add some comments and tests to the code.
Actually, the approach above built the tree of DecodedBlockNode and added up the estimatedSerializedSizeInBytes bottom-up, which makes it hard to populate the correct sizes, since RLE and Dictionary blocks are not leaf nodes. I'm thinking of doing this in a top-down manner instead, so that the logical size of RLE and Dictionary blocks can be passed down to their children (see the sketch below). I'll see if that makes the code simpler.
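To illustrate the top-down idea, here is a minimal sketch (hypothetical names, not the actual DecodedBlockNode code): each node receives an expansion factor from its ancestors, applies it to its own bytes, and passes the product of that factor and its own expansion ratio down to its children, so RLE and Dictionary expansion reaches the leaf sizes.
// Hypothetical sketch of top-down size estimation; Node, ownSizeInBytes,
// expansionRatio and estimateSerializedSize are illustrative names only.
final class SizeEstimationSketch
{
    static final class Node
    {
        final long ownSizeInBytes;   // bytes owned by this node, excluding children
        final double expansionRatio; // e.g. run length for RLE, reuse ratio for a dictionary
        final Node[] children;

        Node(long ownSizeInBytes, double expansionRatio, Node... children)
        {
            this.ownSizeInBytes = ownSizeInBytes;
            this.expansionRatio = expansionRatio;
            this.children = children;
        }
    }

    static long estimateSerializedSize(Node node, double inheritedFactor)
    {
        // Scale this node's own bytes by the factor inherited from its ancestors,
        // then push the combined factor down to the children.
        long size = (long) (node.ownSizeInBytes * inheritedFactor);
        for (Node child : node.children) {
            size += estimateSerializedSize(child, inheritedFactor * node.expansionRatio);
        }
        return size;
    }

    public static void main(String[] args)
    {
        // Array(RLE(VariableWidth)): the RLE run repeats one 40-byte value 100 times,
        // so the leaf contributes 40 * 100 bytes to the serialized size estimate.
        Node leaf = new Node(40, 1.0);
        Node rle = new Node(0, 100.0, leaf);
        Node array = new Node(8, 1.0, rle);
        System.out.println(estimateSerializedSize(array, 1.0)); // prints 4008
    }
}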
@mbasmanova I just realized I opened a can of worms. The current getLogicalSizeInBytes() is not 100% correct. Suppose there is an ArrayBlock of RLEBlock of VariableWidthBlock. The top level
@mbasmanova Thank you Masha. If we take option 4, what do you think the new fix for memory usage would be?
I don't know off the top of my head.
I tend to choose 1) and 3), since my latest tests show that if the max capacity is not underestimated, CPU performance is not affected and it generally saves 20-30% of buffer memory. I'll see how much work 1) requires to get it right.
Force-pushed from 1a8424f to 71ce03c
@mbasmanova Hi Masha, I actually fixed getLogicalSizeInBytes and added some tests. With these changes I saw a 5x CPU gain on the regressed query. I can add some more tests in TestBlockEncodingBuffers tomorrow. I also simplified DecodedBlockNode and put most of the logic in decodeBlock(). Appreciate your review again!
mbasmanova
left a comment
The "Fix getLogicalSizeInBytes() for Blocks" commit looks good modulo some comments.
What's the motivation to have the default implementation? It seems incorrect to report region-size as region-logical-size.
@mbasmanova For leaf blocks (i.e. non-Array/Map/Row/Dict/RLE blocks), the logicalSizeInBytes is the same as sizeInBytes. See the following code:
/**
* Returns the size of the block contents, regardless of internal representation.
* The same logical data values should always have the same size, no matter
* what block type is used or how they are represented within a specific block.
*
* This can differ substantially from {@link #getSizeInBytes} for certain block
* types. For RLE, it will be {@code N} times larger. For dictionary, it will be
* larger based on how many times dictionary entries are reused.
*/
default long getLogicalSizeInBytes()
{
return getSizeInBytes();
}
Similarly, the regional logical size for leaf blocks is the same as the regional size. We have a default implementation here so that we don't have to implement the same thing for every leaf block.
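For non-leaf blocks the default is overridden. As a rough sketch of what an RLE-style override does (illustrative only; the real RunLengthEncodedBlock may differ in details such as null accounting), the logical size scales the single stored value by the position count, matching the "N times larger" note in the Javadoc above:
// Illustrative sketch, not the actual RunLengthEncodedBlock implementation.
class RleLikeBlockSketch
{
    private final long valueSizeInBytes; // size of the one physically stored value
    private final int positionCount;     // how many logical positions the run covers

    RleLikeBlockSketch(long valueSizeInBytes, int positionCount)
    {
        this.valueSizeInBytes = valueSizeInBytes;
        this.positionCount = positionCount;
    }

    long getSizeInBytes()
    {
        return valueSizeInBytes; // the value is stored once, regardless of positionCount
    }

    long getLogicalSizeInBytes()
    {
        return valueSizeInBytes * positionCount; // as if the value were repeated N times
    }
}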
presto-common/src/main/java/com/facebook/presto/common/block/DictionaryBlock.java
consider replacing comments with variable names, e.g.
- Block arrayOfLong =
- Block arrayOfRleOfLong =
- Block arrayOfRleOfArrayOfLong =
...
@mbasmanova I renamed the variables. However it's not as straightforward as the comment:
// Row(Dictionary(LongArrayBlock), Dictionary(Row(LongArrayBlock, LongArrayBlock)))
Block rowOfDictionaryOfLongAndDictionaryOfRowOfLongAndLong = ...
So I kept the comments alongside the renamed variables.
mbasmanova
left a comment
Allow additional error margin for estimatedMaxCapacity
typo in commit message: graceFactorFordMaxCapacity -> graceFactorForMaxCapacity
- all caps with underscores
- consider making this configurable
@mbasmanova I will send a separate PR to make it configurable.
This is a generic method that can be used in many places. However, the commit says that the change applies only to one specific use case. I'd expect the caller to apply this new factor when computing estimatedMaxCapacity.
- use Math.toIntExact instead of (int)
This is a generic method that can be used in many places. However, the commit says that the change applies only to one specific use case. I'd expect the caller to apply this new factor when computing estimatedMaxCapacity.
Moved the application of this new factor to setupDecodedBlockAndMapPositions() where the estimatedMaxCapacity is calculated.
- use Math.toIntExact instead of (int)
It's actually casting a double to an int; toIntExact(long) only takes a long.
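For reference, a small example of the distinction being made here (generic Java, not code from this PR): Math.toIntExact only accepts a long and throws on overflow, so a double has to be rounded or truncated before it can be used.
public class CastVsToIntExact
{
    public static void main(String[] args)
    {
        double estimate = 1234.5 * 1.2; // e.g. a target buffer size times a grace factor

        // Plain cast: truncates the fraction and silently wraps on overflow.
        int truncated = (int) estimate;

        // Math.toIntExact(long) only takes a long, so round the double first;
        // overflow then throws ArithmeticException instead of silently wrapping.
        int rounded = Math.toIntExact(Math.round(estimate));

        System.out.println(truncated + " " + rounded); // 1481 1481
    }
}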
To clarify, is the query running 5x faster than before the regression? E.g. if CPU time was N before the regression and 10N after it, is it now 0.2N or 2N?
Hi Masha, it is 2N.
@yingsu00 How much regression is left after this change?
@mbasmanova I tested on vll1_verifier1 and there is no regression any more.
@mbasmanova Masha, I still need to touch up the test for BlockEncodingBuffers a bit. I will update the PR tomorrow.
@yingsu00 Thank you for the heads up. |
I'm confused: fixed and un-fixed are the same, 2.93m vs. 2.95m. What is un-fixed here? Is it the version that used more memory than the original repartitioning, i.e. does the "fix" refer to fixing memory usage?
@mbasmanova Hi Masha, yes, the
Force-pushed from 71ce03c to 61f21f7
@mbasmanova Hi Masha, I just updated the PR with the following changes:
Thank you very much for reviewing!
mbasmanova
left a comment
@yingsu00 LGTM.
nit: perhaps, refactor to extract a helper method to avoid copy-paste
@mbasmanova did you mean something like this?
setEstimatedNullsBufferMaxCapacity(getEstimatedBufferMaxCapacity(targetBufferSize, Byte.BYTES, POSITION_SIZE));
estimatedValueBufferMaxCapacity = getEstimatedBufferMaxCapacity(targetBufferSize, Byte.BYTES, POSITION_SIZE);
and in AbstractBlockEncodingBuffer:
protected static int getEstimatedBufferMaxCapacity(double targetBufferSize, int unitSize, int positionSize)
{
return (int) (targetBufferSize * unitSize / positionSize * GRACE_FACTOR_FOR_MAX_BUFFER_CAPACITY);
}
@yingsu00 Yes, this might reduce copy-paste and make it easier to read and ensure we don't forget GRACE_FACTOR_FOR_MAX_BUFFER_CAPACITY somewhere.
@mbasmanova Hi Masha, I just updated the PR with a new commit e8511df636 Refactor buffer max capacity calculation. Thank you again, and happy long weekend!
Force-pushed from 61f21f7 to e8511df
getLogicalSizeInBytes was supposed to return the expanded (logical) sizes of blocks that are DictionaryBlock or RunLengthEncodedBlock. However, if the nested blocks are DictionaryBlock or RunLengthEncodedBlock, the size was not calculated correctly. This commit fixes this issue.
When a block passed to OptimizedPartitionedOutputOperator is an RLE or Dictionary block, we used to estimate the serialized size using getLogicalSize(), which returns the size of the block after inflation. However, the child block of the RLE or Dictionary block was reported with its plain sizeInBytes, without accounting for the fact that it is going to be expanded. This commit fixes this problem by adding a scale factor that estimates how many times the child blocks are going to be expanded.
Block.getSizeInBytes() and Block.getLogicalSizeInBytes() always add up the sizes of the nulls buffer, even if the block cannot contain nulls. When estimating the max buffer capacity for BlockEncodingBuffers, we can also leave space for the nullsBuffer and hashTablesBuffer. This does not waste memory because those buffers are not actually allocated until blocks with nulls or hash tables come in. It makes the buffer sizes proportional to the blocks' logical sizes and makes the code cleaner.
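A minimal sketch of the lazy-allocation point (illustrative names, not the actual BlockEncodingBuffer code): the nulls buffer capacity is reserved only in the estimate, and the backing array is created the first time a block that may contain nulls arrives.
// Illustrative sketch: reserving capacity in the estimate costs nothing until
// a block that may contain nulls actually shows up.
class NullsBufferSketch
{
    private final int estimatedMaxCapacity;
    private byte[] nullsBuffer; // intentionally not allocated up front

    NullsBufferSketch(int estimatedMaxCapacity)
    {
        this.estimatedMaxCapacity = estimatedMaxCapacity;
    }

    void appendNulls(boolean mayHaveNull, boolean[] rowIsNull)
    {
        if (!mayHaveNull) {
            return; // blocks that cannot contain nulls never trigger an allocation
        }
        if (nullsBuffer == null) {
            nullsBuffer = new byte[estimatedMaxCapacity]; // first allocation happens here
        }
        // ... pack rowIsNull into nullsBuffer ...
    }
}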
In "Enforce buffer size limits for BlockEncodingBuffer" we introduced estimatedMaxCapacity so that buffer growth beyond that value becomes slower. However, the estimated max capacity is not always 100% accurate, and an underestimated value has a negative impact on CPU performance. This commit gives the estimatedMaxCapacity some headroom by introducing GRACE_FACTOR_FOR_MAX_BUFFER_CAPACITY with a default value of 1.2f.
Force-pushed from e8511df to 533fd29
@mbasmanova Hi Masha, the tests all pass now. Thank you for reviewing!
We used Blocks' sizeInBytes or logicalSizeInBytes to estimate the max capacity of the BlockEncodingBuffers. However, there were some errors in calculating the max capacity from decodedBlock.estimatedSerializedSizeInBytes, such that the exclusive portion (exclusive of the children BlockEncodingBuffers) of the current BlockEncodingBuffer was mistakenly passed to the children BlockEncodingBuffers as their inclusive portion. Also, the max capacity for nested blocks was incorrectly calculated if they are RLE or Dictionary blocks. This PR fixes these two problems. With these fixes, the CPU time for the regressed query reported in T67972617 was reduced from 100s to 20s.
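To make the exclusive vs. inclusive distinction concrete, here is a hypothetical illustration (not the PR's code): a buffer's inclusive estimate is its own exclusive bytes plus the inclusive estimates of its children, and only the children's shares should be handed down to them.
// Hypothetical illustration of exclusive vs. inclusive size portions for nested buffers.
class SizePortionsSketch
{
    // Inclusive size of a node = its exclusive bytes (offsets, nulls, etc. at this level)
    // plus the inclusive sizes of its children.
    static long inclusiveSize(long exclusiveBytes, long... childInclusiveSizes)
    {
        long total = exclusiveBytes;
        for (long childSize : childInclusiveSizes) {
            total += childSize;
        }
        return total;
    }

    public static void main(String[] args)
    {
        long childInclusive = 400;  // e.g. a nested block's values buffer
        long parentExclusive = 100; // e.g. the array block's own offsets and nulls
        long parentInclusive = inclusiveSize(parentExclusive, childInclusive); // 500

        // The bug described above amounts to handing parentExclusive (100) to the child
        // as if it were the child's inclusive budget, instead of childInclusive (400).
        System.out.println(parentInclusive);
    }
}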