Conversation

@yingsu00 (Contributor) commented Jun 21, 2020

We used Blocks' sizeInBytes or logicalSizeInBytes to estimate the max capacity of the BlockEncodingBuffers. However, there were errors in calculating the max capacity from decodedBlock.estimatedSerializedSizeInBytes: the exclusive portion (excluding children BlockEncodingBuffers) of the current BlockEncodingBuffer was mistakenly passed to the children BlockEncodingBuffers as the inclusive portion. Also, the max capacity for nested blocks was incorrectly calculated when they are RLE or Dictionary blocks. This PR fixes these two problems. With these fixes, the CPU time for the reported regressed query in T67972617 was reduced from 100s to 20s.

== NO RELEASE NOTE ==

@yingsu00 yingsu00 requested a review from a team June 22, 2020 07:19
@mbasmanova mbasmanova added the aria Presto Aria performance improvements label Jun 22, 2020
@mbasmanova (Contributor) left a comment

@yingsu00 Would you update PR description to describe the problem and the fix?

@yingsu00 (Contributor, Author)

@yingsu00 Would you update PR description to describe the problem and the fix?

hi @mbasmanova I just updated the PR message. Let me know if it explains your questions. Thanks!

@mbasmanova (Contributor) left a comment

@yingsu00 I don't understand this change. I'm seeing a new scale factor being introduced, but it is always 1 (1.0f). Would you share an example that illustrates the problem and show how this change fixes it? Would it be possible to code it into a test to avoid this being broken accidentally by future changes?

@yingsu00 (Contributor, Author)

@mbasmanova Hi Masha, the new scale factor childBlockEstimatedSerializedSizeScaleFactor is not always 1. It's passed to the child decodeBlock as childBlockEstimatedSerializedSizeScaleFactor * decodedBlock.getPositionCount() / dictionary.getPositionCount() for DictionaryBlock, and childBlockEstimatedSerializedSizeScaleFactor * decodedBlock.getPositionCount() for an RLE block. Take an RLE block over a VariableWidthBlock, for example: the top-level RLE block has 100 positions and the VariableWidthBlock has 1 position with a 10-byte value. logicalSizeInBytes for the RLE block is 1500 bytes = (10 bytes value + 4 bytes offset + 1 byte null) * 100, while logicalSizeInBytes for the VariableWidthBlock is just 15 bytes. We used logicalSizeInBytes as estimatedSerializedSize for the blocks, so the top-level RLE block got 1500 but the child got only 15. This estimatedSerializedSize of the child is then used to estimate the children buffer sizes, so they only got 15 bytes. To fix this, a scale factor of 100 (the RLE block's positionCount) is passed to decodeBlock, so that the estimatedSerializedSize for the VariableWidthBlock becomes 15 * 100 = 1500. Then in appendData, this 1500 is used as the estimate for the sliceBuffer and offsetsBuffer in VariableWidthBlockEncodingBuffer.

I will add some comments and tests to the code.
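The arithmetic above can be sketched as follows. This is a simplified size model, not the actual Presto Block API; the byte counts and names mirror the example in the comment (100 RLE positions, one 10-byte value):

```java
public class RleSerializedSizeSketch
{
    public static void main(String[] args)
    {
        int rlePositionCount = 100;  // positions in the top-level RLE block
        int valueBytes = 10;         // the single VariableWidthBlock value
        int offsetBytes = 4;         // per-position offset entry
        int nullBytes = 1;           // per-position null flag

        // logicalSizeInBytes of the RLE block: per-position cost times position count
        long rleLogicalSize = (long) (valueBytes + offsetBytes + nullBytes) * rlePositionCount;

        // logicalSizeInBytes of the child VariableWidthBlock: only one physical position
        long childLogicalSize = valueBytes + offsetBytes + nullBytes;

        // The fix: pass the RLE position count down as a scale factor so the
        // child's estimated serialized size reflects the expansion
        long scaleFactor = rlePositionCount;
        long childEstimatedSerializedSize = childLogicalSize * scaleFactor;

        System.out.println(rleLogicalSize);               // 1500
        System.out.println(childLogicalSize);             // 15
        System.out.println(childEstimatedSerializedSize); // 1500
    }
}
```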

@yingsu00 (Contributor, Author)

Actually, the above approach built the tree of DecodedBlockNode and added up the estimatedSerializedSizeInBytes in a bottom-up way, which makes it difficult to populate the correct sizes since RLE and Dictionary blocks are not leaf nodes. I'm thinking of doing this in a top-down manner so that the logical size of RLE and Dictionary blocks can be passed down to the children. I'll see if it makes the code easier.

@yingsu00 (Contributor, Author)

@mbasmanova I just realized I opened a can of worms. The current getLogicalSizeInBytes() is not 100% correct. Suppose there is an ArrayBlock of an RLEBlock of a VariableWidthBlock. The top-level ArrayBlock.getLogicalSizeInBytes() just returns getSizeInBytes() and doesn't consider whether the child block is an RLEBlock or not. Thus it could return a much smaller logical size than the actual one. To fix this, we need to override getLogicalSizeInBytes() in ArrayBlock, MapBlock and RowBlock. We also need to implement something like getLogicalRegionSizeInBytes() for all Blocks. I need to think about whether it's worthwhile to do all this. We have these options:

  1. Fix getLogicalSizeInBytes() and implement getLogicalRegionSizeInBytes() in all blocks.
  2. Implement a new getEstimatedSizeInBytes() in Block that does the correct size estimation.
  3. Live with the faulty logicalSizeInBytes. This could result in some CPU regression for rare cases like the one mentioned above.
  4. Revert the original fix that introduced estimated max capacity. This could result in a 20-30% memory increase.

What's your preference on this?
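The under-estimation described above can be illustrated with a toy size model (illustrative only; the byte counts are made up and this is not the real Block interface):

```java
public class NestedLogicalSizeSketch
{
    public static void main(String[] args)
    {
        int positionCount = 100;  // ArrayBlock rows, each holding one element
        int leafEntryBytes = 15;  // one VariableWidthBlock entry (value + offset + null)
        int arrayOverheadBytes = positionCount * (4 + 1); // ArrayBlock offsets + nulls

        // Physical size: the RLE child stores its single leaf entry once
        long rleSizeInBytes = leafEntryBytes;

        // Logical size: the entry counts once per covered position
        long rleLogicalSize = (long) leafEntryBytes * positionCount;

        // Buggy: ArrayBlock.getLogicalSizeInBytes() falls back to getSizeInBytes(),
        // so the RLE child contributes only its physical size
        long buggyLogicalSize = arrayOverheadBytes + rleSizeInBytes;

        // Fixed: the ArrayBlock asks its child for the logical (inflated) size
        long fixedLogicalSize = arrayOverheadBytes + rleLogicalSize;

        System.out.println(buggyLogicalSize); // 515
        System.out.println(fixedLogicalSize); // 2000
    }
}
```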

@mbasmanova (Contributor)

Here is how I'm thinking about this. #4 gets us back to stable state quickly. From there we can work on a new fix for memory usage. I'd start with that. Then, I'd consider #3. This requires running perf evaluation on a sample of production workload to see how big the regression is.

@mbasmanova mbasmanova requested a review from a team June 24, 2020 10:45
@yingsu00 (Contributor, Author)

Here is how I'm thinking about this. #4 gets us back to stable state quickly. From there we can work on a new fix for memory usage. I'd start with that. Then, I'd consider #3. This requires running perf evaluation on a sample of production workload to see how big the regression is.

@mbasmanova Thank you Masha. If we take option 4, what do you think the new fix for memory usage would be?

@mbasmanova (Contributor)

what would you think would be the new fix for memory usage?

I don't know off the top of my head.

@yingsu00 (Contributor, Author)

I tend to choose 1) and 3), since my latest tests show that if the max capacity is not underestimated, the CPU performance is not affected and it can generally save 20-30% of buffer memory. I'll see how much work is required to get 1) right.

@yingsu00 (Contributor, Author)

@mbasmanova Hi Masha, I actually fixed getLogicalSizeInBytes and added some tests. With these changes I saw a 5x CPU gain on the regressed query. I can add some more tests in TestBlockEncodingBuffers tomorrow. I also simplified DecodedBlockNode and put most of the logic in decodeBlock(). Appreciate your review again!

@mbasmanova (Contributor) left a comment

The Fix getLogicalSizeInBytes() for Blocks commit looks good modulo some comments.

Contributor

What's the motivation to have the default implementation? It seems incorrect to report region-size as region-logical-size.

Contributor Author

@mbasmanova For leaf blocks (i.e. non-Array/Map/Row/Dictionary/RLE blocks), logicalSizeInBytes is the same as sizeInBytes. See the following code:

/**
 * Returns the size of the block contents, regardless of internal representation.
 * The same logical data values should always have the same size, no matter
 * what block type is used or how they are represented within a specific block.
 *
 * This can differ substantially from {@link #getSizeInBytes} for certain block
 * types. For RLE, it will be {@code N} times larger. For dictionary, it will be
 * larger based on how many times dictionary entries are reused.
 */
default long getLogicalSizeInBytes()
{
    return getSizeInBytes();
}

Similarly, the regional logical size for leaf blocks is the same as the regional size. We have a default implementation here so that we don't have to implement the same thing in all leaf blocks.

Contributor

consider replacing comments with variable names, e.g.

  • Block arrayOfLong =
  • Block arrayOfRleOfLong =
  • Block arrayOfRleOfArrayOfLong =
    ...

Contributor Author

@mbasmanova I renamed the variables. However it's not as straightforward as the comment:

// Row(Dictionary(LongArrayBlock), Dictionary(Row(LongArrayBlock, LongArrayBlock)))
Block rowOfDictionaryOfLongAndDictionaryOfRowOfLongAndLong = ...

So I kept both the comments and the renamed variables.

@mbasmanova (Contributor) left a comment

Allow additional error margin for estimatedMaxCapacity

typo in commit message: graceFactorFordMaxCapacity -> graceFactorForMaxCapacity

Contributor

  • all caps with underscores
  • consider making this configurable

Contributor Author

@mbasmanova I will send a separate PR to make it configurable.

Contributor

This is a generic method that can be used in many places. However, the commit says that the change applies only to one specific use case. I'd expect the caller to apply this new factor when computing estimatedMaxCapacity.

  • use Math.toIntExact instead of (int)

Contributor Author

This is a generic method that can be used in many places. However, the commit says that the change applies only to one specific use case. I'd expect the caller to apply this new factor when computing estimatedMaxCapacity.

Moved the application of this new factor to setupDecodedBlockAndMapPositions() where the estimatedMaxCapacity is calculated.

  • use Math.toIntExact instead of (int)

It's actually casting a double to an int; Math.toIntExact only takes a long.
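For the record, the distinction can be shown with a small sketch (values are illustrative): Math.toIntExact only accepts a long, so converting the double result requires either a plain (int) cast or rounding to a long first.

```java
public class CastSketch
{
    public static void main(String[] args)
    {
        double targetBufferSize = 800.4;
        float graceFactor = 1.2f;

        double estimate = targetBufferSize * graceFactor; // roughly 960.48

        // (int) truncates the double silently
        int truncated = (int) estimate;

        // Math.toIntExact(long) throws on int overflow, but it needs a long,
        // so the double has to be rounded (or cast) to a long first
        int checked = Math.toIntExact(Math.round(estimate));

        System.out.println(truncated); // 960
        System.out.println(checked);   // 960
    }
}
```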

@mbasmanova (Contributor)

@yingsu00

With these changes I saw 5x CPU gain on the regressed query.

To clarify, is the query running 5x faster than before the regression? E.g. before the regression CPU time was N, after the regression it was 10N; is it now 0.2N or 2N?

@yingsu00 (Contributor, Author)

@yingsu00

With these changes I saw 5x CPU gain on the regressed query.

To clarify, is the query running 5x faster than before the regression? E.g. before the regression CPU time was N, after the regression it was 10N; is it now 0.2N or 2N?

Hi Masha, it is 2N.

@mbasmanova (Contributor)

@yingsu00 How much regression is left after this change?

@yingsu00 (Contributor, Author) commented Jul 1, 2020

@yingsu00 How much regression is left after this change?

@mbasmanova I tested on vll1_verifier1 and there is no regression any more.
Without optimized_repartitioning: 20200701_020416_00019_t29du PartitionedOutputOperator 8.9 min
With fixed optimized_repartitioning: 20200701_020453_00021_t29du PartitionedOutputOperator 2.93 min
With un-fixed optimized_repartitioning: 20200701_021836_00002_qpi5g PartitionedOutputOperator 2.95 min

@yingsu00 (Contributor, Author) commented Jul 1, 2020

@mbasmanova Masha, I still need to touch up the test for BlockEncodingBuffers a bit. I will update the PR tomorrow.

@mbasmanova (Contributor)

@mbasmanova Masha, I still need to touch up the test for BlockEncodingBuffers a bit. I will update the PR tomorrow.

@yingsu00 Thank you for the heads up.

@mbasmanova (Contributor)

@yingsu00 How much regression is left after this change?

@mbasmanova I tested on vll1_verifier1 and there is no regression any more.
Without optimized_repartitioning: 20200701_020416_00019_t29du PartitionedOutputOperator 8.9 min
With fixed optimized_repartitioning: 20200701_020453_00021_t29du PartitionedOutputOperator 2.93 min
With un-fixed optimized_repartitioning: 20200701_021836_00002_qpi5g PartitionedOutputOperator 2.95 min

I'm confused. fixed and un-fixed are the same: 2.93m vs. 2.95m. What is un-fixed here? Is it the version that used more memory than original repartitioning? E.g. the "fix" refers to fixing memory usage?

@yingsu00 (Contributor, Author) commented Jul 2, 2020

@yingsu00 How much regression is left after this change?

@mbasmanova I tested on vll1_verifier1 and there is no regression any more.
Without optimized_repartitioning: 20200701_020416_00019_t29du PartitionedOutputOperator 8.9 min
With fixed optimized_repartitioning: 20200701_020453_00021_t29du PartitionedOutputOperator 2.93 min
With un-fixed optimized_repartitioning: 20200701_021836_00002_qpi5g PartitionedOutputOperator 2.95 min

I'm confused. fixed and un-fixed are the same: 2.93m vs. 2.95m. What is un-fixed here? Is it the version that used more memory than original repartitioning? E.g. the "fix" refers to fixing memory usage?

@mbasmanova Hi Masha, yes, un-fixed refers to the version from early this year without the memory reduction fixes. Fixed means this PR + other CPU regression fixes + all previous memory reduction PRs. I can test the regressed version too (all previous memory reduction PRs but no CPU regression fixes).

@yingsu00 yingsu00 force-pushed the fixEstimatedSize branch from 71ce03c to 61f21f7 Compare July 2, 2020 12:41
@yingsu00 (Contributor, Author) commented Jul 2, 2020

@mbasmanova Hi Masha, I just updated the PR with the following changes:

  • Fixed a bug in RowBlockEncodingBuffer.setupDecodedBlockAndMapPositions() in 3861db42e4 Fix serialized size estimation in BlockEncodingBuffers where childrenEstimatedSerializedSizeInBytes was not added up.
  • Fixed TestMapBlock.test() in Fix getLogicalSizeInBytes() for Blocks
  • Added 47b093162d Add tests for max buffer capacity estimation
  • Added 22132d19fb Always make space for nullsBuffer and hashTablesBuffer
  • Moved the application of the new scale factor to setupDecodedBlockAndMapPositions() in 61f21f7f2c Allow additional error margin for estimatedMaxCapacity

Thank you very much for reviewing!

@mbasmanova (Contributor) left a comment

@yingsu00 LGTM.

Contributor

nit: perhaps, refactor to extract a helper method to avoid copy-paste

Contributor Author

@mbasmanova did you mean something like this?

setEstimatedNullsBufferMaxCapacity(getEstimatedBufferMaxCapacity(targetBufferSize, Byte.BYTES, POSITION_SIZE));
estimatedValueBufferMaxCapacity = getEstimatedBufferMaxCapacity(targetBufferSize, Byte.BYTES, POSITION_SIZE);

and in AbstractBlockEncodingBuffer:

protected static int getEstimatedBufferMaxCapacity(double targetBufferSize, int unitSize, int positionSize)
{
    return (int) (targetBufferSize * unitSize / positionSize * GRACE_FACTOR_FOR_MAX_BUFFER_CAPACITY);
}

Contributor

@yingsu00 Yes, this might reduce copy-paste and make it easier to read and ensure we don't forget GRACE_FACTOR_FOR_MAX_BUFFER_CAPACITY somewhere.

Contributor Author

@mbasmanova Hi Masha, I just updated the PR with a new commit e8511df636 Refactor buffer max capacity calculation. Thank you again, and happy long weekend!

@yingsu00 yingsu00 force-pushed the fixEstimatedSize branch from 61f21f7 to e8511df Compare July 2, 2020 23:02
Ying Su added 8 commits July 4, 2020 02:13
getLogicalSizeInBytes() was supposed to return the inflated sizes of the
blocks if they are DictionaryBlock or RunLengthEncodedBlock. However,
if the nested blocks are DictionaryBlock or RunLengthEncodedBlock,
the size was not correctly calculated. This commit fixes the issue.

When a block passed to OptimizedPartitionedOutputOperator is an RLE or
Dictionary block, we used to estimate the serialized size using
getLogicalSize(), which returns the size of the block after inflation.
However, the child block of the RLE or Dictionary block was using plain
sizeInBytes without considering that it is going to be expanded. This
commit fixes the problem by adding a scale factor that estimates how many
times the child blocks are going to be expanded.

Block.getSizeInBytes() and Block.getLogicalSizeInBytes() always add up
the sizes of the nulls buffer even if the block cannot contain nulls. When
estimating the max buffer capacity for BlockEncodingBuffers, we can also
leave space for the nullsBuffer and hashTablesBuffer. This does not
waste memory because the buffers are not actually allocated until blocks
with nulls or hash tables come in. It makes the buffer sizes
proportional to the blocks' logical sizes, and makes the code cleaner.

In "Enforce buffer size limits for BlockEncodingBuffer" we introduced
estimatedMaxCapacity such that the growth of the buffers beyond that
value becomes slower. However, the estimated max capacity is not always
100% accurate, and an underestimated value has a negative impact on
CPU performance. This commit gives the estimatedMaxCapacity some head
room by introducing GRACE_FACTOR_FOR_MAX_BUFFER_CAPACITY with a
default value of 1.2f.
@yingsu00 yingsu00 force-pushed the fixEstimatedSize branch from e8511df to 533fd29 Compare July 4, 2020 11:03
@yingsu00 (Contributor, Author) commented Jul 6, 2020

@mbasmanova Hi Masha, the tests now all passed. Thank you for reviewing!

@mbasmanova mbasmanova merged commit 6b51cbf into prestodb:master Jul 6, 2020