
Improve performance of getRegionLogicalSizeInBytes#15843

Merged
highker merged 1 commit into prestodb:master from arunthirupathi:convert_string_dictionary_slowness on Mar 19, 2021

Conversation


@arunthirupathi arunthirupathi commented Mar 17, 2021

DictionaryBlock getRegionLogicalSizeInBytes is slow when abandoning
StringDictionaryEncoding. When abandoning string dictionary encoding,
the dictionary has a large number of keys (millions) and is converted
in segments of at most 1024 positions. getRegionLogicalSizeInBytes
allocates a size array as large as the number of keys as a cache.

With this change, the cached code path is used only when the number
of keys is less than the region length. The conversion ratio can be
tuned if required.
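The cached-versus-direct split described above can be sketched as follows. This is a minimal illustration under assumptions, not the actual Presto source: the class, method, and the per-entry size array `entrySizes` (standing in for `dictionary.getRegionLogicalSizeInBytes(id, 1)` calls) are hypothetical names.

```java
import java.util.Arrays;

public class RegionSizeSketch
{
    // Sketch: entrySizes[id] stands in for dictionary.getRegionLogicalSizeInBytes(id, 1).
    static long regionLogicalSize(int[] ids, long[] entrySizes, int offset, int length)
    {
        int dictionarySize = entrySizes.length;
        long sizeInBytes = 0;
        if (dictionarySize <= length) {
            // Cached path: the per-entry size is computed once and reused on repeats.
            long[] seenSizes = new long[dictionarySize];
            Arrays.fill(seenSizes, -1);
            for (int i = offset; i < offset + length; i++) {
                int id = ids[i];
                if (seenSizes[id] < 0) {
                    seenSizes[id] = entrySizes[id];
                }
                sizeInBytes += seenSizes[id];
            }
        }
        else {
            // Direct path: a small region over a huge dictionary (e.g. 1024 positions
            // over millions of keys) would waste the cache allocation, so skip it.
            for (int i = offset; i < offset + length; i++) {
                sizeInBytes += entrySizes[ids[i]];
            }
        }
        return sizeInBytes;
    }

    public static void main(String[] args)
    {
        long[] entrySizes = {10, 20, 30};
        int[] ids = {0, 1, 1, 2};
        long size = regionLogicalSize(ids, entrySizes, 0, 4);
        if (size != 80) {
            throw new AssertionError("expected 80, got " + size);
        }
        System.out.println(size);
    }
}
```

The direct path trades repeated lookups for avoiding an allocation proportional to the dictionary size, which is the win in the abandon-dictionary case.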

Before this change, abandoning a dictionary with 10 million keys
used to be 150 times slower than direct encoding. Now it is only
50 times slower (similar to dictionary encoding).

fixes #15506

Test plan: added a unit test for the non-cached code path.

== RELEASE NOTES ==

General Changes
* Improve Dictionary Block getRegionLogicalSizeInBytes performance

Collaborator

@sdruzkin sdruzkin Mar 18, 2021


tl;dr: so the idea is that it doesn't make sense to create a seenSizes cache if the cached seenSize value is not going to be reused, right?

Author


That is another way of looking at it, and that framing probably makes this simpler.

I was looking at it from the angle that calling getRegionLogicalSizeInBytes on a block has a cost, and maintaining the cache has a cost too.

@arunthirupathi arunthirupathi force-pushed the convert_string_dictionary_slowness branch 3 times, most recently from 6d86ae3 to dcf3302 Compare March 18, 2021 01:36

@highker highker left a comment


Block is too critical for Presto. Maybe worth some profiling given it's on the critical path.

Comment on lines 268 to 269


Let's maybe have a generic comment. Presto main engine doesn't have the concept of files.

Author


Removed the reference to files.



getRegionLogicalSizeInBytes is a very expensive call. How likely/useful would it be to have a lightweight hash map:

long sizeInBytes = 0;
int[] seenIds = new int[length];
long[] seenSize = new long[length];
Arrays.fill(seenIds, -1);
for (int i = positionOffset; i < positionOffset + length; i++) {
    int id = getId(i);
    int index = id % length;
    if (seenIds[index] == id) {
        sizeInBytes += seenSize[index];
    }
    else if (seenIds[index] == -1) {
        seenIds[index] = id;
        seenSize[index] = dictionary.getRegionLogicalSizeInBytes(id, 1);
        sizeInBytes += seenSize[index];
    }
    else {
        sizeInBytes += dictionary.getRegionLogicalSizeInBytes(getId(i), 1);
    }
}

Frankly, I don't know if this approach is going to be better or worse. Maybe worth profiling?

Author


The seenIds would need to be a regular HashMap. The dictionary ids could be 1, 4, 7 with length 3, in which case all of them collide on the same slot and the cache degenerates into direct calls. Adding an actual hash map will make any implementation much worse.

I looked at a few different implementations of getRegionLogicalSizeInBytes (VariableWidthBlock is cheap, but MapBlock is costly), so it is very hard to come up with a cost model, or even to profile this meaningfully.

What I have in mind is: instead of taking the in-place code path when length <= dictionarySize, we can take it when length * 2 <= dictionarySize, which assumes a cache hit is half as expensive as calling the actual method. I am OK with changing the condition to length * 5 <= dictionarySize, so the in-place code path is triggered only when the cache is not going to yield any benefit.

Another approach is to use a regular HashMap in the in-place code path and let it grow dynamically.

Author


Another approach I coded and discarded: change the dictionary writer to not call getRegionLogicalSizeInBytes and do the in-place evaluation there, since the use case is well understood. But I believe the fix here is better, as it improves the performance of Presto overall. I will push that change as well, just in case.

@arunthirupathi arunthirupathi force-pushed the convert_string_dictionary_slowness branch 2 times, most recently from 594af9d to 61dd293 Compare March 18, 2021 06:19
@arunthirupathi arunthirupathi requested a review from highker March 18, 2021 06:20
@arunthirupathi arunthirupathi force-pushed the convert_string_dictionary_slowness branch from 61dd293 to 0fc9cce Compare March 18, 2021 14:21
@arunthirupathi arunthirupathi force-pushed the convert_string_dictionary_slowness branch from 0fc9cce to e38594f Compare March 18, 2021 14:28
if (seenSizes[position] < 0) {
    seenSizes[position] = dictionary.getRegionLogicalSizeInBytes(position, 1);
}
sizeInBytes += seenSizes[position];
Collaborator

@sdruzkin sdruzkin Mar 18, 2021


Q regarding the original implementation: this is a dictionary, so do we really need to multiply the entry size by the number of times the entry appears?

If the answer is no, we can try to collect the used dictionary positions into an IntOpenHashSet from fastutil.

If yes, instead of a regular hash map you can use Int2IntOpenHashMap. It would really depend on the relation between the dictionary size and the length, and would still likely be more expensive.
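As a sketch of the map-based alternative, using java.util.HashMap as a plain-JDK stand-in for fastutil's primitive maps (the class, method, and `entrySizes` array are hypothetical names, not Presto code):

```java
import java.util.HashMap;
import java.util.Map;

public class MapCacheSketch
{
    // Sketch: entrySizes[id] stands in for dictionary.getRegionLogicalSizeInBytes(id, 1).
    static long regionLogicalSize(int[] ids, long[] entrySizes, int offset, int length)
    {
        // The map grows dynamically: memory is bounded by the number of
        // distinct ids seen in the region, not by the dictionary size.
        Map<Integer, Long> seenSizes = new HashMap<>();
        long sizeInBytes = 0;
        for (int i = offset; i < offset + length; i++) {
            int id = ids[i];
            sizeInBytes += seenSizes.computeIfAbsent(id, k -> entrySizes[k]);
        }
        return sizeInBytes;
    }

    public static void main(String[] args)
    {
        long[] entrySizes = {5, 7, 11};
        int[] ids = {2, 2, 0};
        long size = regionLogicalSize(ids, entrySizes, 0, 3);
        if (size != 27) {
            throw new AssertionError("expected 27, got " + size);
        }
        System.out.println(size);
    }
}
```

A primitive-specialized map such as fastutil's would avoid the boxing shown here, at the cost of an extra dependency; whether it beats the direct path still depends on the hit rate.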

Author


The logical size is meant to capture the underlying size of the data, which involves multiplying each entry size by the number of times it appears. The logical size is used in a few places where data is converted from dictionary encoding to direct encoding; one such place is abandoning dictionary encoding and switching to direct encoding.

I agree that fastutil's collections are better than java.util's in terms of CPU/memory. But in my opinion the code is simpler if, when the likelihood of a cache hit is very low, we take a non-cached path and avoid the cache completely.
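A tiny worked example of the distinction discussed above, with hypothetical numbers: the logical size counts every reference (what direct encoding would occupy), while a set of used positions would only count distinct entries.

```java
import java.util.HashSet;
import java.util.Set;

public class LogicalSizeExample
{
    public static void main(String[] args)
    {
        long[] entrySizes = {100, 200};  // two dictionary entries, hypothetical sizes
        int[] ids = {0, 0, 0, 1};        // entry 0 is referenced three times

        // Logical size: each reference counts, since this is what the data
        // would occupy after conversion to direct encoding.
        long logical = 0;
        for (int id : ids) {
            logical += entrySizes[id];
        }

        // Set-based size: only distinct entries count.
        Set<Integer> used = new HashSet<>();
        for (int id : ids) {
            used.add(id);
        }
        long distinct = 0;
        for (int id : used) {
            distinct += entrySizes[id];
        }

        if (logical != 500) {
            throw new AssertionError("expected 500, got " + logical);
        }
        if (distinct != 300) {
            throw new AssertionError("expected 300, got " + distinct);
        }
        System.out.println(logical + " " + distinct);
    }
}
```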

@highker highker merged commit 7c7930e into prestodb:master Mar 19, 2021
@varungajjala varungajjala mentioned this pull request Mar 23, 2021
6 tasks


Successfully merging this pull request may close these issues.

Abandoning String Dictionary is very slow
