Lazily build hashtable for MapBlock by yingsu00 · Pull Request #11791 · prestodb/presto

yingsu00 · 2018-10-26T08:16:00Z

Fix for #11808
Presto builds the hashtable for a MapBlock eagerly when constructing the
MapBlock, even when the query does not need it. Building a hashtable could
take up to 30% of the CPU cost of scanning a map column. This commit defers
the hashtable build until it is needed in SeekKey(). Note that we
only do this for MapBlock, not MapBlockBuilder, to avoid complex
synchronization problems. The MapBlockBuilder will always build the
hashtable. As a result, MergingPageOutput and PartitionOutputOperator
will still rebuild the hashtables when needed. Measurements show that
MergingPageOutput has to build the hashtables for fewer than 10% of pages.
We will have a separate PR to improve PartitionOutput
and avoid rebuilding the pages, and with it the hashtable rebuilding.

Simple select checksum queries show over 40% CPU gain:

Test                          | After  | Before | Improvement
select 2 map columns checksum | 11.69d | 20.06d | 42%
Select 1 map column checksum  |  9.67d | 17.73d | 45%
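
As a rough illustration of the deferral described above, here is a minimal sketch of a block that builds its hashtable on the first lookup instead of in the constructor. It is not the code from this PR: `LazyMapBlockSketch`, `ensureHashTableBuilt()`, `buildCount()`, the `long` keys, the linear-probing layout and the multiplier of 2 are all assumptions made for the example. The thread only confirms a null check on `this.hashTables` and a synchronized block; the `volatile` field with a double check is one common way to arrange that, not necessarily what was merged.

```java
// Simplified stand-in for a map block holding the keys of one map.
// This is not Presto's MapBlock; all names here are hypothetical.
final class LazyMapBlockSketch {
    private static final int HASH_MULTIPLIER = 2; // assumed: twice as many slots as keys

    private final long[] keys;
    // Null until the first lookup. Writes are protected by the "this" monitor.
    private volatile int[] hashTable;
    private int buildCount; // instrumentation only, guarded by "this"

    LazyMapBlockSketch(long[] keys) {
        this.keys = keys.clone();
    }

    // Returns the position of the key, or -1 if it is absent.
    int seekKey(long key) {
        int[] table = ensureHashTableBuilt();
        if (table.length == 0) {
            return -1;
        }
        int slot = Math.floorMod(Long.hashCode(key), table.length);
        while (table[slot] != -1) {
            if (keys[table[slot]] == key) {
                return table[slot];
            }
            slot = (slot + 1) % table.length; // linear probing
        }
        return -1;
    }

    synchronized int buildCount() {
        return buildCount;
    }

    private int[] ensureHashTableBuilt() {
        int[] table = hashTable;
        if (table != null) {
            return table; // fast path, no locking
        }
        synchronized (this) {
            if (hashTable != null) {
                return hashTable; // another thread finished the build first
            }
            int[] built = new int[keys.length * HASH_MULTIPLIER];
            java.util.Arrays.fill(built, -1); // -1 marks an empty slot
            for (int position = 0; position < keys.length; position++) {
                int slot = Math.floorMod(Long.hashCode(keys[position]), built.length);
                while (built[slot] != -1) {
                    slot = (slot + 1) % built.length;
                }
                built[slot] = position;
            }
            buildCount++;
            hashTable = built; // publish only after the table is complete
            return built;
        }
    }
}
```

A query that never calls `seekKey()` on such a block never pays for the build, which is the saving the description is after.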

findepi

I just skimmed. I didn't intend to review this.

findepi · 2018-10-26T09:55:20Z

presto-spi/src/main/java/com/facebook/presto/spi/block/AbstractMapBlock.java

add requireNonNull (or explanatory comment)

findepi · 2018-10-26T09:56:02Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlock.java

please include rationale

findepi · 2018-10-26T09:57:05Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlock.java

Why not access this.hashTables directly? You do this when you complete the computation anyway.

findepi · 2018-10-26T09:57:10Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlock.java

this.hashTables

electrum

There are lots of changes here, many of which seem to be refactorings. Can you pull those into separate commits so that it’s easier to review and see the real change?

yingsu00 · 2018-11-01T07:46:01Z

@electrum Hi David, I have removed the formatting changes (breaking long lines). Now the changes should be all related to the logic change. Please let me know if this is what you want, thanks!

dain

Some minor comments/suggestions from my first read.

dain · 2018-11-02T00:00:37Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlock.java

Invert this if condition to remove a level of nesting. Something like this:

if (this.hashTables != null) { return this; }

dain · 2018-11-02T00:02:25Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlock.java

Also, invert this one, and move to right after the start of the synchronized block.

dain · 2018-11-02T00:05:42Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlockBuilder.java

the {} should go on the previous line

dain · 2018-11-02T00:07:02Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlockEncoding.java

please add clarifying parentheses

Actually, I'd prefer to see two separate checks: one to ensure the keys and values have the same position count, and one that verifies the hash table size.
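
The suggestion to split the combined condition can be sketched as follows. This is only an illustration: `MapBlockChecksSketch` and `validate(...)` are hypothetical names, and the expected relation between hash table size and key count (key count times a multiplier) is an assumption, since the thread does not show the actual condition in MapBlockEncoding.

```java
// Hypothetical helper; not the actual code in MapBlockEncoding.
final class MapBlockChecksSketch {
    private MapBlockChecksSketch() {}

    static void validate(int keyPositionCount, int valuePositionCount, int hashTableLength, int hashMultiplier) {
        // Check 1: keys and values must describe the same number of entries.
        if (keyPositionCount != valuePositionCount) {
            throw new IllegalArgumentException(String.format(
                    "Key block has %d positions but value block has %d",
                    keyPositionCount, valuePositionCount));
        }
        // Check 2: the hash table must have the size expected for that many keys (assumed relation).
        int expectedLength = keyPositionCount * hashMultiplier;
        if (hashTableLength != expectedLength) {
            throw new IllegalArgumentException(String.format(
                    "Hash table has %d slots but %d were expected",
                    hashTableLength, expectedLength));
        }
    }
}
```

With two checks, each failure gets its own message, and no parentheses are needed to disambiguate a mixed `&&`/`||` expression.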

dain · 2018-11-02T00:11:05Z

presto-spi/src/main/java/com/facebook/presto/spi/block/SingleMapBlock.java

If we were to introduce an interface for this class, what would the API look like for that interface? If it is only a small number of methods, we might want to add something like that to keep the abstractions between these classes simpler. I'm not saying we should actually add an interface here, yet; I'm just curious what it would look like if we did.

@dain which classes are you considering to implement this interface? Did you just mean SingleMapBlock and AbstractSingleMapBlock?

I mean the AbstractSingleMapBlock argument here. If that were an interface built specifically for this class, what methods would it have? If it is small, we might want to add one to clean up the code and simplify testing... just a thought.

@dain Did you mean the mapBlock we passed in? It's AbstractMapBlock. The referenced methods in SingleMapBlock include getRawKeyBlock(), getRawValueBlock(), getHashTables(), and it also accesses keyNativeHashCode and keyBlockNativeEquals members. Do you think it's worth making a new interface? If yes I'll add getKeyNativeHashCode() and getKeyBlockNativeEquals() and make a 5 method interface in a separate commit.

If it is only 5 methods, I would consider adding the interface. This is just my opinion... I generally get a bad feeling anytime I see a method taking a parameter with a type named Abstract*, as to me it screams, we should have an interface here. In this case I would name the interface MapBlockData. Again, just my opinion. You can ask others how they feel about this.

@dain Makes sense. Shall I send a new PR for this interface, add a separate commit in this same PR, or just include it in this commit?

All of those options are fine with me. Maybe ask @haozhun what he prefers.

The problem here is that it cannot be an interface because those fields are not public.
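
For reference, the five-method interface discussed in this thread might look roughly like the sketch below. It was never added in this PR. The name `MapBlockData` comes from the suggestion above and the method names from the reply listing them; the return types (`int[]` and `MethodHandle`) and the `BlockSketch` placeholder are assumptions.

```java
// Placeholder for Presto's Block type, so the sketch compiles on its own.
interface BlockSketch {
    int getPositionCount();
}

// Hypothetical interface with the five methods listed in the discussion.
// Return types are assumed, not taken from the PR.
interface MapBlockData {
    BlockSketch getRawKeyBlock();

    BlockSketch getRawValueBlock();

    int[] getHashTables();

    java.lang.invoke.MethodHandle getKeyNativeHashCode();

    java.lang.invoke.MethodHandle getKeyBlockNativeEquals();
}
```

Interface methods in Java are implicitly public, which is the crux of the objection above: exposing these members through an interface would make accessors public that are currently not.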

dain

This seems good to me. @haozhun, did you want to review this?

haozhun

Looks good

haozhun · 2018-11-08T00:53:56Z

presto-spi/src/main/java/com/facebook/presto/spi/block/MapBlock.java

Add comment: write to the field is protected by "this" monitor.

haozhun · 2018-11-08T01:34:55Z

presto-spi/src/main/java/com/facebook/presto/spi/block/SingleMapBlock.java

The problem here is that it cannot be an interface because those fields are not public.

Presto builds the hashtable for a MapBlock eagerly when constructing the MapBlock, even when the query does not need it. Building a hashtable could take up to 40% of the CPU cost of scanning a map column. This commit defers the hashtable build until it is needed in SeekKey(). Note that we only do this for MapBlock, not MapBlockBuilder, to avoid complex synchronization problems. The MapBlockBuilder will always build the hashtable. As a result, MergingPageOutput and PartitionOutputOperator will still rebuild the hashtables when needed. Measurements show that MergingPageOutput has to build the hashtables for fewer than 10% of pages. We will have a separate PR to improve PartitionOutput and avoid rebuilding the pages, and with it the hashtable rebuilding.

Simple select checksum queries show over 40% CPU gain:

Test                          | After  | Before | Improvement
select 2 map columns checksum | 11.69d | 20.06d | 42%
Select 1 map column checksum  |  9.67d | 17.73d | 45%

This reverts 1) commit ad05dcb and 2) commit 23de11f. PR prestodb#11791 (commits 23de11f and ad05dcb), which lazily builds the hashtables for maps, introduced a regression for the case where the MapBlock is created through AbstractMapBlock.getRegion(). The hashtables built on the MapBlock region were not propagated back to the original MapBlock, so hashtables were built repeatedly on the same base MapBlock.
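
The regression described in this revert message can be reproduced with a small model. This is a sketch under stated assumptions, not Presto code: `RegionBlockSketch`, its fields, and the shared build counter are hypothetical, and the "build" is a stand-in allocation. The point it illustrates is that a region which copies a still-null hashtable reference builds its own table, and that build is never visible to the base block.

```java
// Simplified model of a base block and its regions; all names are hypothetical.
final class RegionBlockSketch {
    private final long[] keys;        // shared between the base block and its regions
    private final int offset;
    private final int length;
    private final int[] buildCounter; // shared instrumentation: buildCounter[0] counts builds
    private int[] hashTable;          // per instance, so a region's build is invisible to the base

    RegionBlockSketch(long[] keys) {
        this(keys, 0, keys.length, null, new int[1]);
    }

    private RegionBlockSketch(long[] keys, int offset, int length, int[] hashTable, int[] buildCounter) {
        this.keys = keys;
        this.offset = offset;
        this.length = length;
        this.hashTable = hashTable;
        this.buildCounter = buildCounter;
    }

    RegionBlockSketch getRegion(int regionOffset, int regionLength) {
        if (regionOffset < 0 || regionLength < 0 || regionOffset + regionLength > length) {
            throw new IndexOutOfBoundsException("Invalid region");
        }
        // The region copies the current reference, which is null if nothing was built yet.
        return new RegionBlockSketch(keys, offset + regionOffset, regionLength, hashTable, buildCounter);
    }

    synchronized int[] ensureHashTableBuilt() {
        if (hashTable == null) {
            hashTable = new int[keys.length * 2]; // stand-in for the expensive build over the shared keys
            buildCounter[0]++;
        }
        return hashTable;
    }

    synchronized boolean isHashTableBuilt() {
        return hashTable != null;
    }

    int buildCount() {
        return buildCounter[0];
    }
}
```

Taking three regions of the same base block and looking up a key in each triggers three builds in this model, while the base block's own hashtable stays unbuilt.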

yingsu00 requested review from dain and wenleix October 26, 2018 08:16

facebook-github-bot added the CLA Signed label Oct 26, 2018

yingsu00 requested a review from haozhun October 26, 2018 08:16

findepi reviewed Oct 26, 2018

yingsu00 force-pushed the lazyMapHT branch from daf60d3 to 95386f0 Compare October 26, 2018 19:58

yingsu00 requested a review from electrum October 26, 2018 20:15

electrum reviewed Oct 28, 2018

yingsu00 force-pushed the lazyMapHT branch from 95386f0 to d369568 Compare November 1, 2018 07:41

dain self-assigned this Nov 1, 2018

dain reviewed Nov 2, 2018

yingsu00 force-pushed the lazyMapHT branch from d369568 to 8f86094 Compare November 2, 2018 03:44

dain reviewed Nov 2, 2018

yingsu00 assigned haozhun Nov 6, 2018

dain removed their assignment Nov 7, 2018

haozhun approved these changes Nov 8, 2018

haozhun assigned rongrong and unassigned haozhun Nov 8, 2018

yingsu00 force-pushed the lazyMapHT branch from 8f86094 to 1992ce5 Compare November 8, 2018 01:47

rongrong assigned yingsu00 Nov 8, 2018

yingsu00 force-pushed the lazyMapHT branch from 1992ce5 to 009165f Compare November 8, 2018 01:55

yingsu00 force-pushed the lazyMapHT branch from 009165f to 62dc3a5 Compare November 8, 2018 04:04

yingsu00 merged commit 23de11f into prestodb:master Nov 8, 2018

This was referenced Jan 8, 2019

HashTable for base MapBlock is not updated when referencing the hashtable on the sliced MapBlock #12187

Closed

Revert lazy map #12196

Merged

7 participants
