Skip to content

Fix lazy map hashtable regression#12189

Closed
yingsu00 wants to merge 1 commit intoprestodb:masterfrom
yingsu00:lazyMapHTRegression1
Closed

Fix lazy map hashtable regression#12189
yingsu00 wants to merge 1 commit intoprestodb:masterfrom
yingsu00:lazyMapHTRegression1

Conversation

@yingsu00
Copy link
Contributor

@yingsu00 yingsu00 commented Jan 8, 2019

Resolves issue #12187

In 23de11f we lazily build the hashtables for map. It introduced a
regression for the case where the MapBlock is created through
AbstractMapBlock.getRegion(), in which the hashtables built on the
MapBlock region was not updated in the original MapBlock, thus causing
repeated hashtables build on the same base MapBlock. The change is to
encapsulate the int[] hashtables in AbstractMapBlock$HashTables object
and make it a member of MapBlock/MapBlockBuilder, so that when the
sliced MapBlock builds the hashtables, the base MapBlock would also gets
updated.

This PR was verified to fix the reported regression case on our verifier cluster.
Query CPU cost before the Lazy Map change: 57 min
Query CPU cost with Lazy Map change without this fix: > 50 days
Query CPU cost with Lazy Map change with this fix: 37 min

In 23de11f we lazily build the hashtables for map. It introduced a
regression for the case where the MapBlock is created through
AbstractMapBlock.getRegion(), in which the hashtables built on the
MapBlock region was not updated in the original MapBlock, thus causing
repeated hashtables build on the same base MapBlock. The change is to
encapsulate the int[] hashtables in AbstractMapBlock$HashTables object
and make it a member of MapBlock/MapBlockBuilder, so that when the
sliced MapBlock builds the hashtables, the base MapBlock would also gets
updated.
@wenleix
Copy link
Contributor

wenleix commented Jan 8, 2019

I take a quick look into the PR, and I don't recommend to rush into the current release, and revert first, for the following reasons:

  • I need a bit more thinking into the code, for example the getInstanceSize and getRetainedSize methods in HashTables. It doesn't look like a typical code style in Presto code base. I understand they are introduced since the retained size calculation in MapBlock and MapBlockBuilder are different so what's the right interface requires a bit more contemplation.

  • Reverting the lazy MapBlock hash table build will have little product impact, since today the PartitionedOutputOperator will build the hash table when appending the data into MapBlockBuilder -- and my understanding is we already have a commit in the hotfix branch so it should take minimal work to prepare the revert PR.

@yingsu00 yingsu00 mentioned this pull request Jan 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants