Skip to content

Implement Parallel Partition Pruning for Glue Hive Metastore#13729

Merged
highker merged 1 commit intoprestodb:masterfrom
anoopj:anoopj-0.230-rc
Dec 6, 2019
Merged

Implement Parallel Partition Pruning for Glue Hive Metastore#13729
highker merged 1 commit intoprestodb:masterfrom
anoopj:anoopj-0.230-rc

Conversation

@anoopj
Copy link

@anoopj anoopj commented Nov 20, 2019

== RELEASE NOTES ==
Hive Changes
* Add support for parallel partition fetching for the Glue metastore.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Nov 20, 2019

CLA Check
The committers are authorized under a signed CLA.

  • ✅ Anoop Johnson (a86c3dadb7467bb896935a9e04c93764154ee801)

@anoopj
Copy link
Author

anoopj commented Nov 20, 2019

For what it's worth, we also contributed this to Presto SQL and was recently merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be an accidental change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Will fix.

Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Only one nit. @wenleix Do you want to take a look as well?

@anoopj
Copy link
Author

anoopj commented Nov 26, 2019

@wenleix @rongrong ping.

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments only

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it oneline

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes the line 151 columns wide, but I'll make the change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we wanna shutdown the executor in PreDestroy?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to explicitly shut down because:

  1. We are using a cached threadpool with the idle timeout of 60 seconds.
  2. The created threads are all daemon threads.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put throws .. to its own line with 8 spaces for indentation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must have a really powerful coordinator lol...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

20 threads is fairly conservative.

Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nit. Thanks @anoopj for the contribution! Please ping us on Slack once you address the comment so we can merge it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: getGetPartitionThreads. Two issue:

  1. It should be an noun for get/set method (e.g. partitionFetchingThreadCount)
  2. We should add glue into the method name to reflect it only affects Glue related metastore :)

What about getGluePartitionFetchingThreadCount? ditto for the set method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you, but I feel that it would be inconsistent with the rest of the code base. Example. Also we should not need to add the "Glue" prefix since the class is specific to Glue.

Copy link
Contributor

@wenleix wenleix Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anoopj : But isn't this configuration itself tied to Glue?

We should decide what the convention is, and if necessary, fix other use cases. What do you think whether we want "Glue" in method name ? @highker , @arhimondr , @rongrong

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What about partitionSegmentCount?

This change parallelizes the partition fetch for the Glue metastore by
splitting the partitions into non-overlapping segments[2]. This can
speed up query planning by upto an order of magnitude.

[1] https://docs.aws.amazon.com/glue/latest/webapi/API_Segment.html
@anoopj
Copy link
Author

anoopj commented Dec 5, 2019

@wenleix @rongrong @highker

Updated PR with the changes based on your feedback.

@highker highker self-assigned this Dec 5, 2019
@highker highker merged commit b978025 into prestodb:master Dec 6, 2019
@aweisberg aweisberg mentioned this pull request Jan 17, 2020
7 tasks
@caithagoras caithagoras mentioned this pull request Jan 22, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants