Implement Parallel Partition Pruning for Glue Hive Metastore #13729
highker merged 1 commit into prestodb:master
For what it's worth, we also contributed this to Presto SQL, and it was recently merged.
This seems to be an accidental change?
That makes the line 151 columns wide, but I'll make the change.
Do we wanna shut down the executor in PreDestroy?
I don't think we need to explicitly shut down because:
- We are using a cached thread pool with an idle timeout of 60 seconds.
- The created threads are all daemon threads.
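To illustrate the two points above, here is a minimal sketch of a cached thread pool whose threads are daemons. The class name `DaemonCachedPool`, the helper names, and the thread-name prefix are hypothetical and are not taken from the PR; the sketch only relies on `Executors.newCachedThreadPool(ThreadFactory)`, which reclaims threads that have been idle for 60 seconds.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the code from this PR.
class DaemonCachedPool
{
    // Builds a cached pool: idle threads are reclaimed after 60 seconds,
    // and every thread is a daemon, so it cannot keep the JVM alive.
    static ExecutorService newDaemonCachedPool(String namePrefix)
    {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, namePrefix + "-" + counter.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        };
        return Executors.newCachedThreadPool(factory);
    }

    // Runs a probe task on the pool and reports whether it ran on a daemon thread.
    static boolean runsOnDaemonThread(ExecutorService pool)
    {
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        try {
            return pool.submit(probe).get();
        }
        catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With such a pool, skipping an explicit shutdown does not block JVM exit, although tasks still in flight at exit are simply abandoned.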
Put throws .. on its own line with 8 spaces for indentation.
You must have a really powerful coordinator lol...
20 threads is fairly conservative.
nit: getGetPartitionThreads. Two issues:
- It should be a noun for a get/set method (e.g. partitionFetchingThreadCount)
- We should add "glue" to the method name to reflect that it only affects the Glue metastore :)
What about getGluePartitionFetchingThreadCount? Ditto for the set method.
I hear you, but I feel that it would be inconsistent with the rest of the code base. Example. Also we should not need to add the "Glue" prefix since the class is specific to Glue.
@anoopj: But isn't this configuration itself tied to Glue?
We should decide what the convention is and, if necessary, fix the other use cases. What do you think about whether we want "Glue" in the method name? @highker, @arhimondr, @rongrong
nit: What about partitionSegmentCount?
This change parallelizes the partition fetch for the Glue metastore by splitting the partitions into non-overlapping segments [1]. This can speed up query planning by up to an order of magnitude.
[1] https://docs.aws.amazon.com/glue/latest/webapi/API_Segment.html
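The idea of fetching non-overlapping segments in parallel can be sketched roughly as follows. This is not the PR's implementation: the class `SegmentedPartitionFetch`, the `fakeFetch` stand-in for the remote Glue call, and the pool size are all assumptions made for illustration. The only detail borrowed from the Glue API is that each request names a zero-based segment number and the total number of segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

// Hypothetical illustration only; not the code from this PR.
class SegmentedPartitionFetch
{
    // Issues one fetch per segment on the executor, then merges the results.
    static <T> List<T> fetchAllSegments(
            int totalSegments,
            ExecutorService executor,
            BiFunction<Integer, Integer, List<T>> fetchSegment)
    {
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (int i = 0; i < totalSegments; i++) {
            int segmentNumber = i;
            futures.add(CompletableFuture.supplyAsync(
                    () -> fetchSegment.apply(segmentNumber, totalSegments), executor));
        }
        List<T> result = new ArrayList<>();
        for (CompletableFuture<List<T>> future : futures) {
            result.addAll(future.join());
        }
        return result;
    }

    // Stand-in for the remote call: assigns item i to segment (i % totalSegments),
    // so segments are disjoint and together cover every item.
    static List<String> fakeFetch(List<String> all, int segmentNumber, int totalSegments)
    {
        List<String> segment = new ArrayList<>();
        for (int i = segmentNumber; i < all.size(); i += totalSegments) {
            segment.add(all.get(i));
        }
        return segment;
    }

    // Fetches ten fake partitions split across four segments.
    static List<String> demo()
    {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            all.add("ds=" + i);
        }
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            return fetchAllSegments(4, executor, (segment, total) -> fakeFetch(all, segment, total));
        }
        finally {
            executor.shutdown();
        }
    }
}
```

Because the segments are disjoint, the merged list needs no de-duplication; the order of the merged results follows segment order rather than the original partition order.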