Optimize partition value evaluation in filter pushdown#25248
Optimize partition value evaluation in filter pushdown#25248feilong-liu merged 1 commit intoprestodb:masterfrom
Conversation
jaystarshot
left a comment
There was a problem hiding this comment.
Where are the predicateInputs set and used?
66b24fb to
226b17b
Compare
I was thinking to split changes to spi change and the rest, updated the code to include the logic which uses this information too. |
presto-hive/src/main/java/com/facebook/presto/hive/HivePartitionManager.java
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HivePartitionManager.java
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HivePartitionManager.java
Show resolved
Hide resolved
| } | ||
|
|
||
| ImmutableList.Builder<HivePartition> resultBuilder = ImmutableList.builder(); | ||
| Map<List<NullableValue>, Boolean> cacheResult = new HashMap<>(); |
There was a problem hiding this comment.
Cache the result for predicates
There was a problem hiding this comment.
Can this cache blow up for very large distinct partition values ? maybe a guava cache is better?
There was a problem hiding this comment.
Thanks for the suggestion, just updated with guava cache
|
LGTM , Is it easy to add a unit test in TestHivePartitionManager? |
8fbe17d to
0cd06e3
Compare
Sure, added a new test |
0cd06e3 to
72513d4
Compare
|
@jaystarshot Can you review it again? I just updated the TestHiveClientConfig to fix the failed test. |
|
Thanks for the release note entry! Is there somewhere in the documentation that it could link to? |
Description
Constraint is used to validate whether a hive partition matches the filters specified in query during filter pushdown.
presto/presto-hive/src/main/java/com/facebook/presto/hive/HivePartitionManager.java
Line 417 in c08c6ef
Here
constraint.predicate().get().test(partition.getKeys())is called for one partition, and this function is called for all candidate partitions. However, it may not be necessary to call this function for all partitions.For example, let's say we have a table with two partition keys,
dsandts, wheredsis the date andtsis the hour of the day. When the query is querying data in the past 180 days, there can be 180*24=4320 candidate partitions. However, if thepredicateinconstraintonly containsdsbut notts, we actually only need to call this function 180 times. It can lead to big difference especially when the predicate is expensive, for example large expressions or include complex string/json operations.In the motivating example, we see query planning time decrease from 3 minutes to 10 seconds if we reduce the number of such function calls.
In this PR, I added a function to constraint to return the column handles which are used in predicate, and populate it during filter pushdown, so as to enable the above optimization.
Motivation and Context
Reduce query planning time.
Impact
Reduce query planning time.
Test Plan
Test with production queries end to end
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.