Conversation
c93a97e to
2a38a21
Compare
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
presto-hive/src/test/java/com/facebook/presto/hive/TestStreamingAggregationPlan.java
Outdated
Show resolved
Hide resolved
presto-hive/src/test/java/com/facebook/presto/hive/TestStreamingAggregationPlan.java
Outdated
Show resolved
Hide resolved
presto-hive/src/test/java/com/facebook/presto/hive/TestStreamingAggregationPlan.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
can we move this to other aggregation fields?
There was a problem hiding this comment.
Can you elaborate what you mean?
There was a problem hiding this comment.
Move this with other aggregation fields,
2a38a21 to
e47ef33
Compare
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Consider adding another test with the default session property to make sure streaming is not enabled and this session property will never be enabled by default.
There was a problem hiding this comment.
yeah, it's added in testBucketedAndSortedBySameKey
presto-hive/src/test/java/com/facebook/presto/hive/TestStreamingAggregationPlan.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Add another test with group by custkey(prefix of the bucket/sorted_by column). I think the streaming aggregation should be enabled.
There was a problem hiding this comment.
Yes, but that requires further changes in codebase which we're not proceeding right now, will add a todo here
e47ef33 to
54c0f77
Compare
yuanzhanhku
left a comment
There was a problem hiding this comment.
Thanks Ke for productionizing this feature and adding the thorough tests.
presto-hive/src/test/java/com/facebook/presto/hive/TestStreamingAggregationPlan.java
Outdated
Show resolved
Hide resolved
presto-hive/src/test/java/com/facebook/presto/hive/TestStreamingAggregationPlan.java
Outdated
Show resolved
Hide resolved
2cf4e42 to
8b7f822
Compare
highker
left a comment
There was a problem hiding this comment.
"Add ability to disable splitting file in hive connector" LGTM
There was a problem hiding this comment.
cc @arunthirupathi in case this could be useful for delta integration.
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java
Outdated
Show resolved
Hide resolved
highker
left a comment
There was a problem hiding this comment.
"Add ability to do streaming aggregation for hive table scans" LGTM
There was a problem hiding this comment.
hmmm if we are already guarding splittable here through isStreamingAggregationEnabled, do we still need to introduce isFileSplittable config at all? cc: @yuanzhanhku @arunthirupathi
There was a problem hiding this comment.
isFileSplittable is not needed for streaming aggregation support. Ke just added it for benchmarking propose. It allows us to run a query with file spilling disabled and streaming aggregation enable/disabled so that we can measure the improvement of steaming aggregation without the impact from file splitting.
presto-hive/src/test/java/com/facebook/presto/hive/TestStreamingAggregationPlan.java
Outdated
Show resolved
Hide resolved
highker
left a comment
There was a problem hiding this comment.
"Add ability to disable splitting file in hive connector" LGTM
...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
because we can't enable streaming aggregation when querying multiple partitions without grouping by partition keys, just renamed the test name as "testQueryingMultiplePartitions" to avoid confusion
|
Overall LGTM. Some nits on release note:
Would be good to specify what user-facing queries will be impacted and how users could leverage this config to achieve what. "streaming for partial aggregation" is too deep in terms of terminology for users to understand.
Well, let's see if the first commit is actually needed or not to decide if we need this release note. |
There was a problem hiding this comment.
Does the prefix match needs to be ordered ? I assume equality condition and hence order does not matter.
There was a problem hiding this comment.
Right, the order doesn't not matter here. StreamPartitionColumns is similar to the bucket column concept by it is for splits instead of files.
8b7f822 to
5785c85
Compare
As of now, we only support streaming aggregation for the cases where group-by keys are the same as order-by keys, cases where group-by keys are a subset of order-by keys are not supported for now. Co-Authored-By: Zhan Yuan <yuanzhanhku@gmail.com>
dc65f49 to
785d82c
Compare
We can always enable streaming aggregation for partial aggregations without affecting correctness. But if the data isn't clustered (for example: ordered) by the group-by keys, it may cause regressions on latency and resource usage. This session property is just a solution to force enabling streaming aggregation when we know the execution would benefit from partial streaming aggregation. We can work on determining it based on the input table properties later.
785d82c to
7a0ac10
Compare
Note:
When we compare performance of regular queries and queries that has streaming aggregation turned on, they have multiple differences which makes it hard to tell which factor is the leading factor
We add the ability to disable splitting file in hive connector for better debugging purposes