A comparative TPC-DS run was performed with the following setup:
Workload : TPC-DS power run
Scale factor : 1000
Cluster setup : 1 coordinator + 8 workers. Instances were AWS r6i.4xlarge (16 vCPUs, 128 GiB memory)
Trino version : 448 (TODO : link to config/catalog props for setup)
Prestissimo : 0.289, commit f4d5afd (TODO : link to config/catalog props for setup)
Plan diffs were generated with a custom tool that canonicalizes JSON plans (i.e., plans obtained from EXPLAIN (TYPE DISTRIBUTED, FORMAT JSON) <query>) into a tree representation and then diffs the trees.
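The canonicalize-then-diff approach can be sketched in a few lines of Python. This is not the custom tool itself: it assumes each JSON plan node is a dict with a `name` and an optional `children` list (the real EXPLAIN JSON schemas of Presto and Trino differ and would need per-engine adapters), and the `keep` predicate is a hypothetical way to derive narrower views, such as a join-only diff, from the full plan. The operator names in the usage data are made up for illustration.

```python
import difflib

def canonicalize(node, keep=lambda name: True):
    """Reduce a JSON plan node to a list of (operator_name, children) trees.

    Only the operator name is kept, so run-specific noise (node ids, cost
    estimates) never shows up in the diff. Nodes rejected by `keep` are
    dropped and their children are spliced into the parent.
    """
    children = []
    for child in node.get("children", []):
        children.extend(canonicalize(child, keep))
    name = node.get("name", "?")
    return [(name, children)] if keep(name) else children

def render(trees, depth=0):
    """Flatten trees into indented lines so difflib can compare them."""
    lines = []
    for name, children in trees:
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines

def plan_diff(plan_a, plan_b, keep=lambda name: True, labels=("presto", "trino")):
    """Unified diff of two canonicalized plans; empty list means no difference."""
    a = render(canonicalize(plan_a, keep))
    b = render(canonicalize(plan_b, keep))
    return list(difflib.unified_diff(a, b, labels[0], labels[1], lineterm=""))

# Usage with hypothetical, hand-written plan fragments:
presto_plan = {"name": "InnerJoin", "children": [
    {"name": "RemoteExchange", "children": [{"name": "TableScan"}]},
    {"name": "TableScan"}]}
trino_plan = {"name": "InnerJoin", "children": [
    {"name": "TableScan"}, {"name": "TableScan"}]}

for line in plan_diff(presto_plan, trino_plan):
    print(line)
# Join-only view: both plans reduce to a bare InnerJoin, so this diff is empty.
print(plan_diff(presto_plan, trino_plan, keep=lambda n: "Join" in n))
```

Keeping only operator names is what makes plans from two engines comparable at all; finer views (join-only, join+aggregation) fall out of the same tree by changing `keep`.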
Deep dive into planner issues identified in the top 5 heavy hitters
Query | Root cause | Tracking issue
--- | --- | ---
Q67 | Trino modified the TopNRowNumber operator/plan node to also work with Rank/DenseRank; the new plan node is called TopNRanking. This improved Q67 drastically. Implementing the same in Prestissimo should cut total latency by ~5.5 min. |
Q51 | Trino added a generic optimization to reduce the total number of remote exchanges. This benefited Q51, among other queries. IMO, however, the root cause is that the remote exchange operator in Prestissimo needs to be more performant. |
Q58 | Trino added a rule, TransformFilteringSemiJoinToInnerJoin, that converts a SemiJoin to an InnerJoin. This specifically helped Q58. Presto does convert an Apply node to a SemiJoin, and TransformUncorrelatedInPredicateSubqueryToDistinctInnerJoin manages to convert it to an InnerJoin, but we then undo all of this by running TransformDistinctInnerJoinToLeftEarlyOutJoin. TransformFilteringSemiJoinToInnerJoin opens up the join space for reordering, which results in a better plan for Q58. | N/A
Q05 | Trino does a better job with UNION ALL when the result of the union is REPLICATE-joined with another table. It does this by adding the ability to schedule multiple TableScans in a single stage. |
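TopNRanking generalizes TopNRowNumber: instead of computing a full window function and then filtering on a predicate such as `rank() OVER (PARTITION BY ... ORDER BY ...) <= N`, the operator keeps, per partition, only the rows that can still satisfy the predicate. The Python sketch below models those semantics for row_number, rank and dense_rank. It is an illustration under our own assumptions (in-memory rows, ascending order keys, n >= 1), not Trino's or Presto's implementation, and `top_n_ranking` is a hypothetical name.

```python
import bisect
from collections import defaultdict
from itertools import count

def top_n_ranking(rows, partition_of, order_key, n, function="rank"):
    """Keep, per partition, only rows whose ranking value is <= n.

    Rows are ranked ascending by order_key (negate the key for DESC).
    Returns {partition: [(ranking_value, row), ...]} in ranked order.
    Assumes n >= 1 and that all order keys are mutually comparable.
    """
    buffers = defaultdict(list)  # partition -> sorted list of (key, seq, row)
    seq = count()                # unique tie-breaker so rows are never compared
    for row in rows:
        buf = buffers[partition_of(row)]
        bisect.insort(buf, (order_key(row), next(seq), row))
        _trim(buf, n, function)  # evict rows that can no longer qualify
    return {p: _assign(buf, function) for p, buf in buffers.items()}

def _trim(buf, n, function):
    if function == "row_number":
        del buf[n:]                         # at most n rows survive
    elif function == "rank":
        if len(buf) > n:
            cutoff = buf[n - 1][0]          # rows tied with the n-th row still have rank <= n
            while buf[-1][0] > cutoff:
                buf.pop()
    elif function == "dense_rank":
        distinct = sorted({key for key, _, _ in buf})
        if len(distinct) > n:
            cutoff = distinct[n - 1]        # keep only the n smallest distinct keys
            while buf[-1][0] > cutoff:
                buf.pop()
    else:
        raise ValueError(f"unsupported ranking function: {function}")

def _assign(buf, function):
    """Compute row_number / rank / dense_rank over the surviving sorted rows."""
    out, rank, dense, prev = [], 0, 0, None
    for i, (key, _, row) in enumerate(buf, start=1):
        if i == 1 or key != prev:
            rank, dense = i, dense + 1
        prev = key
        value = {"row_number": i, "rank": rank, "dense_rank": dense}[function]
        out.append((value, row))
    return out

# Usage: top-2 by value DESC per partition (hypothetical data)
rows = [("a", 10), ("a", 30), ("a", 30), ("a", 20), ("b", 5)]
print(top_n_ranking(rows, lambda r: r[0], lambda r: -r[1], 2, "rank"))
```

Eviction is safe because a row's rank can only grow as more rows arrive. With rank and dense_rank, rows tied at the cutoff are all kept, so a partition may hold more than N rows; with row_number the buffer never exceeds N.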
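The reason a filtering semi join can become an inner join: `x IN (SELECT y ...)` keeps each probe row at most once, which matches an inner join against the distinct build-side keys with the build columns projected away. Once it is an ordinary inner join, the join enumerator is free to reorder it. The sketch below checks this equivalence on toy data; it is not the TransformFilteringSemiJoinToInnerJoin rule itself, the function names are hypothetical, and it assumes non-NULL join keys.

```python
from collections import defaultdict

def inner_join(probe, build, probe_key, build_key):
    """Plain hash inner join: one (probe_row, build_row) pair per key match."""
    table = defaultdict(list)
    for b in build:
        table[build_key(b)].append(b)
    return [(p, b) for p in probe for b in table.get(probe_key(p), [])]

def semi_join(probe, build, probe_key, build_key):
    """Filtering semi join (x IN (SELECT y ...)): each probe row kept at most once."""
    keys = {build_key(b) for b in build}
    return [p for p in probe if probe_key(p) in keys]

def semi_join_as_inner_join(probe, build, probe_key, build_key):
    """Same result via an inner join: de-duplicate the build side on the join
    key so no probe row is multiplied, then project the build columns away."""
    distinct_build = list({build_key(b): b for b in build}.values())
    return [p for p, _ in inner_join(probe, distinct_build, probe_key, build_key)]

# Hypothetical toy data; assumes non-NULL join keys (SQL never matches NULLs,
# whereas Python's None == None is True).
sales = [("s1", 1), ("s2", 2), ("s3", 3), ("s4", 2)]   # (sale_id, date_key)
dates = [(1, "a"), (1, "b"), (2, "c")]                 # date_key 1 appears twice
pk, bk = (lambda r: r[1]), (lambda r: r[0])

print(semi_join(sales, dates, pk, bk))
print(semi_join_as_inner_join(sales, dates, pk, bk))
print(len(inner_join(sales, dates, pk, bk)))  # the non-distinct join duplicates s1
```

Without the de-duplication step, the plain inner join emits s1 twice, which is why a rewrite of this kind needs a distinct (or otherwise unique) build side.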
Grafana deeplink
Latency comparison
Plan diffs
With all operators
Join only diff
Join+Agg diff