Remove redundant sort columns #21371

Merged
tdcmeehan merged 1 commit into prestodb:master from aaneja:removeRedundantSortColumns on Jan 5, 2024

@aaneja aaneja commented Nov 13, 2023

Description

Use the logical properties from the constraint framework to remove redundant sort columns.

Motivation and Context

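Queries can combine GROUP BY and ORDER BY clauses in which some ORDER BY columns are redundant: once the grouping columns appear at the start of the ordering, each output row is already uniquely positioned, so the remaining sort columns cannot change the result. Example from TPCDS Q43: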

 GROUP BY s_store_name, s_store_id
 ORDER BY s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales
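
The reasoning above can be sketched in a few lines of Java. This is a hypothetical illustration, not the rule added by this PR: the class `SortColumnPruner`, its `prune` method, and the use of plain strings for columns are all assumptions made for the example. It also assumes that the GROUP BY columns form a unique key on the aggregation output.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; names and types are illustrative, not Presto's. */
final class SortColumnPruner {
    private SortColumnPruner() {}

    /**
     * Returns the shortest prefix of orderBy whose columns contain every column
     * of some key. If no prefix covers a key, orderBy is returned unchanged.
     */
    static List<String> prune(List<String> orderBy, List<Set<String>> keys) {
        Set<String> seen = new HashSet<>();
        List<String> prefix = new ArrayList<>();
        for (String column : orderBy) {
            prefix.add(column);
            seen.add(column);
            for (Set<String> key : keys) {
                // Empty keys are skipped in this sketch; a non-empty key fully
                // contained in the prefix means no two rows tie on the prefix,
                // so later sort columns cannot affect the order.
                if (!key.isEmpty() && seen.containsAll(key)) {
                    return prefix;
                }
            }
        }
        return orderBy;
    }
}
```

Under these assumptions, calling `prune` with the Q43 ordering and the key {s_store_name, s_store_id} returns [s_store_name, s_store_id], since the key is covered as soon as the second column is added to the prefix.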

Impact

When the constraints framework is switched on (SET SESSION exploit_constraints=true), redundant sort columns are removed. Examples follow.

TPCDS Q43 :

Query has -

 GROUP BY s_store_name, s_store_id
 ORDER BY s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales

Original plan (relevant portion):

 - TopNPartial[PlanNodeId 605][100 by (s_store_name ASC_NULLS_LAST, s_store_id ASC_NULLS_LAST, sum ASC_NULLS_LAST, sum_14 ASC_NULLS_LAST, sum_15 ASC_NULLS_LAST, sum_16 ASC_NULLS_LAST, sum_17 ASC_NULLS_LAST, sum_18 ASC_NULLS_LAST, sum_19 ASC_NULLS_LAST)] => [s_stor>
                     - Project[PlanNodeId 913][projectLocality = LOCAL] => [s_store_name:varchar(50), s_store_id:char(16), sum_18:decimal(38,2), sum_17:decimal(38,2), sum_16:decimal(38,2), sum_15:decimal(38,2), sum_19:decimal(38,2), sum_14:decimal(38,2), sum:decimal(38,2)]       >
                         - Aggregate(FINAL)[s_store_name, s_store_id][$hashvalue][PlanNodeId 8] => [s_store_name:varchar(50), s_store_id:char(16), $hashvalue:bigint, sum_18:decimal(38,2), sum_17:decimal(38,2), sum_16:decimal(38,2), sum_15:decimal(38,2), sum_19:decimal(38,2), sum_>
                                 sum_18 := "presto.default.sum"((sum_56)) (2:451)                                                                                                                                                                                                       >
                                 sum_17 := "presto.default.sum"((sum_55)) (2:367)                                                                                                                                                                                                       >
                                 sum_16 := "presto.default.sum"((sum_54)) (2:282)                                                                                                                                                                                                       >
                                 sum_15 := "presto.default.sum"((sum_53)) (2:199)                                                                                                                                                                                                       >
                                 sum_19 := "presto.default.sum"((sum_57)) (2:533)                                                                                                                                                                                                       >
                                 sum_14 := "presto.default.sum"((sum_52)) (2:117)                                                                                                                                                                                                       >
                                 sum := "presto.default.sum"((sum_51)) (2:35)      

Orderings are : (s_store_name ASC_NULLS_LAST, s_store_id ASC_NULLS_LAST, sum ASC_NULLS_LAST, sum_14 ASC_NULLS_LAST, sum_15 ASC_NULLS_LAST, sum_16 ASC_NULLS_LAST, sum_17 ASC_NULLS_LAST, sum_18 ASC_NULLS_LAST, sum_19 ASC_NULLS_LAST)

With this rewrite, the plan changes to:

 - TopNPartial[PlanNodeId 606][100 by (s_store_name ASC_NULLS_LAST, s_store_id ASC_NULLS_LAST)] => [s_store_name:varchar(50), s_store_id:char(16), sum_18:decimal(38,2), sum_17:decimal(38,2), sum_16:decimal(38,2), sum_15:decimal(38,2), sum_19:decimal(38,2), sum_14:>
                     - Project[PlanNodeId 914][projectLocality = LOCAL] => [s_store_name:varchar(50), s_store_id:char(16), sum_18:decimal(38,2), sum_17:decimal(38,2), sum_16:decimal(38,2), sum_15:decimal(38,2), sum_19:decimal(38,2), sum_14:decimal(38,2), sum:decimal(38,2)]       >
                         - Aggregate(FINAL)[s_store_name, s_store_id][$hashvalue][PlanNodeId 8] => [s_store_name:varchar(50), s_store_id:char(16), $hashvalue:bigint, sum_18:decimal(38,2), sum_17:decimal(38,2), sum_16:decimal(38,2), sum_15:decimal(38,2), sum_19:decimal(38,2), sum_>
                                 sum_18 := "presto.default.sum"((sum_56)) (8:9)                                                                                                                                                                                                         >
                                 sum_17 := "presto.default.sum"((sum_55)) (7:9)                                                                                                                                                                                                         >
                                 sum_16 := "presto.default.sum"((sum_54)) (6:9)                                                                                                                                                                                                         >
                                 sum_15 := "presto.default.sum"((sum_53)) (5:9)                                                                                                                                                                                                         >
                                 sum_19 := "presto.default.sum"((sum_57)) (9:9)                                                                                                                                                                                                         >
                                 sum_14 := "presto.default.sum"((sum_52)) (4:9)                                                                                                                                                                                                         >
                                 sum := "presto.default.sum"((sum_51)) (3:9)            

Orderings are : (s_store_name ASC_NULLS_LAST, s_store_id ASC_NULLS_LAST)

TPCDS Q3 :

Query has :

 GROUP BY dt.d_year, item.i_brand, item.i_brand_id
 ORDER BY dt.d_year, sum_agg desc, brand_id

There are no redundant columns here, and no impact on sorting was observed:

2023-11-13T14:51:00.735+0530	DEBUG	Query-20231113_092100_00005_hyt9h-301	com.facebook.presto.sql.planner.iterative.rule.RemoveRedundantTopNColumns	[11] TopNNode : com.facebook.presto.spi.plan.TopNNode@4c85148
2023-11-13T14:51:00.735+0530	DEBUG	Query-20231113_092100_00005_hyt9h-301	com.facebook.presto.sql.planner.iterative.rule.RemoveRedundantSortColumns	Current Node order variables: [d_year, i_brand_id, sum]
Logical properties for source [379] : LogicalPropertiesImpl{KeyProperty=KeyProperty{keys=Key{variables=d_year,i_brand_id,i_brand}}, EquivalenceClassProperty=EquivalenceClassProperty{EquivalenceClassHeads=, EquivalenceClasses=}, MaxCardProperty=MaxCardProperty{value=null}}
2023-11-13T14:51:00.735+0530	DEBUG	Query-20231113_092100_00005_hyt9h-301	com.facebook.presto.sql.planner.iterative.rule.RemoveRedundantSortColumns	No key variables found
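
One way to read this log, offered as an assumption rather than a confirmed description of the rule: the logged key {d_year, i_brand_id, i_brand} is not fully contained in the ordering variables [d_year, i_brand_id, sum], so no prefix of the ordering can be treated as unique. The hypothetical helper below (the class `KeyCoverageCheck` and its method are invented for illustration) lists which key variables are absent from an ordering.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not part of Presto. */
final class KeyCoverageCheck {
    private KeyCoverageCheck() {}

    /**
     * Returns the key variables that do not appear among the ordering
     * variables. An empty result means the ordering covers the key.
     */
    static Set<String> missingKeyVariables(List<String> orderVariables, Set<String> key) {
        Set<String> missing = new TreeSet<>(key); // sorted copy for stable output
        missing.removeAll(orderVariables);
        return missing;
    }
}
```

With the values from the log, the only missing variable is i_brand, which is consistent with the ordering being left unchanged for Q3.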

Testing

  • New unit test and presto-test added
  • Changes orderings and plans for the following TPCDS queries: Q43, Q54, Q60, Q83, Q78, Q58

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Remove redundant sort columns from Plan nodes if a unique constraint can be identified for a prefix of the ordering list

@aaneja aaneja force-pushed the removeRedundantSortColumns branch from 9daeaa6 to 4eef112 Compare December 12, 2023 16:19
@aaneja aaneja marked this pull request as ready for review December 12, 2023 16:23
@aaneja aaneja requested a review from a team as a code owner December 12, 2023 16:23
@aaneja aaneja requested a review from presto-oss December 12, 2023 16:23
@aaneja aaneja force-pushed the removeRedundantSortColumns branch from 4eef112 to 0dfd436 Compare December 12, 2023 16:30
@ZacBlanco ZacBlanco self-requested a review December 12, 2023 19:17
@aaneja aaneja force-pushed the removeRedundantSortColumns branch from 0dfd436 to 3f316b4 Compare December 13, 2023 04:46
@aaneja aaneja force-pushed the removeRedundantSortColumns branch from 3f316b4 to 1b670d4 Compare December 20, 2023 10:15

@feilong-liu feilong-liu left a comment

[TODO] : Add AbstractTest to show correctness is not impacted

Why is it a TODO but not part of this PR?

If logical properties from the constraint framework tell us
that a unique constraint exists for a prefix of the ORDER BY clause,
we can drop the extra, unneeded sort columns from that clause.
@aaneja aaneja force-pushed the removeRedundantSortColumns branch from 1b670d4 to 537c038 Compare December 21, 2023 06:36
@tdcmeehan tdcmeehan merged commit 55113a3 into prestodb:master Jan 5, 2024
@aaneja aaneja deleted the removeRedundantSortColumns branch January 10, 2024 03:40
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024
64 tasks