Skip to content

Enhance join reordering to work with non-simple equi-join clauses#21153

Merged
tdcmeehan merged 1 commit intoprestodb:masterfrom
aaneja:fixBadEquiJoinFilter_v2_PR
Oct 30, 2023
Merged

Enhance join reordering to work with non-simple equi-join clauses#21153
tdcmeehan merged 1 commit intoprestodb:masterfrom
aaneja:fixBadEquiJoinFilter_v2_PR

Conversation

@aaneja
Copy link
Contributor

@aaneja aaneja commented Oct 16, 2023

Description

Enhance join-reordering to work with non-simple equi-join predicates

Motivation and Context

Reapplies #20413 with fixes to bugs identified -

  1. The output variables of JoinNode could be referring to a variable generated from an intermediate Project. The output variables of the CanonicalJoinNode were not being updated to reflect this correctly
  2. Intermediate assignments tracked while flattening the ProjectNodes should be recursively falttened so that the CanonicalJoinNode only refers to input/output variables from the actual join sources

Additionally, the feature config for optimizer.handle-complex-equi-joins was changed to a default false to allow testing this in isolation/rollout. We will set this to default true in the future, once we test with more workloads

Test Plan

New tests added to address the bug identified -

  1. testProjectNodesBetweenJoinNodesAreFlattenedForComplexEquiJoins was rewritten to include multiple ProjectNode's (including nested Projects and a simple rename projection). Additionally, further assertions were added on the overall assignments built and the output variables that were rewritten
  2. testComplexEquiJoinCriteriaForJoinsWithUSINGClause was added which runs a SELECT * query with a USING clause

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Enhance join-reordering to work with non-simple equi-join predicates

@aaneja aaneja force-pushed the fixBadEquiJoinFilter_v2_PR branch 2 times, most recently from 1c78aa8 to f000340 Compare October 16, 2023 12:08
@aaneja aaneja marked this pull request as ready for review October 16, 2023 14:12
@aaneja aaneja requested a review from a team as a code owner October 16, 2023 14:12
@aaneja aaneja requested a review from presto-oss October 16, 2023 14:12
@aaneja aaneja changed the title [DRAFT] Enhance join reordering to work with non-simple equi-join clauses Enhance join reordering to work with non-simple equi-join clauses Oct 16, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this unrelated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I noticed that the cherry-picked code had this added to ensure that we always run this test with no join ordering. This was missed when we pulled it into Presto.

@feilong-liu
Copy link
Contributor

Can you highlight the change you added compared with previous PR (maybe by adding PR comments for the code) to help review?

Copy link
Contributor Author

@aaneja aaneja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feilong-liu Here are the delta of the changes from the previous PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core of the changes from the previous PR are to this class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recursively resolve all intermediate assignments to only use the input variables of the join sources, e.g:

inter_3 = inter_2
inter_2 = inter_1 + join_op_1
inter_1 = join_op_2 + join_op_0

is rewritten to :

inter_3 = join_op_2 + join_op_0 + join_op_1
inter_2 = join_op_2 + join_op_0 + join_op_1
inter_1 = join_op_2 + join_op_0

Copy link
Contributor Author

@aaneja aaneja Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While building the MultiJoinNode, we take care to set the output variables of the MultiJoinNode to only refer to the input variables.
Any output variables that were using variables from the intermediate project nodes are tracked in overallAssigments
More details are in code comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the test to demonstrate how multiple nested Project's are flattened

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamePlusSum is a variable from an intermediate Project - this gets set on the overall assignment of the MultiJoinNode with the resolved variables of the input sources

@aaneja aaneja force-pushed the fixBadEquiJoinFilter_v2_PR branch from f000340 to cbef94e Compare October 27, 2023 11:13
@aaneja aaneja removed the request for review from kaikalur October 30, 2023 04:04
Join predicates like `left.key = right1.key1 + right2.key2` can reduce
the join space by appearing as Project nodes or Join noes with no
equi-join clauses in the join graph. This commit fixes this behavior
by removing any intermediate Projects in the join graph and only
creating them on-the-fly while choosing the join order
@aaneja aaneja force-pushed the fixBadEquiJoinFilter_v2_PR branch from cbef94e to 06dbf27 Compare October 30, 2023 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants