Skip to content

Support correlated subqueries for DELETE #9447

Merged
findepi merged 4 commits intotrinodb:masterfrom
Praveen2112:praveen/011_2/delete_correlated_queries
Jul 29, 2022
Merged

Support correlated subqueries for DELETE #9447
findepi merged 4 commits intotrinodb:masterfrom
Praveen2112:praveen/011_2/delete_correlated_queries

Conversation

@Praveen2112
Copy link
Member

@Praveen2112 Praveen2112 commented Sep 30, 2021

Description

Support correlated subqueries for DELETE

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement for DELETE operations

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core Query Engine

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Support correlated subqueries for DELETE

@Praveen2112 Praveen2112 force-pushed the praveen/011_2/delete_correlated_queries branch 2 times, most recently from a6c538d to 86e81bd Compare October 8, 2021 11:22
@Praveen2112 Praveen2112 marked this pull request as ready for review October 8, 2021 11:32
@Praveen2112
Copy link
Member Author

@kasiafi AC

@Praveen2112 Praveen2112 requested a review from kasiafi October 12, 2021 09:47
@findepi findepi requested a review from sopel39 October 12, 2021 11:48
@findepi
Copy link
Member

findepi commented Nov 5, 2021

@Praveen2112 please rebase

@Praveen2112 Praveen2112 force-pushed the praveen/011_2/delete_correlated_queries branch 2 times, most recently from 7e1f08e to 1136816 Compare November 8, 2021 07:37
@takezoe
Copy link
Member

takezoe commented Jan 31, 2022

Hi, is there any progress on this? We are facing a similar case and would like to fix the issue if possible.

@Praveen2112
Copy link
Member Author

Sorry for the delay, will rebase and apply the comments by this week.

@Praveen2112 Praveen2112 force-pushed the praveen/011_2/delete_correlated_queries branch from 1136816 to 98f0e28 Compare May 2, 2022 15:09
@findepi
Copy link
Member

findepi commented May 5, 2022

@Praveen2112
Copy link
Member Author

@findepi Yes. I think we could implement it by re-wiriting only the probe side (for INNER and LEFT join) and leave build side part of the plan. Is my understanding is correct ? I have updated the PR based on this above mentioned approach.

@Praveen2112 Praveen2112 requested a review from findepi May 10, 2022 14:58
@findepi
Copy link
Member

findepi commented May 13, 2022

@Praveen2112 can you please check the build status?

@Praveen2112
Copy link
Member Author

@findepi The build is now green.

@findepi findepi force-pushed the praveen/011_2/delete_correlated_queries branch from c7b3865 to c726f4d Compare May 17, 2022 12:43
@findepi
Copy link
Member

findepi commented May 17, 2022

(squashed a fixup)

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Support correlated EXISTS subquery for DELETE"

Comment on lines 761 to 767
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why ReplicateSemiJoinInDelete/ReplicateJoinAndSemiJoinInDelete needs to be run before ReorderJoins.
I think ReplicateSemiJoinInDelete/ReplicateJoinAndSemiJoinInDelete would undo distirbution type picked by ReorderJoins. What am i missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ReorderJoins can flip the JoinNode - we would like to avoid it for the PlanNode between Delete and TableScan. So we run them before applying these optimizer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good way to solve this issue. Optimizer rules need to be able to run regardless of which order they are applied (i.e., correctness of the plan should not depend on the order of rule application -- only query performance should be affected).

How is this PR different from what we did for #8286 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this should be:

if (isDeleteQuery) {
  verify(joinType == INNER || joinType == LEFT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about UPDATE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UPDATE can be applied in a follow up PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this should be:

if (isDeleteQuery) {
  verify(joinType == INNER || joinType == LEFT

has this been resolved?
or, am i being wrong here?

@sopel39
Copy link
Member

sopel39 commented May 19, 2022

Which connectors support these? Should we add test cases to BaseConnectorTest?

@Praveen2112
Copy link
Member Author

Which connectors support these?

Hive currently supports this.

Should we add test cases to BaseConnectorTest?
We have added to BaseConnectorTest.

@Praveen2112 Praveen2112 force-pushed the praveen/011_2/delete_correlated_queries branch from c726f4d to da82491 Compare June 21, 2022 14:06
@Praveen2112
Copy link
Member Author

@findepi AC. Except migration of tests from BCT - which we might address in a different PR. WDYT ?

@Praveen2112 Praveen2112 requested a review from findepi June 22, 2022 05:46
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this should be:

if (isDeleteQuery) {
  verify(joinType == INNER || joinType == LEFT

has this been resolved?
or, am i being wrong here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Praveen2112 Praveen2112 force-pushed the praveen/011_2/delete_correlated_queries branch from 7366f38 to 6d3fd8d Compare July 18, 2022 12:30
@Praveen2112
Copy link
Member Author

@findepi Thanks for the review. AC

@findepi
Copy link
Member

findepi commented Jul 20, 2022

@Praveen2112
Copy link
Member Author

I wasn't able to add the comment there. So will add it here

This would make the query to fail a bit early at ReplicateJoinAndSemiJoinOptimizer instead of BeginTableWriteOptimizer. ReplicateJoinAndSemiJoinOptimizer mostly deals with configuring the distribution type for JoinNode not sure if we need to do this verification here. Additionally it would be better if we track all the failures related to plan structure at one point.

@findepi WDYT ?

@findepi
Copy link
Member

findepi commented Jul 21, 2022

Oh, this is because io.trino.sql.planner.optimizations.ReplicateJoinAndSemiJoinInDelete.Rewriter#isDeleteQuery is a field, not a context variable, so it doesn't distinguish between "main delete path" and the subqueries, where it shouldn't apply optimizations.
Fine, let's keep as-is for now.

#9447 (comment) remains outstanding.

@findepi findepi requested a review from electrum July 21, 2022 07:13
@findepi
Copy link
Member

findepi commented Jul 21, 2022

Oh, this is because io.trino.sql.planner.optimizations.ReplicateJoinAndSemiJoinInDelete.Rewriter#isDeleteQuery is a field, not a context variable,

i take this back. ReplicateJoinAndSemiJoinInDelete doesn't recurse into right sides of a join or semi-join.
Where does non-INNER join come from?

@Praveen2112
Copy link
Member Author

ReplicateJoinAndSemiJoinInDelete doesn't recurse into right sides of a join or semi-join.

True.

Where does non-INNER join come from?

It comes when we translate correlated subqueries

@Praveen2112 Praveen2112 force-pushed the praveen/011_2/delete_correlated_queries branch from 6d3fd8d to c081884 Compare July 25, 2022 16:29
@Praveen2112
Copy link
Member Author

@martint Sorry for continuing it here as I’am unable to add comments in that there.

The main issue here is that - DELETE or UPDATE query depends heavily on the pipeline if we could remove that dependency then the optimizers can be independent of each other. What are the other directions we could take for solving this issue ?

This PR is kind of an enhancement to #8286, #8286 solves the issue which was bcoz of AssignUniqueIdNode which creates an additional (unnecessary) exchanges in the plan. This PR solves the issue specific join which creates additional exchange node bcoz of partitioning join node.

@findepi
Copy link
Member

findepi commented Jul 29, 2022

@findepi findepi merged commit f1d47c3 into trinodb:master Jul 29, 2022
@github-actions github-actions bot added this to the 392 milestone Jul 29, 2022
@colebow
Copy link
Member

colebow commented Jul 29, 2022

@Praveen2112 - could you propose a release note for this fix?
cc @findepi

@Praveen2112
Copy link
Member Author

I have updated in PR message.

@findepi
Copy link
Member

findepi commented Aug 1, 2022

re #9447 (comment)
and #9447 (comment)

(can't reply inline..)

Optimizer rules need to be able to run regardless of which order they are applied

That's an oversimplification. There are some dependencies. For example

// PushJoinIntoTableScan must run after ReorderJoins (and DetermineJoinDistributionType)

correctness of the plan should not depend on the order of rule application

Absolutely. I want to stress this out, this is paramount.

(BTW I was convinced this isn't the case here.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

7 participants