
Conversation

@Hisoka-X
Member

@Hisoka-X Hisoka-X commented Jun 27, 2023

What changes were proposed in this pull request?

This PR brings a force-finish stage feature to AQE. It is used when a stage becomes useless after the plan is re-optimized.
For example:

SELECT * FROM emptyTestData t1 LEFT OUTER JOIN testData t2 ON t1.key = t2.key

The left table is empty, so the right table's data is useless and there is no need to continue reading it.
This saves database/Spark CPU and IO resources when the right table is large.
The Spark UI displays a force-finished stage like this:
[image: Spark UI screenshot]
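
The left-outer-join reasoning above can be sketched with a toy join in plain Python. This is only an illustration of why the right side is unnecessary when the left side is empty; `left_outer_join` and `scan_test_data` are hypothetical names, not Spark APIs, and nothing here reflects how this PR is implemented.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, object]


def left_outer_join(
    left: List[Row],
    scan_right: Callable[[], Iterable[Row]],
    key: str,
) -> Iterator[Tuple[Row, Optional[Row]]]:
    """Toy left outer join that scans the right side only when it is needed."""
    if not left:
        # Empty left side: the result is empty whatever the right side holds,
        # so the right-side scan is never started.
        return
    right_by_key: Dict[object, List[Row]] = {}
    for row in scan_right():
        right_by_key.setdefault(row[key], []).append(row)
    for l_row in left:
        matches = right_by_key.get(l_row[key])
        if matches:
            for r_row in matches:
                yield (l_row, r_row)
        else:
            # Unmatched left row: the right side is NULL (None here).
            yield (l_row, None)


scans: List[str] = []  # records every scan of the (hypothetical) right table


def scan_test_data() -> List[Row]:
    scans.append("testData")
    return [{"key": 1, "value": "a"}]


# emptyTestData LEFT OUTER JOIN testData: no rows, and testData is never read.
empty_result = list(left_outer_join([], scan_test_data, "key"))
```

After this runs, `empty_result` is empty and `scans` is still empty, which is the saving the description refers to.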

Why are the changes needed?

Adds a new force-finish feature to AQE.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new test.

@LuciferYang
Contributor

LuciferYang commented Jun 27, 2023

cc @cloud-fan @ulysses-you FYI

hmm... If I remember correctly, @ulysses-you submitted a PR with similar goals.

@github-actions github-actions bot removed the BUILD label Jun 27, 2023
@Hisoka-X Hisoka-X marked this pull request as draft June 27, 2023 06:53
@Hisoka-X Hisoka-X force-pushed the cancel_useless_stage branch from 0bd39fe to d5448ef Compare June 27, 2023 07:57
@Hisoka-X
Member Author

Hisoka-X commented Jun 27, 2023

> hmm... If I remember correctly, @ulysses-you submitted a PR with similar goals.

Thanks @LuciferYang, I think you are talking about #41536. I think there are some differences between the two PRs. @ulysses-you's approach affects exchange reuse (I'm not sure whether this PR does, but according to CI it seems not to), and this PR force-finishes a stage rather than cancelling it, so the behavior is different. The way of determining which stage should be cancelled or finished is also different.

By the way, please search for "Unnecessary Stage" in https://github.com/Hisoka-X/spark/actions/runs/5387239085/jobs/9778311879 or https://github.com/Hisoka-X/spark/actions/runs/5387239085/jobs/9778311577 . Some queries already benefit from this PR.

@Hisoka-X Hisoka-X marked this pull request as ready for review June 27, 2023 09:39
@ulysses-you
Contributor

The reason I closed my PR is that we cannot cancel a broadcast query stage, because exchange reuse shares the broadcast future instance. It's OK to cancel a shuffle stage before executing the final plan, but note that we cannot cancel an intermediate shuffle stage, to avoid breaking shuffle exchange reuse.
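
The shared-future concern above can be illustrated with Python's standard `concurrent.futures.Future`. This is a minimal sketch under the assumption that two consumers hold the same future object; the variable names are hypothetical and this is not Spark's broadcast implementation.

```python
from concurrent.futures import CancelledError, Future

# One pending "broadcast" future shared by two consumers, mimicking exchange
# reuse. The names are illustrative only and do not correspond to Spark classes.
broadcast_future: Future = Future()
original_exchange = broadcast_future
reused_exchange = broadcast_future  # reuse means sharing the same instance

# Cancelling on behalf of the first consumer cancels it for the second as well.
was_cancelled = original_exchange.cancel()


def await_broadcast(future: Future) -> str:
    """Wait for the broadcast result, reporting cancellation instead of raising."""
    try:
        return future.result(timeout=0)
    except CancelledError:
        return "cancelled"


outcome_for_reuser = await_broadcast(reused_exchange)
```

Here `was_cancelled` is `True` and `outcome_for_reuser` is `"cancelled"`: the consumer that still wanted the result cannot get it, because there is only one future behind both references.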

@Hisoka-X
Member Author

> The reason I closed my PR is that we cannot cancel a broadcast query stage, because exchange reuse shares the broadcast future instance. It's OK to cancel a shuffle stage before executing the final plan, but note that we cannot cancel an intermediate shuffle stage, to avoid breaking shuffle exchange reuse.

Sorry for the late response. I don't think the current solution will cancel a reused stage, because this PR decides whether to cancel a stage by checking whether the PhysicalPlan still holds a reference to it. No reference means that the stage will not be reused.
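
The reference check described above can be sketched as a walk over a toy plan tree. `PlanNode`, `referenced_stages`, and `unreferenced_stages` are hypothetical names chosen for this illustration; the PR's actual code is not shown in this thread and may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PlanNode:
    """Toy physical-plan node; stage_id is set only on query-stage nodes."""
    name: str
    stage_id: Optional[int] = None
    children: List["PlanNode"] = field(default_factory=list)


def referenced_stages(plan: PlanNode) -> Set[int]:
    """Collect the ids of all stages still reachable from the plan root."""
    found: Set[int] = set()
    stack = [plan]
    while stack:
        node = stack.pop()
        if node.stage_id is not None:
            found.add(node.stage_id)
        stack.extend(node.children)
    return found


def unreferenced_stages(running: Set[int], plan: PlanNode) -> Set[int]:
    """Running stages that the current plan no longer references."""
    return running - referenced_stages(plan)


# Before re-optimization: the join references both stages.
before = PlanNode("LeftOuterJoin", children=[
    PlanNode("ShuffleStage(emptyTestData)", stage_id=0),
    PlanNode("ShuffleStage(testData)", stage_id=1),
])
# After re-optimization: the join collapsed to an empty relation.
after = PlanNode("EmptyRelation")

candidates = unreferenced_stages({1}, after)
```

With stage 1 still running, `candidates` is `{1}` for the re-optimized plan and empty for the original plan.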

@ulysses-you
Contributor

ulysses-you commented Jul 31, 2023

The problem is that we cannot easily predict which exchange will be reused, since AQE is adaptive. What if we cancel a running exchange, but later need to run another exchange that is semantically equivalent?
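
The scenario in this question can be sketched with a toy stage cache keyed by a canonical form. `StageCache` and the string key are hypothetical and only model the idea of reuse by semantic equivalence; they are not AQE's actual data structures.

```python
from typing import Dict


class StageCache:
    """Toy reuse cache: one launched stage per canonical exchange."""

    def __init__(self) -> None:
        self.by_canonical: Dict[str, str] = {}
        self.launches = 0

    def get_or_launch(self, canonical: str) -> str:
        # Reuse a live stage for an equivalent exchange, otherwise launch one.
        if canonical not in self.by_canonical:
            self.launches += 1
            self.by_canonical[canonical] = f"stage-{self.launches}"
        return self.by_canonical[canonical]

    def cancel(self, canonical: str) -> None:
        # A cancelled stage can no longer serve later reuse requests.
        self.by_canonical.pop(canonical, None)


KEY = "shuffle(testData, key)"  # hypothetical canonical form

# Without cancellation, a later equivalent exchange reuses the first stage.
kept = StageCache()
kept.get_or_launch(KEY)
kept.get_or_launch(KEY)

# With cancellation in between, the later equivalent exchange starts over.
cancelled = StageCache()
cancelled.get_or_launch(KEY)
cancelled.cancel(KEY)  # the plan at this moment no longer references it
later_stage = cancelled.get_or_launch(KEY)  # a later re-optimization needs it
```

In this toy model `kept.launches` is 1 while `cancelled.launches` is 2. A reference check made at one point in time cannot see the later request.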

@Hisoka-X
Member Author

Hisoka-X commented Aug 1, 2023

> The problem is that we cannot easily predict which exchange will be reused, since AQE is adaptive. What if we cancel a running exchange, but later need to run another exchange that is semantically equivalent?

I get your point. Thanks for explaining! I'll close this PR until I find a workable solution.

@Hisoka-X Hisoka-X closed this Aug 1, 2023
@Hisoka-X Hisoka-X deleted the cancel_useless_stage branch August 1, 2023 04:32