[SPARK-43999][SQL][CORE] Support force finish useless stage when AQE on #41755
Conversation
cc @cloud-fan @ulysses-you FYI. Hmm... if I remember correctly, @ulysses-you submitted a PR with similar goals.
Thanks @LuciferYang, I think you are talking about #41536. There are some differences between the two PRs. @ulysses-you's approach affects exchange reuse (I'm not sure whether this PR does, but according to CI it seems it does not), and this PR force finishes a stage rather than cancelling it, so the behavior is different. The way of determining which stage should be cancelled or finished is also different. By the way, please search
The reason I closed my PR is that we cannot cancel a broadcast query stage, because exchange reuse shares the broadcast future instance. It's OK to cancel a shuffle stage before executing the final plan, but note that we cannot cancel an intermediate shuffle stage, since that would break shuffle exchange reuse.
Sorry for the late response. I don't think the current solution will cancel a reused stage, because this PR decides whether to cancel a stage by checking whether the physical plan still holds a reference to it. No reference means that the stage will not be reused.
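The reference check described in this comment could be sketched roughly as follows. This is a toy Python model, not the PR's Scala code: `PlanNode`, `stage_id`, `referenced_stage_ids`, and `unreferenced_stages` are hypothetical names chosen for illustration, and the real plan and stage classes in Spark are considerably richer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PlanNode:
    """Toy stand-in for a physical plan node (hypothetical, not Spark's SparkPlan)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    stage_id: Optional[int] = None  # set only when the node wraps a query stage


def referenced_stage_ids(plan: PlanNode) -> Set[int]:
    """Collect the ids of all query stages still reachable from the plan root."""
    ids: Set[int] = set()
    stack = [plan]
    while stack:
        node = stack.pop()
        if node.stage_id is not None:
            ids.add(node.stage_id)
        stack.extend(node.children)
    return ids


def unreferenced_stages(running: Set[int], plan: PlanNode) -> Set[int]:
    """Running stages that the (re-optimized) plan no longer points at."""
    return running - referenced_stage_ids(plan)


# Before re-optimization the join references both shuffle stages.
before = PlanNode("Join", [
    PlanNode("ShuffleStage", stage_id=0),
    PlanNode("ShuffleStage", stage_id=1),
])
# After re-optimization the whole join collapsed into an empty relation.
after = PlanNode("EmptyRelation")

print(unreferenced_stages({0, 1}, before))  # no candidates
print(unreferenced_stages({1}, after))      # stage 1 is a candidate
```

Note that this only looks at the plan as it is right now; it says nothing about exchanges that a later re-optimization might introduce.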
The problem is that we cannot easily predict which exchange will be reused, since AQE is adaptive. What if we cancel a running exchange, but later need to run another exchange that is semantically equivalent?
I got your point. Thanks for explaining! Until I find a workable solution, I'll close this PR.
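To make this concern concrete, here is a minimal toy model in Python. It assumes exchange reuse works like a cache keyed by a canonical form of the exchange; `StageCache`, `get_or_launch`, and `cancel` are hypothetical names, not Spark APIs.

```python
class StageCache:
    """Toy exchange-reuse cache keyed by a canonical plan string (hypothetical)."""

    def __init__(self):
        self._stages = {}
        self.launched = 0  # how many times work was actually started

    def get_or_launch(self, canonical_key):
        stage = self._stages.get(canonical_key)
        # A cancelled stage cannot be reused, so the work starts over.
        if stage is None or stage["state"] == "cancelled":
            self.launched += 1
            stage = {"key": canonical_key, "state": "running"}
            self._stages[canonical_key] = stage
        return stage

    def cancel(self, canonical_key):
        self._stages[canonical_key]["state"] = "cancelled"


cache = StageCache()
cache.get_or_launch("shuffle(scan t, key=a)")
# The current plan no longer references the stage, so it gets cancelled ...
cache.cancel("shuffle(scan t, key=a)")
# ... but a later re-optimization asks for a semantically equal exchange.
cache.get_or_launch("shuffle(scan t, key=a)")
print(cache.launched)  # 2: the same work was started twice
```

In this model the cancellation looked safe at the time it happened, and the cost only shows up once the equivalent exchange is requested again.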
What changes were proposed in this pull request?
This PR brings a force finish stage feature to AQE. It is used when a stage becomes useless after the plan is re-optimized.
For example:
The left table is empty, so the right table's data is useless and there is no need to continue reading the right table.

This saves database and Spark CPU and I/O resources when the right table is large.
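A minimal sketch of why the right side becomes useless, assuming the join is an inner join (the description does not say which join type the example uses). This is a toy Python model of empty-relation propagation, not Spark code; `Relation`, `InnerJoin`, and `propagate_empty` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    name: str
    row_count: Optional[int] = None  # None means the stage has not finished yet


@dataclass(frozen=True)
class InnerJoin:
    left: object
    right: object


EMPTY = Relation("empty", 0)


def is_empty(plan) -> bool:
    return isinstance(plan, Relation) and plan.row_count == 0


def propagate_empty(plan):
    """An inner join with an empty side produces no rows, so it collapses to EMPTY."""
    if isinstance(plan, InnerJoin):
        left = propagate_empty(plan.left)
        right = propagate_empty(plan.right)
        if is_empty(left) or is_empty(right):
            return EMPTY
        return InnerJoin(left, right)
    return plan


# The left stage finished with zero rows; the right stage is still running.
plan = InnerJoin(Relation("left", 0), Relation("right", None))
print(propagate_empty(plan) == EMPTY)  # True
```

Once the join collapses, nothing in the re-optimized plan consumes the right side's output, which is the situation where this PR force finishes the still-running stage.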
The Spark UI displays a force-finished stage like this:
Why are the changes needed?
To add a new force finish feature to AQE.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a new test.