-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32332][SQL] Support columnar exchanges when AQE is enabled #29134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
| assert(found == 111) | ||
|
|
||
| val found = collectPlanSteps(df.queryExecution.executedPlan).sum | ||
| assert(found == 11121) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to comment what 11121 equals in terms of the execs - MyBroadcastExchangeExec, etc..
|
overall approach makes sense to me. It would be great to get some feedback from people working on AQE - @maryannxue @cloud-fan |
|
org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite is failing @andygrove please take a look |
|
Test build #125984 has finished for PR 29134 at commit
|
|
Test build #125999 has finished for PR 29134 at commit
|
| */ | ||
| abstract class ShuffleExchange extends Exchange { | ||
| def shuffleId: Int | ||
| def getNumMappers: Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get this through MapOutputTracker.
| * enabled. | ||
| */ | ||
| abstract class ShuffleExchange extends Exchange { | ||
| def shuffleId: Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is available in mapOutputStats
| val optimizedPlan = applyPhysicalRules(e.child, queryStageOptimizerRules) | ||
| // apply optimizer rules to the Exchange node and its children, allowing plugins to be | ||
| // able to replace the Exchange node itself | ||
| val optimizedPlan = applyPhysicalRules(e, queryStageOptimizerRules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we guarantee the top node is still an Exchange after applying physical rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safer to use s.withNewChildren(optimizedChildPlan) than adding special handling in those physical rules: https://github.com/apache/spark/pull/29134/files#diff-a30c7a6fcdcdd13e57135fd04d05f3b7R115-R117
That saves you the trouble of worrying about certain assumptions being broken in an arbitrary rule.
| val leftParts = if (isLeftSkew && !isLeftCoalesced) { | ||
| val reducerId = leftPartSpec.startReducerIndex | ||
| val skewSpecs = createSkewPartitionSpecs( | ||
| left.shuffleStage.shuffle.shuffleDependency.shuffleId, reducerId, leftTargetSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get shuffleId from mapStats.
|
Hi @andygrove and @tgravescs , actually we have done something similar in our fork, to support columnar exchange. Our version supports broadcast as well, and also works with AQE disabled. Do you mind if I push our version out and ask you to review it? Thanks in advance! |
|
@cloud-fan that sounds good, push it out and we'll take a look. |
|
@cloud-fan do you know when you might have PR up? |
|
Replaced by #29262 |
### What changes were proposed in this pull request? This PR adds abstract classes for shuffle and broadcast, so that users can provide their columnar implementations. This PR updates several places to use the abstract exchange classes, and also update `AdaptiveSparkPlanExec` so that the columnar rules can see exchange nodes. This is an alternative of #29134 . Close #29134 ### Why are the changes needed? To allow columnar exchanges. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests Closes #29262 from cloud-fan/columnar. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Thomas Graves <[email protected]>
What changes were proposed in this pull request?
This PR adds support for plugins to be able to provide columnar exchanges when AQE is enabled.
The main changes are:
Why are the changes needed?
The changes are needed so that plugins can provide columnar exchage operators when AQE is enabled.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing unit tests have been updated as part of this PR to test replacing broadcast and shuffle exchanges when AQE is enabled.