[SPARK-33933][SQL] Materialize BroadcastQueryStage first to avoid broadcast timeout in AQE #31167
Changes from all commits
```diff
@@ -23,7 +23,8 @@ import java.util.concurrent.LinkedBlockingQueue
 import scala.collection.JavaConverters._
 import scala.collection.concurrent.TrieMap
 import scala.collection.mutable
-import scala.concurrent.ExecutionContext
+import scala.concurrent.{ExecutionContext, Future}
+import scala.concurrent.duration.Duration
 import scala.util.control.NonFatal

 import org.apache.spark.SparkException
```
```diff
@@ -190,7 +191,36 @@ case class AdaptiveSparkPlanExec(
         executionId.foreach(onUpdatePlan(_, result.newStages.map(_.plan)))

-        // Start materialization of all new stages and fail fast if any stages failed eagerly
-        result.newStages.foreach { stage =>
+        // SPARK-33933: we should materialize broadcast stages first and wait for the
+        // materialization to finish before materializing the other stages, to avoid
+        // waiting for broadcast tasks to be scheduled, which can lead to broadcast timeout.
+        val broadcastMaterializationFutures = result.newStages
+          .filter(_.isInstanceOf[BroadcastQueryStageExec])
+          .map { stage =>
+            var future: Future[Any] = null
+            try {
+              future = stage.materialize()
+              future.onComplete { res =>
+                if (res.isSuccess) {
+                  events.offer(StageSuccess(stage, res.get))
+                } else {
+                  events.offer(StageFailure(stage, res.failed.get))
+                }
+              }(AdaptiveSparkPlanExec.executionContext)
+            } catch {
+              case e: Throwable =>
+                cleanUpAndThrowException(Seq(e), Some(stage.id))
+            }
+            future
+          }
+
+        // Wait for the materialization of all broadcast stages to finish
+        broadcastMaterializationFutures.foreach(ThreadUtils.awaitReady(_, Duration.Inf))
```
Contributor: Is it necessary to wait until all …

Contributor (Author): Indeed, there will be a little waste of resources, but this is the same behavior as non-AQE. Given how lightweight a broadcast is, it should not take much time, a few seconds in the normal case. I think that's acceptable.
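For illustration, the ordering being discussed can be sketched with plain Scala futures, independent of Spark's internals; the stage names and sleep times below are made up:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object BroadcastFirstSketch extends App {
  // Hypothetical stand-in for stage materialization; only the ordering matters.
  def materialize(stage: String): Future[Unit] = Future {
    println(s"materializing $stage")
    Thread.sleep(100) // simulate work
    println(s"finished $stage")
  }

  // 1. Kick off all broadcast stages first.
  val broadcastFutures = Seq("broadcast-0", "broadcast-1").map(materialize)

  // 2. Block until every broadcast has finished, so its tasks get scheduled
  //    before long-running stages start competing for executor slots.
  broadcastFutures.foreach(Await.ready(_, Duration.Inf))

  // 3. Only now start the remaining (e.g. shuffle) stages.
  Seq("shuffle-2", "shuffle-3").map(materialize)
    .foreach(Await.ready(_, Duration.Inf))
}
```

The trade-off raised above is visible here: nothing else runs while step 2 blocks, which mirrors the non-AQE behavior.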
Contributor: @zhongyu09 It might be better to provide a benchmark comparing performance before and after this change.

Contributor (Author): Yes, do we have a benchmark testing framework?
Contributor: A micro-benchmark can be based on …
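No benchmark appears in this thread, but a rough end-to-end timing comparison could be sketched as below, run once on a build with this patch and once without. The table sizes, session settings, and timing helper are assumptions for illustration, not part of the PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinTimingSketch extends App {
  val spark = SparkSession.builder()
    .master("local[4]")
    .appName("aqe-broadcast-timing-sketch")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()

  // Hypothetical data: a large fact table joined to a small dimension table.
  val large = spark.range(10000000L).toDF("id")
  val small = spark.range(1000L).toDF("id")

  // Simple wall-clock timing helper (not a rigorous benchmark harness).
  def time(label: String)(body: => Unit): Unit = {
    val start = System.nanoTime()
    body
    println(f"$label: ${(System.nanoTime() - start) / 1e9}%.2f s")
  }

  // Force a broadcast join so the broadcast stage dominates the measurement.
  time("broadcast join under AQE") {
    large.join(broadcast(small), "id").count()
  }

  spark.stop()
}
```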
Contributor (Author): I am fine with a partial fix like #30998. I wonder, though: is it too heavy to add a new event just for a unit test?
Contributor: We can also log the stage submission and then write a test that verifies the log.
Contributor (Author): That's an idea; I will look into how to do it. Do we have any unit tests that verify logs?
Contributor: Yeah, a lot, e.g. …
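As a sketch of that log-based approach (not an actual test from this PR; the appender wiring, the "Materializing stage" message, and the `runQuery` helper are all illustrative, assuming the log4j 1.x API Spark used at the time):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.log4j.{AppenderSkeleton, Logger}
import org.apache.log4j.spi.LoggingEvent
import org.scalatest.funsuite.AnyFunSuite

class StageSubmissionLogSuite extends AnyFunSuite {

  // Minimal in-memory appender that records every rendered log message.
  class CapturingAppender extends AppenderSkeleton {
    val messages = new ArrayBuffer[String]()
    override def append(event: LoggingEvent): Unit =
      messages.synchronized { messages += event.getRenderedMessage }
    override def close(): Unit = {}
    override def requiresLayout(): Boolean = false
  }

  test("broadcast stages are submitted before other stages") {
    val appender = new CapturingAppender
    Logger.getRootLogger.addAppender(appender)
    try {
      runQuery()
      // Assert on the relative order of the captured submission messages.
      val submissions = appender.messages.filter(_.contains("Materializing stage"))
      assert(submissions.indexWhere(_.contains("broadcast")) <
        submissions.indexWhere(_.contains("shuffle")))
    } finally {
      Logger.getRootLogger.removeAppender(appender)
    }
  }

  // Stand-in for running an adaptive query; here it just emits the kind of
  // lines the test would look for, to keep the sketch self-contained.
  private def runQuery(): Unit = {
    val log = Logger.getLogger(getClass)
    log.info("Materializing stage broadcast-0")
    log.info("Materializing stage shuffle-1")
  }
}
```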
```diff
+        // Start materialization of non-broadcast stages
+        result.newStages.filter(!_.isInstanceOf[BroadcastQueryStageExec])
+          .foreach { stage =>
+            try {
+              stage.materialize().onComplete { res =>
+                if (res.isSuccess) {
+                  events.offer(StageSuccess(stage, res.get))
+                } else {
+                  events.offer(StageFailure(stage, res.failed.get))
+                }
+              }(AdaptiveSparkPlanExec.executionContext)
+            } catch {
+              case e: Throwable =>
+                cleanUpAndThrowException(Seq(e), Some(stage.id))
+            }
+          }
```
Contributor: Indentation: lines 201~216.

Contributor (Author): I am not sure lines 201~215 should get two more spaces of indentation; they match the style of lines 225~236 (the old code).

Contributor: They should be :)