@zhongyu09
Contributor

What changes were proposed in this pull request?

  1. Replace executeCollectIterator() with executeCollectIteratorFuture() in SparkPlan.scala, so that the collect query runs asynchronously and returns a future of the collect result.
  2. In BroadcastExchangeExec.relationFuture, call executeCollectIteratorFuture() in the current thread to obtain collectFuture, then wait on collectFuture in the "broadcast-exchange" thread.
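
To illustrate the intent of the two steps above, here is a minimal, self-contained Scala sketch. It does not use Spark: `ToyScheduler`, `submit`, the job names, and the 10-second timeouts are hypothetical stand-ins, and this `relationFuture` only mimics the shape of the proposed change, not the actual patch. The point it tries to show is that the collect job is submitted synchronously in the calling thread, while only the waiting happens on a separate "broadcast-exchange"-style thread.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SubmitFirstSketch {
  // Hypothetical toy scheduler (not a Spark API): it records each job at
  // submission time and then runs the job body asynchronously.
  final class ToyScheduler {
    private val pool = Executors.newFixedThreadPool(2)
    private val jobEc = ExecutionContext.fromExecutorService(pool)
    private var log = Vector.empty[String]

    def submit[T](name: String)(body: => T): Future[T] = {
      synchronized { log = log :+ name } // runs in the caller's thread, before submit returns
      Future(body)(jobEc)                // only the job body runs asynchronously
    }

    def submissionOrder: List[String] = synchronized { log.toList }
    def shutdown(): Unit = pool.shutdown()
  }

  // Mimics the proposed shape: submit the collect job in the current thread,
  // then wait for its result on a separate "broadcast" thread.
  def relationFuture(scheduler: ToyScheduler, broadcastEc: ExecutionContext): Future[Int] = {
    val collectFuture = scheduler.submit("broadcast-collect")(Seq(1, 2, 3))
    Future(Await.result(collectFuture, 10.seconds).sum)(broadcastEc)
  }

  def run(): (List[String], Int) = {
    val scheduler = new ToyScheduler
    val broadcastPool = Executors.newSingleThreadExecutor()
    val broadcastEc = ExecutionContext.fromExecutorService(broadcastPool)
    try {
      // The broadcast stage is materialized first, so its job is submitted first.
      val relation = relationFuture(scheduler, broadcastEc)
      val shuffle = scheduler.submit("shuffle-map")(())
      Await.result(shuffle, 10.seconds)
      (scheduler.submissionOrder, Await.result(relation, 10.seconds))
    } finally {
      scheduler.shutdown()
      broadcastPool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val (order, sum) = run()
    println(s"submission order: $order, broadcast result: $sum")
  }
}
```

In this sketch the submission order is fixed by the order of calls in the calling thread, regardless of how the worker threads are scheduled. If the `submit` call were instead moved inside the body that runs on `broadcastEc`, the "shuffle-map" job could be submitted first, which is the ordering problem this PR is trying to avoid.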

Why are the changes needed?

#31269 gives a partial fix for SPARK-33933, but it is not a complete solution. This change makes sure the broadcast collect job is submitted before the shuffle map jobs. #31269 ensures that materialize() of BroadcastQueryStage is called before materialize() of ShuffleQueryStage. In BroadcastQueryStage's materialize(), doPrepare() will call relationFuture, which will submit the collect job before returning the future.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

@github-actions github-actions bot added the SQL label May 16, 2021
@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon HyukjinKwon changed the title from "SPARK-35414: Submit broadcast collect job first to avoid broadcast timeout in AQE" to "[SPARK-35414][SQL] Submit broadcast collect job first to avoid broadcast timeout in AQE" May 17, 2021
@zhongyu09 zhongyu09 changed the title from "[SPARK-35414][SQL] Submit broadcast collect job first to avoid broadcast timeout in AQE" to "[WIP][SPARK-35414][SQL] Submit broadcast collect job first to avoid broadcast timeout in AQE" May 19, 2021
@weixiuli
Contributor

Is there any progress on this PR?

@zhongyu09
Contributor Author

> Is there any progress on this PR?

In https://github.com/zhongyu09/spark/runs/2829717969, the unit test "BroadcastExchange should cancel the job group if timeout" fails. I realized that this solution may have a race condition that can cause the broadcast to hang.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 15, 2022
@github-actions github-actions bot closed this Feb 16, 2022