
Conversation

@zhongyu09
Contributor

This PR is the same as #30998 to merge to branch 3.0

What changes were proposed in this pull request?

In AdaptiveSparkPlanExec.getFinalPhysicalPlan, when newStages are generated, sort the new stages by class type so that BroadcastQueryStage instances precede the others.
This ensures the broadcast jobs are submitted before the map jobs, so they do not wait on job scheduling and hit the broadcast timeout.
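Here is a minimal, self-contained Scala sketch of the ordering idea only; the stand-in types below are hypothetical, while the real patch works on the query stage classes inside AdaptiveSparkPlanExec:

```scala
// Hypothetical stand-in types, not the real AQE classes.
sealed trait Stage
case class BroadcastStage(id: Int) extends Stage
case class ShuffleStage(id: Int) extends Stage

// Stages produced by one round of createQueryStages, in traversal order.
val newStages: Seq[Stage] = Seq(ShuffleStage(1), BroadcastStage(2), ShuffleStage(3))

// Sort by class type so broadcast stages come first and are materialized
// (and their jobs submitted) before any shuffle map stage.
val ordered = newStages.sortBy {
  case _: BroadcastStage => 0
  case _                 => 1
}
// ordered == Seq(BroadcastStage(2), ShuffleStage(1), ShuffleStage(3))
```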

Why are the changes needed?

When AQE is enabled, getFinalPhysicalPlan traverses the physical plan bottom-up, creates query stages for the materializable parts via createQueryStages, and then materializes those newly created query stages, which submits the map stages or starts the broadcasts. When a ShuffleQueryStage is materialized before a BroadcastQueryStage, the map job and the broadcast job are submitted almost at the same time, but the map job holds all of the computing resources. If the map job runs slowly (for example when there is a lot of data to process and resources are limited), the broadcast job cannot be started (and finished) within spark.sql.broadcastTimeout, so the whole job fails (introduced in SPARK-31475).
The workaround of increasing spark.sql.broadcastTimeout is neither reasonable nor graceful, because the data to broadcast is very small.
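For reference, the workaround mentioned above is just raising the timeout, e.g. as follows (the value here is arbitrary; the default is 300 seconds, and `spark` is assumed to be a SparkSession in scope):

```scala
// Raise the broadcast timeout (in seconds). This only hides the scheduling
// delay; the broadcast data itself is small, so it does not fix the root cause.
spark.conf.set("spark.sql.broadcastTimeout", "1200")
```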

Does this PR introduce any user-facing change?

NO

How was this patch tested?

  1. Added a UT.
  2. Tested the code in a dev environment; see https://issues.apache.org/jira/browse/SPARK-33933

@dongjoon-hyun
Member

The failure is relevant to AQE. Could you take a look at it, @zhongyu09 ?
(Screenshot of the test failure attached: Screen Shot 2021-01-07 at 4 19 54 PM)

@zhongyu09
Contributor Author

The failure is relevant to AQE. Could you take a look at it, @zhongyu09 ?
(Screenshot of the test failure attached: Screen Shot 2021-01-07 at 4 19 54 PM)

I will have a look.

@dongjoon-hyun
Member

Thanks!

@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Jan 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38409/

@SparkQA

SparkQA commented Jan 8, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38409/

@zhongyu09
Contributor Author

Hi @dongjoon-hyun
I ran the test tens of times and it failed twice.
As discussed with @viirya and @cloud-fan in #30998, the solution is not perfect. The order of the materialize calls guarantees the scheduling order of the jobs under normal circumstances, but the guarantee is not strict, because the broadcast job and the shuffle map job are submitted from different threads. So there is still a risk that the shuffle map job is scheduled before the broadcast job. I wonder whether we should remove the UT until we resolve the issue thoroughly.
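A self-contained toy illustration of that race (plain Scala futures, not Spark code): even if the broadcast stage's materialize() is called first, each job is handed to the scheduler from its own thread, so the later call can still reach the scheduler first.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Simulated "submit to the scheduler": each call runs on its own pool thread,
// so the order in which the println lines appear is not guaranteed.
def submit(name: String): Future[Unit] = Future {
  println(s"$name job submitted")
}

submit("broadcast")    // called first...
submit("shuffle map")  // ...yet this one may reach the scheduler first
Thread.sleep(100)      // sketch only: give the pool threads time to run
```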

@SparkQA

SparkQA commented Jan 8, 2021

Test build #133820 has finished for PR 31084 at commit 03c9b09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 8, 2021

Got it. I'm not sure about removing UTs. BTW, Jenkins passed at least.

WDYT, @cloud-fan and @HyukjinKwon ?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-33933][SQL][3.0] Materialize BroadcastQueryStage first to avoid broadcast timeout in AQE [SPARK-33933][SQL][3.0][test-maven] Materialize BroadcastQueryStage first to avoid broadcast timeout in AQE Jan 8, 2021
@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Jan 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38446/

@SparkQA

SparkQA commented Jan 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38446/

@SparkQA

SparkQA commented Jan 9, 2021

Test build #133857 has finished for PR 31084 at commit 03c9b09.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

The same test case failed in Jenkins Maven environment again. (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133857/testReport)

- SPARK-33933: AQE broadcast should not timeout with slow map tasks *** FAILED ***
  1693 was not greater than 2000 (AdaptiveQueryExecSuite.scala:926)
14:59:38.190 WARN org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite: 

@dongjoon-hyun
Member

I'm wondering if this test is stable in master and branch-3.1.

@zhongyu09
Contributor Author

I'm wondering if this test is stable in master and branch-3.1.

Yes, it is also not stable in master. I created a PR to remove the UT: #31099

@HyukjinKwon
Member

A partial fix is fine but let's make sure to mention/document what cases the partial fix does not cover.

@HyukjinKwon
Member

I reverted it for now for RC preparation. Let's make a PR clarifying which cases it doesn't cover and why this is a partial fix.

@zhongyu09
Contributor Author

I reverted it for now for RC preparation. Let's make a PR clarifying which cases it doesn't cover and why this is a partial fix.

For a partial fix, it is difficult to write a stable UT. I would rather provide a stable fix. I see two directions:

  1. Make sure the broadcast jobs are submitted before the shuffle map jobs: the materialize() call for non-broadcast query stages should wait until all of the broadcast jobs have been submitted (a rough sketch follows below).
  2. Exclude the scheduling time of the broadcast job when computing the timeout. This is very hard to measure. As a fallback, perhaps we could measure the time of the pure broadcast, i.e. minus the collect time, but that is also a big change, and it would affect non-AQE as well.

I prefer #1: it behaves more like non-AQE, it is this PR's original intention, and it has less impact on non-AQE.
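A self-contained toy sketch of direction 1 (hypothetical, not an actual patch): gate the materialization of non-broadcast stages on a latch that is released only once every broadcast job has been submitted.

```scala
import java.util.concurrent.CountDownLatch
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val numBroadcastStages = 2                     // assumed number of broadcast stages
val broadcastsSubmitted = new CountDownLatch(numBroadcastStages)

def materializeBroadcast(id: Int): Future[Unit] = Future {
  println(s"broadcast job $id submitted")
  broadcastsSubmitted.countDown()              // mark this broadcast job as submitted
}

def materializeShuffleMap(id: Int): Future[Unit] = Future {
  broadcastsSubmitted.await()                  // wait until all broadcast jobs are submitted
  println(s"shuffle map job $id submitted")
}

(1 to numBroadcastStages).foreach(materializeBroadcast)
materializeShuffleMap(1)
Thread.sleep(200)                              // sketch only: let the pool threads finish
```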
