Skip to content

Conversation

@jiangxb1987
Copy link
Contributor

What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up.

How was this patch tested?

Existing tests.

@dongjoon-hyun
Copy link
Member

Thank you so much, @jiangxb1987 .

Copy link
Member

@Ngone51 Ngone51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented May 20, 2020

Test build #122856 has finished for PR 28584 at commit b08a272.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

"SPARK-31485: barrier stage should fail if only partial tasks are launched" This test case is flaky, @Ngone51 can you take a look?

@Ngone51
Copy link
Member

Ngone51 commented May 20, 2020

sure, thanks!

@SparkQA
Copy link

SparkQA commented May 20, 2020

Test build #122860 has finished for PR 28584 at commit b08a272.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 20, 2020

Test build #122864 has finished for PR 28584 at commit b08a272.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Copy link
Member

Ngone51 commented May 20, 2020

retest this please.

@SparkQA
Copy link

SparkQA commented May 20, 2020

Test build #122862 has finished for PR 28584 at commit b08a272.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 20, 2020

Test build #122865 has finished for PR 28584 at commit b08a272.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Copy link
Contributor

Also fix this test like:

  test("share messages with allGather() call") {
    val conf = new SparkConf()
      .setMaster("local-cluster[4, 1, 1024]")
      .setAppName("test-cluster")
    sc = new SparkContext(conf)
    val rdd = sc.makeRDD(1 to 4, 4)
    val rdd2 = rdd.barrier().mapPartitions { it =>
      val context = BarrierTaskContext.get()
      // Sleep for a random time before global sync.
      Thread.sleep(Random.nextInt(1000))
      // Pass partitionId message in
      val message: String = context.partitionId().toString
      val messages: Array[String] = context.allGather(message)
      Iterator.single(messages.toList)
    }
    // Take a sorted list of all the partitionId messages
    val messages_list = rdd2.collect()
    assert (messages_list.length === 4)
    for (messages <- messages_list) {
      assert (messages === (0 until 4).map(_.toString).toList)
    }
  }

@Ngone51
Copy link
Member

Ngone51 commented May 21, 2020

Opened a separate PR #28596 to fix the test @WeichenXu123

@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun
Copy link
Member

Since #28596 is merged, shall we restart to proceed this PR? Thanks.

@SparkQA
Copy link

SparkQA commented May 24, 2020

Test build #123066 has finished for PR 28584 at commit b08a272.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Copy link
Member

Ngone51 commented May 26, 2020

ping @jiangxb1987 Could you please update this?

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123106 has finished for PR 28584 at commit 83a34c2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Copy link
Member

Ngone51 commented May 26, 2020

retest this please.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123110 has finished for PR 28584 at commit 83a34c2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Copy link
Member

Ngone51 commented May 26, 2020

retest this please.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123126 has finished for PR 28584 at commit 83a34c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

Retest this please.

1 similar comment
@jiangxb1987
Copy link
Contributor Author

Retest this please.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123134 has finished for PR 28584 at commit 83a34c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

Retest this please.

@jiangxb1987
Copy link
Contributor Author

Let's trigger one more time to ensure the suite is no longer flaky.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123138 has finished for PR 28584 at commit 83a34c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor Author

Retest this please.

@SparkQA
Copy link

SparkQA commented May 27, 2020

Test build #123199 has finished for PR 28584 at commit 83a34c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

jiangxb1987 added a commit that referenced this pull request May 27, 2020
### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang <[email protected]>
Signed-off-by: Xingbo Jiang <[email protected]>
(cherry picked from commit efe7fd2)
Signed-off-by: Xingbo Jiang <[email protected]>
@jiangxb1987
Copy link
Contributor Author

Thanks, merged to master/3.0 !

jiangxb1987 added a commit to jiangxb1987/spark that referenced this pull request May 28, 2020
### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes apache#28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang <[email protected]>
Signed-off-by: Xingbo Jiang <[email protected]>
@jiangxb1987
Copy link
Contributor Author

Opened #28658 for branch-3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants