Skip to content

[SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync#28462

Closed
Ngone51 wants to merge 4 commits intoapache:masterfrom
Ngone51:impr_barrier_req
Closed

[SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync#28462
Ngone51 wants to merge 4 commits intoapache:masterfrom
Ngone51:impr_barrier_req

Conversation

@Ngone51
Copy link
Member

@Ngone51 Ngone51 commented May 6, 2020

What changes were proposed in this pull request?

This PR improves handling the case where different barrier sync types in a single sync:

  • use clear instead of cleanupBarrierStage

  • make sure all requesters are failed because of "different barrier sync types"

Why are the changes needed?

Currently, we use cleanupBarrierStage to clean up a barrier stage when we detecting the case of "different barrier sync types". But this leads to a problem that we could create new a ContextBarrierState for the same stage again if there're on-way requests from tasks. As a result, those task will fail because of killing instead of "different barrier sync types".

Besides, we don't handle the current request which is being handling properly as it will fail due to epoch mismatch instead of "different barrier sync types".

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Updated a existed test.

@Ngone51
Copy link
Member Author

Ngone51 commented May 6, 2020

cc @jiangxb1987 @sarthfrey

requestMethods.add(curReqMethod)
if (requestMethods.size > 1) {
val error = new SparkException(s"Different barrier sync types found for the " +
s"sync $barrierId: ${requestMethods.mkString(", ")}. Please use the " +
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally ignored barrier epoch here because it (-1) becomes meaningless after clear. But we can update the logic of clear if any of you think it's necessary here.

@SparkQA
Copy link

SparkQA commented May 6, 2020

Test build #122354 has finished for PR 28462 at commit 463a438.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 6, 2020

Test build #122362 has finished for PR 28462 at commit 9f36a2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51 Ngone51 changed the title [SPARK-31651[CORE]] Improve handling the case where different barrier sync types in a single sync [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync May 7, 2020
@@ -108,7 +108,7 @@ private[spark] class BarrierCoordinator(

// The request method which is called inside this barrier sync. All tasks should make sure
Copy link
Contributor

@dilipbiswal dilipbiswal May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ngone51 Super Nit: Perhaps change the comment here to reflect the change ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks.

@SparkQA
Copy link

SparkQA commented May 7, 2020

Test build #122404 has finished for PR 28462 at commit 8b192f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Ngone51
Copy link
Member Author

Ngone51 commented May 13, 2020

ping @jiangxb1987 @dilipbiswal

@jiangxb1987
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented May 19, 2020

Test build #122823 has finished for PR 28462 at commit 8b192f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

jiangxb1987 pushed a commit that referenced this pull request May 19, 2020
… sync types in a single sync

### What changes were proposed in this pull request?

This PR improves handling the case where different barrier sync types in a single sync:

- use `clear` instead of `cleanupBarrierStage `

- make sure all requesters are failed because of "different barrier sync types"

### Why are the changes needed?

Currently, we use `cleanupBarrierStage` to clean up a barrier stage when we detecting the case of "different barrier sync types". But this leads to a problem that we could create new a `ContextBarrierState` for the same stage again if there're on-way requests from tasks. As a result, those task will fail because of killing instead of "different barrier sync types".

Besides, we don't handle the current request which is being handling properly as it will fail due to epoch mismatch instead of "different barrier sync types".

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated a existed test.

Closes #28462 from Ngone51/impr_barrier_req.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
(cherry picked from commit 653ca19)
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
@jiangxb1987
Copy link
Contributor

Thanks, merged to master/3.0 !

@Ngone51
Copy link
Member Author

Ngone51 commented May 19, 2020

thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants