[SPARK-30623][Core] Spark external shuffle allow disable of separate event loop group #27665
Conversation
|
Test build #118772 has finished for PR 27665 at commit
|
When we remove await, we see more SASL requests timing out. I mentioned this in the comment here:
#22173 (comment)
Wouldn't making async mode the default make this problem worse?
In async mode, how do you plan to tackle the increased number of SASL failures?
Thanks for the comment @otterc!
This PR aims to fix the performance regression brought by the sync-mode fetching. As you mentioned, you found a problem with your stress test framework; I think the default mode should be the guarantee for common use cases.
For the SASL requests timing out, maybe we need more context on your internal stress test framework to analyze the root cause. IMO, there should be other configs under spark.shuffle/netty that could help, rather than depending only on the async/sync mode here.
|
So I actually filed SPARK-30623 for this, please update the title and things. I don't think we need a separate feature config for this; as that JIRA mentioned, I think the idea was just: if the config isn't explicitly set, then do synchronous mode. |
|
@xuanyuanking we worked on the original fix of this issue. Having await in there is the key to the benefits provided in SPARK-24355, which improves reliability of Spark shuffle in a reasonably scaled deployment. This issue seems common across companies like us (LinkedIn), Netflix, Uber, and Yahoo. As mentioned in #22173, what we observed is that in cases where HDD is used for shuffle storage, the disk is saturated before the network can be. So, for a reasonably scaled deployment, having this fix provides a boost in shuffle reliability without hurting much on the performance side. This is also validated by @tgravescs in the Yahoo deployment of this patch. It's reasonable to introduce another config that disables this reliability improvement if it leads to performance regression in certain deployment modes. Just want to see whether we should leave this enabled by default or not. Also, as mentioned in #22173, we have discovered a potential fix for this perf regression that does not remove its reliability benefits. It will take some extra time on our side to evaluate that fix, which is a fix inside Netty. Want to make sure the broader community knows what we have been doing for this issue, so we do not take away a potential reliability improvement to Spark. |
|
Policy-wise, we should not fix an issue while introducing a regression. I'm OK to have it since it's disabled by default and it does fix a common issue. What I'm asking for is to make sure the code path is exactly the same as before when this feature is disabled, so that there is no regression. +1 with @tgravescs to reuse the existing config. |
743e566 to 5015f60 (force-pushed)
|
Do some refactoring to reuse the logic of processing fetch requests. Reuse the config |
   */
  public boolean separateChunkFetchRequest() {
    try {
      conf.get("spark.shuffle.server.chunkFetchHandlerThreadsPercent");
Is there no API to check conf existence?
Yes, no API in ConfigProvider.
how about conf.getInt("spark.shuffle.server.chunkFetchHandlerThreadsPercent", 0) > 0?
Thanks, done in 4f42083.
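The idiom agreed on above — a zero default that turns `getInt` into a presence check — can be sketched against a minimal config map. The class and helper names here are illustrative stand-ins, not Spark's actual `TransportConf`/`ConfigProvider`:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a ConfigProvider that offers get/getInt but no
// containsKey-style existence check (the situation discussed above).
class MiniConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    // Mirrors a getInt(key, default) accessor: falls back when the key is unset.
    int getInt(String key, int defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    // The reviewed idiom: with a zero default, getInt doubles as a presence
    // check, since any explicitly configured positive percent reads as "set".
    boolean separateChunkFetchRequest() {
        return getInt("spark.shuffle.server.chunkFetchHandlerThreadsPercent", 0) > 0;
    }
}
```

One caveat of this idiom: an explicit setting of `0` is indistinguishable from "unset", which is acceptable here because zero chunk-fetch handler threads and no separate event loop group amount to the same thing.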
      return 0;
    }
    int chunkFetchHandlerThreadsPercent =
      conf.getInt("spark.shuffle.server.chunkFetchHandlerThreadsPercent", 100);
what's wrong with the previous code?
No need to give a default value here; by the time execution reaches this point, the config must be set.
What do you mean by the config must be set, @xuanyuanking ? What value do you expect by default? Apparently, this seems to revert SPARK-25641 together without mentioning SPARK-25641. In the PR, only SPARK-24355 is mentioned.
No need to give a default value here; by the time execution reaches this point, the config must be set.
Because we only call this method if separateChunkFetchRequest returns true.
We will see an exception if the assumption is broken.
In this PR, we make the separate event loop group configurable by checking whether the config spark.shuffle.server.chunkFetchHandlerThreadsPercent is set or not.
What do you mean by the config must be set
Here the function chunkFetchHandlerThreads is only called when the config is set.
spark/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
Lines 124 to 129 in 0fe203e
    if (conf.getModuleName() != null &&
        conf.getModuleName().equalsIgnoreCase("shuffle") &&
        !isClientOnly && conf.separateChunkFetchRequest()) {
      chunkFetchWorkers = NettyUtils.createEventLoop(
        IOMode.valueOf(conf.ioMode()),
        conf.chunkFetchHandlerThreads(),
What value do you expect by default? Apparently, this seems to revert SPARK-25641 together without mentioning SPARK-25641.
Yes, this PR makes the feature disabled by default; let me also mention SPARK-25641 in the PR description.
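Putting the thread together, the shape of the two accessors after this PR can be sketched as follows. This is a simplified illustration, not Spark's actual TransportConf (which derives the server thread count from other settings); the precondition comment mirrors the point above that chunkFetchHandlerThreads is only reached once the presence check has passed:

```java
import java.util.Map;

// Simplified sketch of the gating discussed above: the percent config's mere
// presence enables the separate chunk-fetch event loop group, and the
// thread-count accessor is only called after that check, so it needs no
// default value.
class ChunkFetchConfSketch {
    private static final String KEY =
        "spark.shuffle.server.chunkFetchHandlerThreadsPercent";

    private final Map<String, String> conf;
    private final int serverThreads; // stand-in for the Netty server thread count

    ChunkFetchConfSketch(Map<String, String> conf, int serverThreads) {
        this.conf = conf;
        this.serverThreads = serverThreads;
    }

    boolean separateChunkFetchRequest() {
        return conf.containsKey(KEY);
    }

    // Precondition: separateChunkFetchRequest() returned true; a missing key
    // here throws, surfacing a broken assumption as an exception.
    int chunkFetchHandlerThreads() {
        int percent = Integer.parseInt(conf.get(KEY));
        return (int) Math.ceil(serverThreads * (percent / 100.0));
    }
}
```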
Thank you, @xuanyuanking and @cloud-fan .
  private final TransportRequestHandler requestHandler;
  private final long requestTimeoutNs;
  private final boolean closeIdleConnections;
  private final boolean separateChunkFetchRequest;
maybe a more explicit name: skipChunkFetchRequest
Thanks
...n/network-common/src/main/java/org/apache/spark/network/server/ChunkFetchRequestHandler.java
      TransportClient reverseClient,
      RpcHandler rpcHandler,
      Long maxChunksBeingTransferred) {
    super(reverseClient, rpcHandler.getStreamManager(), maxChunksBeingTransferred);
do we still need the maxChunksBeingTransferred variable in this class?
Yes, it is still in use in processStreamRequest.
...n/network-common/src/main/java/org/apache/spark/network/server/ChunkFetchRequestHandler.java
|
Test build #120190 has finished for PR 27665 at commit
|
|
Test build #120192 has finished for PR 27665 at commit
|
|
Test build #120196 has finished for PR 27665 at commit
|
|
Test build #120229 has finished for PR 27665 at commit
|
da8abd3 to d4e0352 (force-pushed)
|
Test build #120274 has finished for PR 27665 at commit
|
|
|
    @Test
-   public void handleStreamRequest() {
+   public void handleStreamRequest() throws Exception {
is this change necessary?
Yes, because of the changes for TransportRequestHandler.handler here: https://github.com/apache/spark/pull/27665/files#diff-0e3429029d3f8d49e94ef11e4e3051a2R105
|
Test build #120273 has finished for PR 27665 at commit
|
|
retest this please |
|
Test build #120311 has finished for PR 27665 at commit
|
|
Test build #120310 has finished for PR 27665 at commit
|
|
retest this please |
|
Test build #120326 has finished for PR 27665 at commit
|
|
thanks, merging to master/3.0! |
|
Hi, @cloud-fan . This seems to be not in |
…event loop group ### What changes were proposed in this pull request? Fix the regression caused by #22173. The original PR changed the logic of handling `ChunkFetchRequest` from async to sync, which caused the shuffle benchmark regression. This PR restores the async mode by reusing the config `spark.shuffle.server.chunkFetchHandlerThreadsPercent`. When the user sets the config, `ChunkFetchRequest` will be processed in a separate event loop group; otherwise, the code path is exactly the same as before. ### Why are the changes needed? Fix the shuffle performance regression described in #22173 (comment) ### Does this PR introduce any user-facing change? Yes, this PR disables the separate event loop for FetchRequest by default. ### How was this patch tested? Existing UT. Closes #27665 from xuanyuanking/SPARK-24355-follow. Authored-by: Yuanjian Li <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 0fe203e) Signed-off-by: Wenchen Fan <[email protected]>
|
it is now. The merge script runs slowly on my side... |
|
Thank you! |
|
BTW, @xuanyuanking . |
|
Thanks for the review. |
What changes were proposed in this pull request?
Fix the regression caused by #22173.
The original PR changed the logic of handling `ChunkFetchRequest` from async to sync, which caused the shuffle benchmark regression. This PR restores the async mode by reusing the config `spark.shuffle.server.chunkFetchHandlerThreadsPercent`. When the user sets the config, `ChunkFetchRequest` will be processed in a separate event loop group; otherwise, the code path is exactly the same as before.
As the creation of the separate event loop group is disabled by default, this PR also is a kind of revert for SPARK-25641.
Why are the changes needed?
Fix the shuffle performance regression described in #22173 (comment)
Does this PR introduce any user-facing change?
Yes, this PR disables the separate event loop for FetchRequest by default.
How was this patch tested?
Existing UT.
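With this change the separate event loop group is opt-in. A deployment that still wants the SPARK-24355 behavior would set the reused config explicitly, e.g. as below; the percent value shown is only an example, and any positive value enables the separate group:

```shell
# Re-enable the separate chunk-fetch event loop group by setting the reused
# config; leaving it unset keeps the async code path, which is the default
# after this PR.
spark-submit \
  --conf spark.shuffle.server.chunkFetchHandlerThreadsPercent=100 \
  ...
```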