Skip to content

Conversation

@xuanyuanking
Copy link
Member

What changes were proposed in this pull request?

Disable continuous shuffle block fetching when the old fetch protocol in use.

Why are the changes needed?

The new feature of continuous shuffle block fetching depends on the latest version of the shuffle fetch protocol. We should keep this constraint in BlockStoreShuffleReader.fetchContinuousBlocksInBatch.

Does this PR introduce any user-facing change?

Users will not get the exception related to continuous shuffle block fetching when old version of the external shuffle service is used.

How was this patch tested?

Existing UT.

@cloud-fan
Copy link
Contributor

just for curiosity, how things can go wrong if we do batch fetch with old shuffle protocol? the shuffle server can't recognize the new request and fail?

@SparkQA
Copy link

SparkQA commented Nov 25, 2019

Test build #114403 has finished for PR 26663 at commit c05adaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

xuanyuanking commented Nov 25, 2019

how things can go wrong if we do batch fetch with old shuffle protocol?

That's the scenario Yuming met at https://github.com/apache/spark/pull/26147/files#r348943508.
For the old shuffle service user, only spark.shuffle.useOldFetchProtocol set to true can job run. But if we batch fetch here, only shuffle block fetcher will merge the request because, in the old code, we do the continuous fetching without checking old fetch protocol. While the server-side would use the old protocol and use OpenBlocks in OneForOneBlockFetcher, which still fetches blocks one by one.
So the size number between fetcher and iterator will be inconsistent.

@SparkQA
Copy link

SparkQA commented Nov 26, 2019

Test build #114450 has finished for PR 26663 at commit ba84e6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking xuanyuanking changed the title [SPARK-30015][Core] Continuous shuffle block fetching should be disabled by default when the old fetch protocol is used [SPARK-30025][Core] Continuous shuffle block fetching should be disabled by default when the old fetch protocol is used Nov 29, 2019
@SparkQA
Copy link

SparkQA commented Nov 29, 2019

Test build #114620 has finished for PR 26663 at commit 892da74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 2, 2019

Test build #114703 has finished for PR 26663 at commit 9c53b4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in 169415f Dec 2, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@xuanyuanking
Copy link
Member Author

Thanks!

@xuanyuanking xuanyuanking deleted the SPARK-30025 branch December 2, 2019 14:34
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Dec 6, 2019
…led by default when the old fetch protocol is used

### What changes were proposed in this pull request?
Disable continuous shuffle block fetching when the old fetch protocol in use.

### Why are the changes needed?
The new feature of continuous shuffle block fetching depends on the latest version of the shuffle fetch protocol. We should keep this constraint in `BlockStoreShuffleReader.fetchContinuousBlocksInBatch`.

### Does this PR introduce any user-facing change?
Users will not get the exception related to continuous shuffle block fetching when old version of the external shuffle service is used.

### How was this patch tested?
Existing UT.

Closes apache#26663 from xuanyuanking/SPARK-30025.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants