
Conversation

@xuanyuanking (Member) commented Mar 30, 2020

What changes were proposed in this pull request?

This reverts commit 8cf76f8. #25962

Why are the changes needed?

In SPARK-29285, we changed to create shuffle temporary files eagerly. This helps avoid failing the entire task in the scenario of an occasional disk failure. But for applications in which many tasks don't actually create shuffle files, it introduced overhead. See the benchmark below:
Env: Spark local-cluster[2, 4, 19968]; each query was run for 5 rounds, 5 times per round.
Data: TPC-DS scale=99, generated by spark-tpcds-datagen
Results:

|     | Base                                                                                        | Revert                                                                                      |
|-----|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| Q20 | Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606  | Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463 |
| Q33 | Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136 | Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276  |
| Q52 | Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871 | Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108 |
| Q56 | Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579 | Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502 |
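
For readers unfamiliar with the trade-off being measured, here is a minimal sketch of the two strategies; the object and method names are hypothetical simplifications, not Spark's actual DiskBlockManager API:

```scala
import java.io.File
import java.util.UUID

// Hypothetical sketch of the two temp-shuffle-file strategies discussed above.
object TempShuffleFileSketch {

  // Eager (SPARK-29285): the file is created on disk up front, so a bad disk
  // is detected before any data is written -- but every task pays the
  // file-creation syscall, even tasks that never produce shuffle output.
  def eagerTempShuffleFile(dir: File): File = {
    val file = new File(dir, s"temp_shuffle_${UUID.randomUUID()}")
    file.createNewFile()
    file
  }

  // Lazy (after this revert): only a path is returned; the file comes into
  // existence when the writer first opens it, so tasks with no shuffle
  // output pay nothing -- and a bad disk only surfaces at write time.
  def lazyTempShuffleFile(dir: File): File =
    new File(dir, s"temp_shuffle_${UUID.randomUUID()}")
}
```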

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

@hvanhovell (Contributor) left a comment

LGTM

@SparkQA commented Mar 30, 2020

Test build #120596 has finished for PR 28072 at commit 124e6ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor) commented

Can you please update the description to say why you think this caused the regression?

Personally, I would prefer to see a separate JIRA filed for reverting this, since it has been in for a while and we put out preview releases with it in. Put the details in there and link the JIRAs so that someone looking at these can figure out what's going on. I don't think there is an official policy on this, but if others know of one please let me know.

@dongjoon-hyun (Member) commented

Shall we remove the controversial Q32 result from the PR description? Technically, its median value seems to be greater than the baseline only because of one outlier.

@dongjoon-hyun (Member) commented

I also agree with @tgravescs's opinion.

> Personally I would prefer to see a separate jira filed for reverting this

@HyukjinKwon (Member) left a comment

Looks good. +1 for removing Q32, and filing a JIRA

@xuanyuanking xuanyuanking changed the title Revert [SPARK-29285][SHUFFLE] Temporary shuffle files should be able to handle disk failures [SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly Mar 31, 2020
@xuanyuanking (Member, Author) commented

@tgravescs Thanks for the advice; a separate JIRA for this revert is necessary.
@dongjoon-hyun Thanks for checking; removed Q32 from the description.

@cloud-fan cloud-fan closed this in 07c5078 Mar 31, 2020
@cloud-fan (Contributor) commented

thanks, merging to master/3.0!

cloud-fan pushed a commit that referenced this pull request Mar 31, 2020

[SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

Closes #28072 from xuanyuanking/SPARK-29285-revert.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 07c5078)
Signed-off-by: Wenchen Fan <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020

[SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

Closes apache#28072 from xuanyuanking/SPARK-29285-revert.

Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@zhuqi-lucas (Contributor) commented

cc @xuanyuanking @cloud-fan @Ngone51 @tgravescs @dongjoon-hyun

Since this has been reverted, I am hitting disk failures in our production clusters. How can we handle the disk-failure problem without this change?

There are many disks in YARN clusters, but when one disk fails we just retry the task. Can we avoid retrying on the same failed disk within a node? Or does Spark have a disk-blacklist solution now?

I understand the reverted solution caused overhead for applications in which many tasks don't actually create shuffle files, but if we can find a workaround that avoids creating temp shuffle files when tasks don't need them, I still think we should handle this.

The logs are:

DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, ********** 91): java.io.FileNotFoundException: /data2/yarn/local/usercache/aa/appcache/*****/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026 (Input/output error)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thanks.
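
For context on the question above about retries landing on the same disk: Spark picks a local directory by hashing the block file name across the configured local dirs, and temp_shuffle names embed a random UUID, so a retried attempt does not deterministically return to the failed disk. A simplified sketch of that mapping, loosely modeled on DiskBlockManager.getFile (not the exact Spark code, which also selects a sub-directory), follows:

```scala
import java.io.File

object LocalDirSelection {
  // Simplified sketch of how a block file name is mapped onto one of the
  // configured local directories. Because temp_shuffle file names contain a
  // random UUID, each task attempt generally hashes to a different directory
  // rather than pinning to the failed disk.
  def selectLocalDir(filename: String, localDirs: Array[File]): File = {
    val nonNegativeHash = filename.hashCode & Int.MaxValue
    localDirs(nonNegativeHash % localDirs.length)
  }
}
```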

@Ngone51 (Member) commented Aug 6, 2021

@zhuqi-lucas
Spark has the exclusion (a.k.a. blacklisting) feature (see spark.excludeOnFailure.enabled), and Spark on YARN also has its own exclusion feature (see spark.yarn.executor.launch.excludeOnFailure.enabled) to ban such problematic nodes. You can check the configuration documentation for details.
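
A minimal sketch of enabling those settings programmatically (the excludeOnFailure key names landed in Spark 3.1; older releases use the equivalent spark.blacklist.* names):

```scala
import org.apache.spark.SparkConf

object ExclusionConfExample {
  // Enable executor/node exclusion after repeated task failures, plus the
  // YARN-specific exclusion for nodes where executor launch keeps failing.
  val conf: SparkConf = new SparkConf()
    .set("spark.excludeOnFailure.enabled", "true")
    .set("spark.yarn.executor.launch.excludeOnFailure.enabled", "true")
}
```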
