[SPARK-31314][CORE] Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly #28072
Conversation
… to handle disk failures" This reverts commit 8cf76f8.
hvanhovell
left a comment
LGTM
Test build #120596 has finished for PR 28072 at commit
Can you please update the description to say why you think this caused the regression? Personally, I would prefer to see a separate JIRA filed for reverting this, since it has been in for a while and we put out preview releases with it in. Put the details in there and link the JIRAs so that someone looking at these can figure out what's going on. I don't think there is an official policy on this, but if others know of one, please let me know.
Shall we remove the controversial
I also agree with @tgravescs's opinion.
HyukjinKwon
left a comment
Looks good. +1 for removing Q32, and filing a JIRA
@tgravescs Thanks for the advice; a separate JIRA for this revert is necessary.
thanks, merging to master/3.0!
…ed by creating temporary file eagerly

### What changes were proposed in this pull request?
This reverts commit 8cf76f8. #25962

### Why are the changes needed?
In SPARK-29285, we changed to create shuffle temporary files eagerly. This helps avoid failing the entire task on an occasional disk failure. But for applications in which many tasks don't actually create shuffle files, it caused overhead. See the benchmark below:

Env: Spark local-cluster[2, 4, 19968]; each query ran 5 rounds, each round 5 times.
Data: TPC-DS scale=99, generated by spark-tpcds-datagen
Results (seconds):

|     | Base | Revert |
|-----|------|--------|
| Q20 | Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606 | Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463 |
| Q33 | Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136 | Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276 |
| Q52 | Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871 | Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108 |
| Q56 | Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579 | Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502 |

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing tests.

Closes #28072 from xuanyuanking/SPARK-29285-revert.
Authored-by: Yuanjian Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 07c5078)
Signed-off-by: Wenchen Fan <[email protected]>
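The per-query medians reported in the commit message are derived from the five raw round timings. A minimal sketch of that computation (plain Scala, no Spark required; `median` is a helper defined here, not part of the PR):

```scala
// Median of an odd-length sample: the middle element after sorting.
// Assumes xs has an odd number of elements, as the 5-round vectors above do.
def median(xs: Vector[Double]): Double =
  xs.sortWith(_ < _)(xs.length / 2)

// Q20 "Base" timings from the benchmark table above.
val q20Base = Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579)
println(median(q20Base)) // 2.722007606, matching the table
```

Reporting the median rather than the mean makes the comparison robust to the first (warm-up) round, which is visibly slower in every vector.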
cc @xuanyuanking @cloud-fan @Ngone51 @tgravescs @dongjoon-hyun
Since this has been reverted, I am hitting disk failures in our production clusters. How can we handle a failed disk without this change? There are many disks in our YARN clusters, but when one disk fails we just retry the task; can we avoid retrying on the same failed disk within a node? Or does Spark have a disk blacklist solution now? I understand the reverted change caused overhead for applications whose tasks don't actually create shuffle files, but if we can find a workaround that avoids creating the file when a task doesn't need a temp shuffle file, I still think we should handle this. The logs are:
DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, ********** 91): java.io.FileNotFoundException: /data2/yarn/local/usercache/aa/appcache/*****/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026 (Input/output error)
Thanks.
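The "disk blacklist" idea asked about above could look roughly like this. This is purely a hypothetical sketch, not Spark's actual API: `LocalDirSelector` and `pickHealthyDir` are illustrative names, and the probe-file check is just one simple way to detect a bad local dir before placing a temp shuffle file on it:

```scala
import java.io.{File, IOException}
import scala.collection.mutable

// Hypothetical helper (not part of Spark): remembers local dirs that failed
// with an I/O error and skips them when choosing where to place a temp file.
class LocalDirSelector(dirs: Seq[File]) {
  private val failed = mutable.Set.empty[File]

  // Returns the first healthy dir, probing each candidate with a touch file.
  // A dir that throws an IOException is blacklisted for subsequent calls.
  def pickHealthyDir(): Option[File] =
    dirs.filterNot(failed.contains).find { dir =>
      try {
        val probe = File.createTempFile("probe", ".tmp", dir)
        probe.delete()
        true
      } catch {
        case _: IOException =>
          failed += dir
          false
      }
    }
}
```

With a selector like this, a retried task on the same node would land on a different disk instead of hitting the same `Input/output error` again; the real fix in Spark would of course need to integrate with the block manager's local-dir handling.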
@zhuqi-lucas |
What changes were proposed in this pull request?
This reverts commit 8cf76f8. #25962

Why are the changes needed?
In SPARK-29285, we changed to create shuffle temporary files eagerly. This helps avoid failing the entire task on an occasional disk failure. But for applications in which many tasks don't actually create shuffle files, it caused overhead. See the benchmark below:

Env: Spark local-cluster[2, 4, 19968]; each query ran 5 rounds, each round 5 times.
Data: TPC-DS scale=99, generated by spark-tpcds-datagen
Results (seconds):

|     | Base | Revert |
|-----|------|--------|
| Q20 | Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606 | Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463 |
| Q33 | Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136 | Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276 |
| Q52 | Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871 | Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108 |
| Q56 | Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579 | Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502 |

Does this PR introduce any user-facing change?
No

How was this patch tested?
Existing tests.
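The eager-versus-lazy tradeoff behind this revert can be sketched as follows. This is an illustrative sketch only; `EagerTempFileWriter` and `LazyTempFileWriter` are hypothetical names, not Spark's actual shuffle writer classes:

```scala
import java.io.File

// Eager: the temp file exists as soon as the writer is constructed, so a bad
// disk is detected immediately (SPARK-29285's goal), but every task pays the
// file-creation cost even if it never writes any shuffle output.
class EagerTempFileWriter(dir: File) {
  val tempFile: File = File.createTempFile("temp_shuffle_", ".tmp", dir)
  def write(bytes: Array[Byte]): Unit = { /* append bytes to tempFile */ }
}

// Lazy (the behavior restored by this revert): the file is created only on the
// first write, so tasks that produce no shuffle output pay nothing, at the
// cost of discovering a failed disk later, mid-task.
class LazyTempFileWriter(dir: File) {
  private var tempFile: Option[File] = None
  def write(bytes: Array[Byte]): Unit = {
    if (tempFile.isEmpty)
      tempFile = Some(File.createTempFile("temp_shuffle_", ".tmp", dir))
    /* append bytes to tempFile.get */
  }
  def created: Boolean = tempFile.isDefined
}
```

The benchmark regression above is essentially the cost of the eager `createTempFile` call multiplied across the many tasks in these TPC-DS queries that never write a shuffle file at all.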