[SPARK-19576] [Core] Task attempt paths exist in output path after saveAsNewAPIHadoopFile completes with speculation enabled #16912
`writeShard` in `saveAsNewAPIHadoopDataset` always commits its tasks without question. When speculation is enabled, this can let multiple attempts of the same task commit their output to the same path. As a result, task attempt paths may remain inside the output path after `saveAsNewAPIHadoopFile` completes.

Assume two attempts of a task commit at the same time. Both may rename their task attempt path to the same task committed path concurrently. Once one attempt's `rename` completes, the committed path already exists as a directory, so the other attempt's `rename` moves its task attempt path under the task committed path instead of replacing it.

In any case, `writeShard` should not commit its tasks without question. A similar problem triggered by calling `saveAsHadoopFile` was solved in SPARK-4879, and the newest master has already solved this one as well. This PR only fixes it for 2.1.
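The race and the general remedy can be sketched as a small toy model in Scala. This is a sketch under stated assumptions, not Spark code: `ToyFs`, `CommitArbiter` and `SpeculativeCommitDemo` are hypothetical names, the `out/_temporary/attempt_N` layout is made up, and the two attempts run one after the other so that the outcome of the race is reproduced deterministically. `ToyFs.rename` models the rename behaviour described above (renaming onto an existing directory moves the source into it). `CommitArbiter` only mirrors the idea behind the SPARK-4879 fix, roughly "ask a single authority before committing"; it is not the API of Spark's `OutputCommitCoordinator`, and it omits details such as re-authorizing a new attempt after the winning one fails.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Hypothetical in-memory model of the output directory tree. Renaming onto an
// existing directory moves the source into it, as described in the PR.
class ToyFs {
  val dirs = mutable.Set[String]()

  def rename(src: String, dst: String): Unit = {
    val target = if (dirs.contains(dst)) s"$dst/${src.split('/').last}" else dst
    // Move src and everything below it to the new location.
    val moved = dirs.filter(p => p == src || p.startsWith(src + "/"))
    dirs --= moved
    dirs ++= moved.map(p => target + p.stripPrefix(src))
  }
}

// Hypothetical commit arbiter: for each (stage, partition), only the first
// attempt that asks is authorized to commit; later attempts are denied.
class CommitArbiter {
  private val winners = new ConcurrentHashMap[(Int, Int), Integer]()

  def canCommit(stage: Int, partition: Int, attempt: Int): Boolean = {
    // putIfAbsent returns null when no attempt had been recorded yet.
    val prev = winners.putIfAbsent((stage, partition), Integer.valueOf(attempt))
    prev == null || prev.intValue == attempt
  }
}

object SpeculativeCommitDemo {
  // Runs two speculative attempts of task 0 and returns the resulting tree.
  // None means "commit unconditionally", as writeShard does in 2.1.
  def run(arbiter: Option[CommitArbiter]): List[String] = {
    val fs = new ToyFs
    fs.dirs ++= Seq("out/_temporary/attempt_0", "out/_temporary/attempt_1")
    for (attempt <- Seq(0, 1)) {
      val attemptPath = s"out/_temporary/attempt_$attempt"
      val allowed = arbiter.forall(_.canCommit(stage = 0, partition = 0, attempt = attempt))
      if (allowed) {
        fs.rename(attemptPath, "out/task_0") // commit: move attempt output into place
      } else {
        fs.dirs -= attemptPath // abort: discard this attempt's output
      }
    }
    fs.dirs.toList.sorted
  }

  def main(args: Array[String]): Unit = {
    println(run(None))                    // List(out/task_0, out/task_0/attempt_1)
    println(run(Some(new CommitArbiter))) // List(out/task_0)
  }
}
```

Without arbitration, the second attempt's directory ends up nested as `out/task_0/attempt_1`, which is the leftover attempt path described above. With arbitration, the losing attempt aborts and only the committed path remains.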