@Clarkkkkk
What changes were proposed in this pull request?

When inserting into a partitioned DataSource table with dynamic partition overwrite and speculative execution enabled, multiple attempts of the same task try to write the same files. (This does not reproduce with a Hive table.)

This PR reuses FileOutputCommitter to avoid write collisions, and renames files from the staging directory to the final output directory using the original logic in HadoopMapReduceCommitProtocol#commitJob.
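As a rough illustration of the idea (not the code in this PR, and not Spark's or Hadoop's API), the sketch below simulates it with plain directories: each task attempt writes under its own private directory, one attempt is committed into the staging area, and the job commit replaces only the staged partitions. The function names, the `_temporary` layout, and the `.spark-staging-job1` directory name are assumptions chosen for the example.

```python
import os
import shutil
import tempfile


def attempt_dir(staging_dir, task_id, attempt_id):
    """Private working directory for one task attempt (hypothetical layout)."""
    return os.path.join(staging_dir, "_temporary", f"task_{task_id}_attempt_{attempt_id}")


def write_attempt(staging_dir, task_id, attempt_id, partition, data):
    """Write one partition file inside the attempt's private directory."""
    part_dir = os.path.join(attempt_dir(staging_dir, task_id, attempt_id), partition)
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, f"part-{task_id:05d}.txt")
    # Mode "x" raises FileExistsError if the path already exists, which is the
    # kind of collision two attempts would hit if they shared a directory.
    with open(path, "x") as f:
        f.write(data)
    return path


def commit_task(staging_dir, task_id, attempt_id):
    """Move the chosen attempt's files into the shared staging area."""
    src = attempt_dir(staging_dir, task_id, attempt_id)
    for partition in os.listdir(src):
        dst = os.path.join(staging_dir, partition)
        os.makedirs(dst, exist_ok=True)
        for name in os.listdir(os.path.join(src, partition)):
            os.replace(os.path.join(src, partition, name), os.path.join(dst, name))


def commit_job(staging_dir, output_dir):
    """Replace only the staged partitions, then delete the staging directory."""
    for partition in os.listdir(staging_dir):
        if partition == "_temporary":
            continue  # leftovers from attempts that were never committed
        final = os.path.join(output_dir, partition)
        if os.path.exists(final):
            shutil.rmtree(final)
        shutil.move(os.path.join(staging_dir, partition), final)
    shutil.rmtree(staging_dir)


def demo():
    root = tempfile.mkdtemp()
    output = os.path.join(root, "table")
    staging = os.path.join(output, ".spark-staging-job1")  # illustrative name
    for partition, name in (("p=1", "old.txt"), ("p=2", "keep.txt")):
        os.makedirs(os.path.join(output, partition))
        with open(os.path.join(output, partition, name), "w") as f:
            f.write("old")
    # Original and speculative attempts of task 0 both write partition p=1.
    write_attempt(staging, 0, 0, "p=1", "new")
    write_attempt(staging, 0, 1, "p=1", "new")
    commit_task(staging, 0, 1)  # only one attempt is committed
    commit_job(staging, output)
    return output


if __name__ == "__main__":
    out = demo()
    print(sorted(os.listdir(out)))
```

In this sketch the two attempts never touch the same path, so neither write fails; partition `p=1` is overwritten at job commit while `p=2`, which received no new data, is left alone.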

Why are the changes needed?

Tasks fail in this circumstance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This patch is tested by existing tests in org.apache.spark.sql.sources.InsertSuite.

Member

AFAIK, this staging dir is used to handle files with an absolute output path, or to write data into partitioned directories with dynamicPartitionOverwrite=true.

But this PR changes the behavior for absolute output paths.

Author

You are right, stagingDir cannot be used here, as it might get deleted during abortJob.

@AmplabJenkins

Can one of the admins verify this patch?

…te mode within speculative execution

Reuse FileOutputCommitter to guarantee no write collision in the staging directory.
Rename to final output in HadoopMapReduceCommitProtocol#commitJob.
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29302]Fix writing file collision in dynamic partition overwrite mode within speculative execution [SPARK-29302][CORE] Fix writing file collision in dynamic partition overwrite mode within speculative execution Oct 12, 2019
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29302][CORE] Fix writing file collision in dynamic partition overwrite mode within speculative execution [SPARK-29302][CORE][SQL] Fix writing file collision in dynamic partition overwrite mode within speculative execution Oct 12, 2019
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
