Conversation

@turboFei
Member

@turboFei turboFei commented Oct 31, 2019

What changes were proposed in this pull request?

For dynamic partition overwrite, the working directory is .spark-staging-{jobId}.
Task file names are formatted as part-$taskId-$jobId$ext, regardless of the task attempt id.
Each task writes its output to:

  • .spark-staging-{jobId}/partitionPath1/taskFileName1
  • .spark-staging-{jobId}/partitionPath2/taskFileName2
  • ...
  • .spark-staging-{jobId}/partitionPathN/taskFileNameN

If speculation is enabled, several task attempts with the same taskId but different attemptIds may write to the same file concurrently.
On DistributedFileSystem (HDFS), only one client may hold the lease to write a file; if two tasks try to write the same file, an exception like "No lease on inode" is thrown.

Even when speculation is disabled, if a task is aborted due to an executor OOM, its output is not cleaned up.
When a new task is then launched to write the same file, a FileAlreadyExistsException is thrown, because Parquet disallows overwriting:

Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: /user/hive/warehouse/t2/.spark-staging-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1/part1=2/part2=2/part-00000-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1.c000.snappy.parquet for client 127.0.0.1 already exists
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2578)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2465)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2349)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:398)

This is a critical issue that causes job failures.

In this PR, we fix the issue with the solution below:

  1. Each task attempt writes to a working path under the staging dir, named partitionPath-attemptId.
  2. After the task completes, rename partitionPath-attemptId/fileName to partitionPath/fileName.
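The two steps above can be sketched as follows. This is a minimal illustration in Java; the class name and helper methods are hypothetical, not Spark's actual (Scala) implementation:

```java
// Hypothetical sketch of the staging-path scheme described above.
// All names here (DynamicOverwritePaths, jobId, taskId, attemptId) are illustrative.
public class DynamicOverwritePaths {
    // Working dir for the whole job: ".spark-staging-<jobId>"
    static String stagingDir(String jobId) {
        return ".spark-staging-" + jobId;
    }

    // The task file name contains the taskId and jobId but NOT the attempt id,
    // which is why two attempts of the same task collide on the same file.
    static String taskFile(int taskId, String jobId, String ext) {
        return String.format("part-%05d-%s%s", taskId, jobId, ext);
    }

    // Fix, step 1: each attempt writes under "<partitionPath>-<attemptId>",
    // so concurrent attempts never touch the same file.
    static String attemptWorkingPath(String jobId, String partitionPath, int attemptId) {
        return stagingDir(jobId) + "/" + partitionPath + "-" + attemptId;
    }

    // Fix, step 2: on task commit, the file is renamed to the
    // attempt-free location "<stagingDir>/<partitionPath>/<fileName>".
    static String committedPath(String jobId, String partitionPath, String fileName) {
        return stagingDir(jobId) + "/" + partitionPath + "/" + fileName;
    }
}
```

Because the attempt id only appears in the directory name, the final rename collapses all attempts onto the same committed path, and only the committing attempt's output survives.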

Why are the changes needed?

Without this PR, dynamic partition overwrite operation might fail.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UT.

@turboFei
Member Author

turboFei commented Oct 31, 2019

cc @cloud-fan @advancedxy @viirya @wangyum
Can you help take a look? Thanks in advance.

@turboFei
Member Author

turboFei commented Nov 7, 2019

gentle ping @cloud-fan @advancedxy @viirya

@turboFei
Member Author

gentle ping @cloud-fan @advancedxy @viirya Could you help take a look? Thanks in advance!

@dbtsai
Member

dbtsai commented Nov 18, 2019

Jenkins, okay to test.

@turboFei
Member Author

retest this please.

@dongjoon-hyun
Member

ok to test

@SparkQA

SparkQA commented Dec 12, 2019

Test build #115200 has finished for PR 26339 at commit 286a87f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei turboFei force-pushed the SPARK-27194-dynamic-cleanUp branch from 286a87f to 4c18493 on December 12, 2019 06:52
@SparkQA

SparkQA commented Dec 12, 2019

Test build #115221 has finished for PR 26339 at commit 4c18493.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei
Member Author

@dongjoon-hyun Tests passed. Could you help take a look? Thanks in advance!

@turboFei
Member Author

gentle ping @dongjoon-hyun @dbtsai @viirya

@ramesh-muthusamy
Contributor

@turboFei do we have test cases to cover the changes?

@turboFei
Member Author

@turboFei do we have test cases to cover the changes?

I'll try my best to figure out how to add a UT.

val fileName = stagingTaskFile.getName
val taskPartitionPath = getPartitionPath(stagingTaskFile)
val destFile = new Path(new Path(stagingDir, taskPartitionPath), fileName)
fs.rename(stagingTaskFile, destFile)
Contributor

fs.rename returns a boolean in specific failure cases; please handle the failure case.

Member Author

fixed
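For illustration, a rename that surfaces failures instead of ignoring the return value might look like the sketch below. It uses java.nio.file as a stand-in for Hadoop's FileSystem, and commitTaskFile is a hypothetical helper, not the PR's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameWithCheck {
    // Hadoop's FileSystem.rename signals some failures by returning false
    // rather than throwing, so the caller must check the result. With
    // java.nio.file, Files.move throws on failure instead; the Hadoop
    // equivalent of this method would be:
    //   if (!fs.rename(stagingTaskFile, destFile)) {
    //       throw new IOException("Failed to rename " + stagingTaskFile + " to " + destFile);
    //   }
    static void commitTaskFile(Path stagingTaskFile, Path destFile) throws IOException {
        Files.createDirectories(destFile.getParent()); // ensure the partition dir exists
        Files.move(stagingTaskFile, destFile);         // fails loudly instead of silently
    }
}
```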

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@dongjoon-hyun
Member

Can we have a test case for this PR?

@dongjoon-hyun
Member

ok to test

@SparkQA

SparkQA commented Mar 30, 2020

Test build #120566 has finished for PR 26339 at commit 4c18493.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@venkata91
Contributor

@dongjoon-hyun @turboFei Is this PR still being worked on? We are having similar issues in our platform for a while, it would be great if we can get this fixed soon.

@turboFei
Member Author

turboFei commented Apr 1, 2020

I will follow up on it. But I am not sure how to add a UT.
PS: I am a little busy this week.

@turboFei turboFei force-pushed the SPARK-27194-dynamic-cleanUp branch 3 times, most recently from 79d9443 to 9c9f39d on April 3, 2020 15:31
dir.map { d =>
new Path(new Path(stagingDir, d), filename).toString
if (dynamicPartitionOverwrite) {
val tempFile = new Path(dynamicStagingTaskPath(dir.get, taskContext), filename)
Member

nit: dir.get -> d

@Ngone51
Member

Ngone51 commented May 6, 2020

ping @cloud-fan

I think the solution given in this PR could work for the issue.

@SparkQA

SparkQA commented May 6, 2020

Test build #122368 has finished for PR 26339 at commit 100d0fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented May 6, 2020

retest this please

@SparkQA

SparkQA commented May 7, 2020

Test build #122382 has finished for PR 26339 at commit 100d0fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 8, 2020

Test build #122437 has finished for PR 26339 at commit 8feafc4.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 8, 2020

Test build #122438 has finished for PR 26339 at commit f9ae20f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@turboFei
Member Author

@jerryshao @cloud-fan could you help review this PR? It resolves a critical issue. Thanks in advance.

@turboFei
Member Author

also cc @jiangxb1987

@HyukjinKwon
Member

also cc @vanzin

@koertkuipers
Contributor

I am getting worried now that this won't make it into Spark 3.0.0.
This is a fault-tolerance bug in Spark. Not as serious as a correctness issue, but pretty high up there, I would say (what's the point of deploying a distributed fault-tolerant system if it's not fault tolerant?)...

@Ngone51
Member

Ngone51 commented May 22, 2020

@koertkuipers you could send your concern to the vote of Spark 3.0 release and see if PMC/committer would consider it as release blocker or not.

@koertkuipers
Contributor

@Ngone51 Yeah, I thought about doing that, but I don't want to slow down the Spark 3 release even more (and this is not a regression, I guess?). Now I am just hoping someone sees my messages here and reviews this before Spark 3.0.0 RC3!

@Ngone51
Member

Ngone51 commented May 22, 2020

The vote thread now has more eyes on it than this PR, and as you know, this PR has been somewhat overlooked for a while.

@turboFei turboFei force-pushed the SPARK-27194-dynamic-cleanUp branch from f9ae20f to de9f206 on June 18, 2020 07:03
…ion overwrite a task would conflict with its speculative task
@turboFei turboFei force-pushed the SPARK-27194-dynamic-cleanUp branch from de9f206 to 717d9a5 on June 18, 2020 07:04
@SparkQA

SparkQA commented Jun 18, 2020

Test build #124203 has finished for PR 26339 at commit 717d9a5.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Jun 22, 2020

Test build #124343 has finished for PR 26339 at commit 717d9a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang
Contributor

@dongjoon-hyun @turboFei Is this PR still being worked on? We are having similar issues in our production environment, and I found there are similar PRs trying to solve this problem, such as #26090 and #26971.

@turboFei
Member Author

Gentle ping @dongjoon-hyun @dbtsai

@turboFei
Member Author

turboFei commented Jul 3, 2020

Closing this; I will create a new PR with a new solution. Thanks.

@turboFei turboFei closed this Jul 3, 2020
@koertkuipers
Contributor

Closing this; I will create a new PR with a new solution. Thanks.

why close this? did you find a better approach?

@turboFei
Member Author

turboFei commented Jul 3, 2020

Closing this; I will create a new PR with a new solution. Thanks.

why close this? did you find a better approach?

Hi, here is the new patch.
In the new solution, I define a new OutputCommitter.
I am still working on it.
#28989

@koertkuipers
Contributor

Closing this; I will create a new PR with a new solution. Thanks.

why close this? did you find a better approach?

Hi, here is the new patch.
In the new solution, I define a new OutputCommitter.
I am still working on it.
#28989

Thank you. Curious why you changed direction... is there anything wrong with the approach in this pull request? We were just about to start testing it at scale; that's why I ask.
Best

@turboFei
Member Author

turboFei commented Jul 3, 2020

Closing this; I will create a new PR with a new solution. Thanks.

why close this? did you find a better approach?

Hi, here is the new patch.
In the new solution, I define a new OutputCommitter.
I am still working on it.
#28989

Thank you. Curious why you changed direction... is there anything wrong with the approach in this pull request? We were just about to start testing it at scale; that's why I ask.
Best

In the original solution, when renaming a staging task file to its final file, we check whether the final file already exists and then decide whether to rename the staging task file.

The tricky part is that the final files may come from different tasks.

If the task output for a partition has multiple files (or in the bucketed table insert case), the data might be corrupted.

So, we need the OutputCommitCoordinator to help decide which task can commit.

In the new solution, we define a new output committer that leverages the OutputCommitCoordinator (by invoking SparkHadoopMapRedUtil.commitTask).
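The coordination idea described here can be sketched minimally as follows. This is an illustration of a first-attempt-wins commit protocol under the assumptions above, not Spark's actual OutputCommitCoordinator:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of output-commit coordination: exactly one attempt per
// task is authorized to commit, so all committed (renamed) files for a
// task are guaranteed to come from a single attempt.
public class CommitCoordinatorSketch {
    // taskId -> the attemptId that was granted permission to commit
    private final Map<Integer, Integer> authorized = new HashMap<>();

    // Returns true only for the first attempt of a task that asks;
    // later attempts (e.g. speculative copies) are denied.
    synchronized boolean canCommit(int taskId, int attemptId) {
        Integer winner = authorized.putIfAbsent(taskId, attemptId);
        return winner == null || winner == attemptId;
    }
}
```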
