[WIP][SPARK-28945][CORE][SQL] Support concurrent dynamic partition writes to different partitions in the same table #25739

advancedxy · 2019-09-10T02:28:52Z

What changes were proposed in this pull request?

This commit enables concurrent writes to different partitions of the same table when dynamicPartitionOverwrite is enabled. Currently, Spark uses the table's location as the output path when writing to the table, so concurrent writes to the same table conflict with each other (multiple OutputCommitters operate on the same output dir). In this commit, we set the OutputCommitter's output to stagingDir to avoid collisions when dynamicPartitionOverwrite is enabled.
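The idea described above can be sketched as follows. This is a minimal illustration, not the patch itself: `committerOutput` is a hypothetical helper, `java.nio.file.Path` stands in for Hadoop's `Path`, and the staging directory name follows the `".spark-staging-" + jobId` pattern quoted later in this thread.

```scala
import java.nio.file.{Path, Paths}

object StagingOutputSketch {
  // Per-job staging directory under the table location
  // (naming follows the stagingDir definition quoted in the review below).
  def stagingDir(tablePath: String, jobId: String): Path =
    Paths.get(tablePath, ".spark-staging-" + jobId)

  // Hypothetical helper: pick the directory handed to the OutputCommitter.
  // With dynamicPartitionOverwrite each job gets its own directory;
  // without it every job shares the table location.
  def committerOutput(
      tablePath: String,
      jobId: String,
      dynamicPartitionOverwrite: Boolean): Path =
    if (dynamicPartitionOverwrite) stagingDir(tablePath, jobId)
    else Paths.get(tablePath)
}
```

With this choice, two jobs writing to the same table only share a committer output directory when dynamicPartitionOverwrite is disabled.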

Why are the changes needed?

This is an improvement for users who write to different partitions of the same table concurrently.

Does this PR introduce any user-facing change?

Yes. Users can expect concurrent writes to different partitions of the same table to succeed when dynamicPartitionOverwrite is enabled.

How was this patch tested?

Added two new tests, and ran the existing tests to check for regressions.

advancedxy · 2019-09-10T02:34:21Z

I labeled this as [WIP] because I think we can also enable concurrent writes to the same table with dynamicPartitionOverwrite disabled. When we are writing to a table dynamically in strict mode, we can support concurrent writes to different static partitions. However, that would require more changes, and I'd like to know others' opinions.

cc @cloud-fan @koertkuipers

dongjoon-hyun · 2019-09-10T02:53:16Z

ok to test

SparkQA · 2019-09-10T04:49:17Z

Test build #110389 has finished for PR 25739 at commit 1de7d30.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

advancedxy · 2019-09-10T05:06:07Z

> Test build #110389 has finished for PR 25739 at commit 1de7d30.
> This patch fails Spark unit tests.
> This patch merges cleanly.
> This patch adds no public classes.

Looks unrelated...

koertkuipers · 2019-09-10T13:14:18Z

> I labeled this as [WIP] because I think we can also enable concurrent writes to the same table with dynamicPartitionOverwrite disabled. When we are writing to a table dynamically in strict mode, we can support concurrent writes to different static partitions. However, that would require more changes, and I'd like to know others' opinions.
>
> cc @cloud-fan @koertkuipers

what would concurrent writes to the same table with dynamicPartitionOverwrite disabled look like? i have a hard time coming up with a useful example of this.

advancedxy · 2019-09-10T13:37:20Z

> what would concurrent writes to the same table with dynamicPartitionOverwrite disabled look like? i have a hard time coming up with a useful example of this.

Suppose we have a table with partition columns (day, hour, action). It would be useful to support concurrent writes to Partition(day=20190910, hour=01, action), Partition(day=20190910, hour=02, action), etc., or to Partition(day=20190910, hour, action), Partition(day=20190909, hour, action), etc.

The concurrent writes would succeed as long as the static partition specs have the same size.
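One possible reading of that condition can be sketched in code. This is only an illustration under an assumption: "same size" is taken to mean that both writes fix the same number of leading partition columns, so two writes are disjoint whenever their static values differ. `StaticSpec` and `canRunConcurrently` are hypothetical names, not Spark APIs.

```scala
object StaticPartitionSpecSketch {
  // Static part of a partition spec, in partition-column order, e.g.
  // Seq("day" -> "20190910", "hour" -> "01") for
  // Partition(day=20190910, hour=01, action).
  type StaticSpec = Seq[(String, String)]

  // Assumed rule: writes are safe to run concurrently when both specs fix
  // the same number of static columns and differ in at least one value.
  // Specs of different sizes are rejected because one may be a prefix of
  // the other and therefore cover the same directories.
  def canRunConcurrently(a: StaticSpec, b: StaticSpec): Boolean =
    a.length == b.length && a != b
}
```

Under this reading, the hour=01 and hour=02 writes from the example can proceed together, while a write to day=20190910 alone would be treated as conflicting with both.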

koertkuipers · 2019-09-10T14:44:42Z

> Suppose we have a table with partition columns (day, hour, action). It would be useful to support concurrent writes to Partition(day=20190910, hour=01, action), Partition(day=20190910, hour=02, action), etc., or to Partition(day=20190910, hour, action), Partition(day=20190909, hour, action), etc.
>
> The concurrent writes would succeed as long as the static partition specs have the same size.

my understanding is that currently if you don't have dynamic partition overwrite enabled it will always delete all partitions before writing. i don't see concurrency being useful in this situation.

the example you give sounds interesting to me but it's inconsistent with how i currently know static partition overwrite to function. would this be a new feature?

advancedxy · 2019-09-10T15:20:03Z

> my understanding is that currently if you don't have dynamic partition overwrite enabled it will always delete all partitions before writing.

IIRC, for dynamic partition writes without dynamicPartitionOverwrite enabled, Spark deletes only the matching partitions (those prefixed by the static partition spec), not all partitions.

For example, for an insert overwrite into Partition(day=20190910, hour=01, action), Spark first deletes all the partitions under $some/path/to/table/location/day=20190910/hour=01
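The prefix matching described here can be sketched as below. This is a simplified illustration, not Spark's implementation: `staticPrefix` and `matchingPartitions` are hypothetical helpers, and the escaping of special characters in partition values is ignored.

```scala
object PartitionPrefixSketch {
  // Build the directory prefix implied by the static part of a partition
  // spec, e.g. day=20190910/hour=01 under the table location.
  def staticPrefix(tableLocation: String, staticSpec: Seq[(String, String)]): String =
    (tableLocation +: staticSpec.map { case (k, v) => s"$k=$v" }).mkString("/")

  // Select the existing partition directories that fall under the prefix;
  // these are the ones a static-mode overwrite would delete first.
  def matchingPartitions(prefix: String, partitionDirs: Seq[String]): Seq[String] =
    partitionDirs.filter(d => d == prefix || d.startsWith(prefix + "/"))
}
```

For the example above, only directories under day=20190910/hour=01 are selected; a sibling such as day=20190910/hour=02 is left alone.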

koertkuipers · 2019-09-10T15:39:21Z

> IIRC, for dynamic partition writes without dynamicPartitionOverwrite enabled, Spark deletes only the matching partitions (those prefixed by the static partition spec), not all partitions.
>
> For example, for an insert overwrite into Partition(day=20190910, hour=01, action), Spark first deletes all the partitions under $some/path/to/table/location/day=20190910/hour=01

when i need to overwrite a particular partition in a filesource such as parquet i will write directly to the partition path, e.g.:

df.write.format("parquet").save("some/path/to/table/location/day=20190910/hour=01")

but in that case concurrency already works, as the writers do not use the same baseDir.

i was not aware that there is an alternative syntax or way of doing this without dynamic partition overwrite. sorry for the confusion.

advancedxy · 2019-09-10T16:10:26Z

> i was not aware that there is an alternative syntax or way of doing this without dynamic partition overwrite. sorry for the confusion.

No worries. Consider the sql way of dynamic partition insertion:

insert overwrite table t partition(day='20190901', hour='01', action) 
select xxx, action from src

koertkuipers · 2019-09-10T16:25:31Z

> No worries. Consider the sql way of dynamic partition insertion:
>
> insert overwrite table t partition(day='20190901', hour='01', action)
> select xxx, action from src

ah ok we dont use sql syntax at all so thats why i was not aware of it

koertkuipers · 2019-09-10T17:56:11Z

> ah ok we dont use sql syntax at all so thats why i was not aware of it

since i dont use this feature of overwriting partitions using static overwrite mode i do not have an opinion on it. however i am excited about dynamic partition overwrite with concurrent writers. thanks for this!

advancedxy · 2019-09-11T13:36:45Z

Ping @cloud-fan, what do you think about concurrent writes to the same table with dynamicPartitionOverwrite disabled?

- Added in this PR
- Added in another PR
- Neutral, nice to have (in my opinion, dynamicPartitionOverwrite is more reasonable, and should be the default behaviour in Spark 3.0)

cloud-fan · 2019-09-16T15:29:30Z

core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala

+  }
+
  protected def setupCommitter(context: TaskAttemptContext): OutputCommitter = {
+    // set output path to stagingDir to avoid potential collision of multiple concurrent write tasks


when dynamicPartitionOverwrite=true, we already write files to the staging dir, see newTaskTempFile.

In fact, I don't see how the committer is related to the staging dir. If you look at commitTask and commitJob, we kind of manually commit the files in the staging dir, by moving it to the table dir.

advancedxy · 2019-09-17

> In fact, I don't see how the committer is related to the staging dir. If you look at commitTask and commitJob, we kind of manually commit the files in the staging dir, by moving it to the table dir.

Yes, we manually commit the files in the staging dir. The problem is that HadoopMapReduceCommitProtocol's commitJob first calls committer.commitJob(jobContext), which operates on the output path passed to the JobContext.

spark/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala

Lines 190 to 198 in 1de7d30

  override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
    committer.commitJob(jobContext)

    if (hasValidPath) {
      val (allAbsPathFiles, allPartitionPaths) =
        taskCommits.map(_.obj.asInstanceOf[(Map[String, String], Set[String])]).unzip
      val fs = stagingDir.getFileSystem(jobContext.getConfiguration)

      val filesToMove = allAbsPathFiles.foldLeft(Map[String, String]())(_ ++ _)

The OutputCommitter cannot work correctly if multiple OutputCommitters are working on the same output path (as with concurrent writes to different partitions of the same table, where the output path is always the table's output location). After changing the output path to the staging dir, concurrent jobs have different output dirs.
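A toy model may help show why a shared output path is a problem. It assumes the committer keeps in-flight files under `<output>/_temporary` and removes that whole directory when a job commits, which is my understanding of Hadoop's FileOutputCommitter. `ToyCommitter` and `survivesOtherCommit` are made up for this sketch and omit the actual promotion of committed files.

```scala
import java.io.File
import java.nio.file.Files

object SharedOutputSketch {
  // Recursively delete a file or directory tree (no-op if it is missing).
  def deleteTree(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteTree))
    f.delete()
    ()
  }

  // Toy committer: task files go under <output>/_temporary/<jobId>, and
  // committing a job removes the whole <output>/_temporary directory.
  final class ToyCommitter(output: File, jobId: String) {
    private val pending = new File(output, "_temporary")
    private val jobDir = new File(pending, jobId)

    def writeTaskFile(name: String): File = {
      jobDir.mkdirs()
      val file = new File(jobDir, name)
      Files.write(file.toPath, Array[Byte](1))
      file
    }

    def commitJob(): Unit = deleteTree(pending)
  }

  // Reports whether job B's in-flight file is still there after job A commits.
  def survivesOtherCommit(outputA: File, outputB: File): Boolean = {
    val a = new ToyCommitter(outputA, "job-a")
    val b = new ToyCommitter(outputB, "job-b")
    a.writeTaskFile("part-0")
    val inFlight = b.writeTaskFile("part-0")
    a.commitJob()
    inFlight.exists()
  }
}
```

When both jobs use the table location as output, job A's commit wipes job B's pending files; with separate per-job staging dirs as output, job B's files are untouched.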

AmplabJenkins · 2019-09-16T18:06:40Z

Can one of the admins verify this patch?

cloud-fan · 2019-09-19T15:02:23Z

core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala

  private def stagingDir = new Path(path, ".spark-staging-" + jobId)

+  /**
+   * Get the desired output path for the job. The output will be [[path]] when


"The output will be [[path]]": what does path mean here?

advancedxy · 2019-09-19

The path is defined in the class parameters, and the comment for it is:

 * @param jobId the job's or stage's id
 * @param path the job's output path, or null if committer acts as a noop
 * @param dynamicPartitionOverwrite If true, Spark will overwrite partition directories at runtime
 *                                  dynamically, i.e., we first write files under a staging
 *                                  directory with partition path, e.g.
 *                                  /path/to/staging/a=1/b=1/xxx.parquet. When committing the job,
 *                                  we first clean up the corresponding partition directories at
 *                                  destination path, e.g. /path/to/destination/a=1/b=1, and move
 *                                  files from staging directory to the corresponding partition
 *                                  directories under destination path.
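The commit sequence in that doc comment can be sketched on a local filesystem as follows. This is a rough illustration under simplifying assumptions: `commit` and `deleteTree` are hypothetical helpers, `java.io.File` replaces the Hadoop FileSystem API, and staging and destination are assumed to live on the same filesystem so a directory can be moved with a rename.

```scala
import java.io.File
import java.nio.file.Files

object DynamicOverwriteCommitSketch {
  // Recursively delete a file or directory tree (no-op if it is missing).
  def deleteTree(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteTree))
    f.delete()
    ()
  }

  // partitions holds the relative partition paths this job wrote under
  // the staging directory, e.g. "a=1/b=1".
  def commit(staging: File, destination: File, partitions: Set[String]): Unit = {
    for (p <- partitions) {
      val target = new File(destination, p)
      deleteTree(target)              // clean up the old partition directory
      target.getParentFile.mkdirs()   // make sure e.g. destination/a=1 exists
      Files.move(new File(staging, p).toPath, target.toPath)
    }
    deleteTree(staging)               // drop whatever is left of the staging dir
  }
}
```

Only partitions that the job actually wrote are replaced; every other partition directory under the destination is left as it was.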

advancedxy · 2019-09-24T08:26:15Z

I am closing this in favour of #25863. Thanks to @turboFei for the excellent work.

[WIP][SPARK-28945] support concurrent dynamic partition writes to different partitions in the same table. This commit only enables concurrent writes with dynamicPartitionOverwrite enabled.

1de7d30

dongjoon-hyun added SPARK CORE SQL labels Sep 10, 2019

dongjoon-hyun changed the title ~~[WIP][SPARK-28945] support concurrent dynamic partition writes to different partitions in the same table~~ [WIP][SPARK-28945][CORE][SQL] Support concurrent dynamic partition writes to different partitions in the same table Sep 10, 2019

cloud-fan reviewed Sep 16, 2019

advancedxy mentioned this pull request Sep 17, 2019

[WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed #25795

Closed

cloud-fan reviewed Sep 19, 2019

advancedxy closed this Sep 24, 2019

advancedxy mentioned this pull request Sep 25, 2019

[SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. #25863

Closed
