[SPARK-20014] Optimize mergeSpillsWithFileStream method #17343
sitalkedia wants to merge 3 commits into apache:master
Conversation
Force-pushed e9ac76e to 1834db6
Test build #74798 has finished for PR 17343 at commit
Test build #74800 has finished for PR 17343 at commit
Test build #74802 has finished for PR 17343 at commit
Test build #74805 has finished for PR 17343 at commit
If we make flush() a no-op, then buffered (uncommitted) data won't be written to the stream; am I missing something here, or is this change broken?
Background: you need to do a flush() to ensure the indices generated are valid.
Ah, looks like I missed that CountingOutputStream was introduced after BOS and not before.
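For context, a minimal sketch of the stream nesting being discussed (file name and buffer size are assumptions, not the exact code from this PR): the CountingOutputStream wraps the BufferedOutputStream, so its byte count reflects everything handed to the buffer, and partition offsets stay valid even when per-partition flush() calls are suppressed.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.io.output.CountingOutputStream;

public class StreamNestingSketch {
  public static void main(String[] args) throws IOException {
    File outputFile = new File("merged.data"); // hypothetical output file
    // CountingOutputStream sits *outside* the BufferedOutputStream, so
    // getByteCount() tracks bytes written into the buffer, not bytes that
    // have already reached disk. Offsets are therefore valid without a flush.
    CountingOutputStream counting = new CountingOutputStream(
        new BufferedOutputStream(new FileOutputStream(outputFile), 32 * 1024));
    counting.write(new byte[]{1, 2, 3});
    long offset = counting.getByteCount(); // 3, even though nothing is flushed yet
    counting.close();
    System.out.println("offset = " + offset);
  }
}
```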
Can you add some documentation inline so in the future we'd know why specific implementations were chosen?
@rxin - Updated documentation.
Force-pushed 5fe279e to 06c1909
Test build #74893 has finished for PR 17343 at commit
Test build #74897 has finished for PR 17343 at commit
LGTM, will wait a bit to allow others to comment.
ping @zsxwing.
sameeragarwal left a comment
Thanks @sitalkedia, the optimization looks solid. I have some extremely minor stylistic comments (unfortunately, Spark's style checker doesn't work on .java files, so many of these errors weren't caught automatically). Additionally, just to make sure: is this code covered by existing tests?
import java.nio.channels.FileChannel;
import java.util.Iterator;
import org.apache.spark.io.NioBufferedFileInputStream;
private class CloseAndFlushShieldOutputStream extends CloseShieldOutputStream {
  public CloseAndFlushShieldOutputStream(OutputStream outputStream) {
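The full helper is small; a sketch of what such a class looks like, based on the fragment above plus Commons IO's CloseShieldOutputStream (which already turns close() into a no-op), not necessarily the exact code merged here:

```java
import java.io.OutputStream;

import org.apache.commons.io.output.CloseShieldOutputStream;

// Extends CloseShieldOutputStream (close() becomes a no-op) and additionally
// turns flush() into a no-op, so that closing a per-partition wrapper stream
// cannot force the shared buffered output stream to hit the disk.
class CloseAndFlushShieldOutputStream extends CloseShieldOutputStream {
  CloseAndFlushShieldOutputStream(OutputStream out) {
    super(out);
  }

  @Override
  public void flush() {
    // do nothing: the underlying stream is flushed once, when the merge finishes
  }
}
```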
final OutputStream bos = new BufferedOutputStream(
  new FileOutputStream(outputFile),
  (int) sparkConf.getSizeAsKb("spark.shuffle.unsafe.file.output.buffer", "32k") * 1024);
Is there a reason to introduce an extra config? Can we not use spark.shuffle.file.buffer here?
@sameeragarwal - Thanks for taking a look. The rationale behind having a separate config for the write buffer is that it is useful to have a larger write buffer than the read buffer: jobs spilling a large amount of data might create multiple spill files on disk, so we will have multiple read buffers but only one write buffer. Having a larger write buffer allows us to do the merge entirely in memory without hitting the disk frequently for writes. We have observed that this config helps speed up our large jobs significantly.
nit: please create a field to store it rather than parsing the conf for each call.
Hmm.. I am not sure if I get it. The function mergeSpillsWithFileStream will be called only once per task?
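What the nit is asking for, roughly: parse the conf once when the writer is constructed and reuse the values, instead of calling getSizeAsKb at every use site. A sketch under that assumption (the holder class and field names are illustrative, not from this PR):

```java
import org.apache.spark.SparkConf;

// Illustrative sketch of the reviewer's suggestion: read both buffer sizes
// once, in the constructor, and reuse them for every spill/merge stream.
final class ShuffleBufferSizes {
  final int inputBufferSizeInBytes;
  final int outputBufferSizeInBytes;

  ShuffleBufferSizes(SparkConf sparkConf) {
    this.inputBufferSizeInBytes =
        (int) sparkConf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024;
    this.outputBufferSizeInBytes =
        (int) sparkConf.getSizeAsKb("spark.shuffle.unsafe.file.output.buffer", "32k") * 1024;
  }
}
```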
  /**
-  * Merges spill files using Java FileStreams. This code path is slower than the NIO-based merge,
+  * Merges spill files using Java FileStreams. This code path is typically slower than the NIO-based merge,
   * {@link UnsafeShuffleWriter#mergeSpillsWithTransferTo(SpillInfo[], File)}, so it's only used in
nit: some of these lines are greater than 100 chars (in comments and code). Can you please fix those?
  for (int i = 0; i < spills.length; i++) {
-   spillInputStreams[i] = new FileInputStream(spills[i].file);
+   spillInputStreams[i] = new NioBufferedFileInputStream(spills[i].file,
+     (int) sparkConf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024);
nit: the formatting seems a bit off
Thanks @sameeragarwal, addressed the checkstyle issues. Yes, the existing unit tests in UnsafeShuffleWriter#mergeSpillsWithTransferToAndLZF cover this code.
Test build #77382 has finished for PR 17343 at commit
cc @zsxwing Could you find some time to review this?
LGTM
zsxwing left a comment
Looks pretty good except some nits.
final OutputStream bos = new BufferedOutputStream(
  new FileOutputStream(outputFile),
  (int) sparkConf.getSizeAsKb("spark.shuffle.unsafe.file.output.buffer", "32k") * 1024);
nit: please create a field to store it rather than parsing the conf for each call.
-   spillInputStreams[i] = new FileInputStream(spills[i].file);
+   spillInputStreams[i] = new NioBufferedFileInputStream(
+     spills[i].file,
+     (int) sparkConf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024);
nit: please create a field to store it rather than parsing the conf inside the loop.
LGTM pending tests.
Test build #77430 has finished for PR 17343 at commit
LGTM. Thanks! Merging to master.
What changes were proposed in this pull request?
When the individual partition size in a spill is small, the mergeSpillsWithTransferTo method does many small disk I/Os, which is really inefficient. One way to improve performance is to use the mergeSpillsWithFileStream method instead, turning off transferTo and using buffered file reads/writes to improve I/O throughput.
However, the current implementation of mergeSpillsWithFileStream does not do buffered reads/writes of the files, and in addition it unnecessarily flushes the output file for each partition.
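To make the fix concrete, here is a minimal sketch of the buffered, flush-shielded merge loop described above, using the CloseAndFlushShieldOutputStream sketched earlier. The method signature, the partition-copy step, and names like partitionLengths are assumptions for illustration; the real method also layers compression/encryption streams per partition.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.io.output.CountingOutputStream;

final class MergeLoopSketch {
  long[] merge(File outputFile, File[] spills, int numPartitions,
               int outputBufferSizeInBytes) throws IOException {
    final long[] partitionLengths = new long[numPartitions];
    // One buffered output stream for the whole merge; CountingOutputStream
    // tracks partition offsets without forcing the buffer to disk.
    final CountingOutputStream mergedFileOutputStream = new CountingOutputStream(
        new BufferedOutputStream(new FileOutputStream(outputFile), outputBufferSizeInBytes));
    try {
      for (int partition = 0; partition < numPartitions; partition++) {
        final long initialFileLength = mergedFileOutputStream.getByteCount();
        // The shield makes flush()/close() no-ops, so per-partition wrapper
        // streams can be closed without flushing the shared buffer each time.
        OutputStream partitionOutput =
            new CloseAndFlushShieldOutputStream(mergedFileOutputStream);
        for (File spill : spills) {
          // copy this partition's bytes from the spill file into partitionOutput
        }
        partitionOutput.close(); // no-op on the underlying stream
        partitionLengths[partition] =
            mergedFileOutputStream.getByteCount() - initialFileLength;
      }
    } finally {
      mergedFileOutputStream.close(); // single real flush + close at the end
    }
    return partitionLengths;
  }
}
```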
How was this patch tested?
Tested this change by running a job on the cluster; the map stage run time was reduced by around 20%.