[SPARK-19991]FileSegmentManagedBuffer performance improvement by witgo · Pull Request #17329 · apache/spark

witgo · 2017-03-17T03:29:18Z

FileSegmentManagedBuffer performance improvement.

What changes were proposed in this pull request?

When we do not set the value of the configuration items spark.storage.memoryMapThreshold and spark.shuffle.io.lazyFD,
each call to the cFileSegmentManagedBuffer.nioByteBuffer or FileSegmentManagedBuffer.createInputStream method creates a NoSuchElementException instance. This is a more time-consuming operation.

In the use case, this PR can improve the performance of about 3.5%

The test code:

(1 to 10).foreach { i =>
  val numPartition = 10000
  val rdd = sc.parallelize(0 until numPartition).repartition(numPartition).flatMap { t =>
    (0 until numPartition).map(r => r * numPartition + t)
  }.repartition(numPartition)
  val serializeStart = System.currentTimeMillis()
  rdd.sum()
  val serializeFinish = System.currentTimeMillis()
  println(f"Test $i: ${(serializeFinish - serializeStart) / 1000D}%1.2f")
}

and spark-defaults.conf file:

spark.master                                      yarn-client
spark.executor.instances                          20
spark.driver.memory                               64g
spark.executor.memory                             30g
spark.executor.cores                              5
spark.default.parallelism                         100 
spark.sql.shuffle.partitions                      100
spark.serializer                                  org.apache.spark.serializer.KryoSerializer
spark.driver.maxResultSize                        0
spark.ui.enabled                                  false 
spark.driver.extraJavaOptions                     -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=512M 
spark.executor.extraJavaOptions                   -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M 
spark.cleaner.referenceTracking.blocking          true
spark.cleaner.referenceTracking.blocking.shuffle  true

The test results are as follows

SPARK-19991	`68ea290`
226.09 s	235.21 s

How was this patch tested?

Existing tests.

SparkQA · 2017-03-17T05:39:27Z

Test build #74714 has finished for PR 17329 at commit abcfc79.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-17T05:45:34Z

Test build #74715 has finished for PR 17329 at commit dc5fd8d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-03-17T06:22:31Z

Test build #74727 has started for PR 17329 at commit e0a3b62.

AmplabJenkins · 2017-03-17T07:05:14Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74727/
Test FAILed.

srowen · 2017-03-17T07:34:42Z

Out of curiosity why does each call generate NoSuchElementException ?

witgo · 2017-03-17T08:56:06Z

public class HadoopConfigProvider extends ConfigProvider {
  private final Configuration conf;

  public HadoopConfigProvider(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public String get(String name) {
    String value = conf.get(name);
    // When do not set the value of spark.storage.memoryMapThreshold or spark.shuffle.io.lazyFD,
   //  the value of `value` is null
    if (value == null) {
      throw new NoSuchElementException(name);
    }
    return value;
  }

  @Override
  public Iterable<Map.Entry<String, String>> getAll() {
    return conf;
  }

}

srowen · 2017-03-18T11:02:16Z

+    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
+  }
+
+  public FileSegmentManagedBuffer(


Yeah that makes sense then but I don't think you need a new public constructor for this

Oh, do you have a better idea?

Just don't add a new constructor. The existing one can set the new fields.

That will change a lot of code, right?

No, why? I mean keep exactly the same constructors, no more no less. No code would change. You just set your two new fields in the current constructor. It actually means you don't need some of the changes you made here.

I'm sorry, I probably did not make it clear.
Suppose there are E Executor in the cluster, a shuffle process has M Map task, R reduce task, in the master branch will be created:

Up to M * R FileSegmentManagedBuffer instances

Up to 2 * M * R NoSuchElementException instances

in this PR will be created:

Up to M * R FileSegmentManagedBuffer instances

Up to 2 * NoSuchElementException instances (ExternalShuffleBlockResolver and IndexShuffleBlockResolver are created once executor starts and They call the new constructor to create a FileSegmentManagedBuffer instance)

This still doesn't address the point. What you say is true even if you make the change I suggest, which is to remove the superfluous constructor. The performance is exactly the same.

Sorry,I didn't get your idea. Can you write some code?

I'm simply describing what you proposed above at #17329 (comment)

Ping @witgo to update or close

srowen · 2017-03-30T15:13:39Z

OK. I think anyone's welcome to make this change without the new constructor. It sounded fine otherwise.

…ement ## What changes were proposed in this pull request? Avoid `NoSuchElementException` every time `ConfigProvider.get(val, default)` falls back to default. This apparently causes non-trivial overhead in at least one path, and can easily be avoided. See #17329 ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #17567 from srowen/SPARK-19991.

Guoqiang Li added 2 commits March 17, 2017 11:19

FileSegmentManagedBuffer performance improvement

abcfc79

Format code

dc5fd8d

fix ShuffleBlockFetcherIteratorSuite

e0a3b62

srowen reviewed Mar 18, 2017

View reviewed changes

witgo closed this Mar 30, 2017

srowen mentioned this pull request Apr 7, 2017

[SPARK-19991][CORE][YARN] FileSegmentManagedBuffer performance improvement #17567

Closed

Conversation

witgo commented Mar 17, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

SparkQA commented Mar 17, 2017

Uh oh!

SparkQA commented Mar 17, 2017

Uh oh!

SparkQA commented Mar 17, 2017

Uh oh!

AmplabJenkins commented Mar 17, 2017

Uh oh!

srowen commented Mar 17, 2017

Uh oh!

witgo commented Mar 17, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

witgo Mar 25, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srowen commented Mar 30, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

witgo commented Mar 17, 2017 •

edited

Loading

witgo commented Mar 17, 2017 •

edited

Loading

witgo Mar 25, 2017 •

edited

Loading