Skip to content

Commit f25bbbd

Browse files
committed
[SPARK-3280] Made sort-based shuffle the default implementation
Sort-based shuffle has lower memory usage and seems to outperform hash-based in almost all of our testing. Author: Reynold Xin <[email protected]> Closes apache#2178 from rxin/sort-shuffle and squashes the following commits: 713d341 [Reynold Xin] Fixed test failures by setting spark.shuffle.compress to the same value as spark.shuffle.spill.compress. 85165e6 [Reynold Xin] Fixed a comment typo. aa0d372 [Reynold Xin] [SPARK-3280] Made sort-based shuffle the default implementation
1 parent 4ba2673 commit f25bbbd

File tree

6 files changed

+41
-9
lines changed

6 files changed

+41
-9
lines changed

core/src/main/scala/org/apache/spark/SparkEnv.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -217,7 +217,7 @@ object SparkEnv extends Logging {
217217
val shortShuffleMgrNames = Map(
218218
"hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
219219
"sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
220-
val shuffleMgrName = conf.get("spark.shuffle.manager", "hash")
220+
val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")
221221
val shuffleMgrClass = shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase, shuffleMgrName)
222222
val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)
223223

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
/*
2+
* Licensed to the Apache Software Foundation (ASF) under one or more
3+
* contributor license agreements. See the NOTICE file distributed with
4+
* this work for additional information regarding copyright ownership.
5+
* The ASF licenses this file to You under the Apache License, Version 2.0
6+
* (the "License"); you may not use this file except in compliance with
7+
* the License. You may obtain a copy of the License at
8+
*
9+
* http://www.apache.org/licenses/LICENSE-2.0
10+
*
11+
* Unless required by applicable law or agreed to in writing, software
12+
* distributed under the License is distributed on an "AS IS" BASIS,
13+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
* See the License for the specific language governing permissions and
15+
* limitations under the License.
16+
*/
17+
18+
package org.apache.spark
19+
20+
import org.scalatest.BeforeAndAfterAll
21+
22+
class HashShuffleSuite extends ShuffleSuite with BeforeAndAfterAll {
23+
24+
// This test suite should run all tests in ShuffleSuite with hash-based shuffle.
25+
26+
override def beforeAll() {
27+
System.setProperty("spark.shuffle.manager", "hash")
28+
}
29+
30+
override def afterAll() {
31+
System.clearProperty("spark.shuffle.manager")
32+
}
33+
}

core/src/test/scala/org/apache/spark/ShuffleSuite.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ import org.apache.spark.rdd.{CoGroupedRDD, OrderedRDDFunctions, RDD, ShuffledRDD
2626
import org.apache.spark.serializer.KryoSerializer
2727
import org.apache.spark.util.MutablePair
2828

29-
class ShuffleSuite extends FunSuite with Matchers with LocalSparkContext {
29+
abstract class ShuffleSuite extends FunSuite with Matchers with LocalSparkContext {
3030

3131
val conf = new SparkConf(loadDefaults = false)
3232

core/src/test/scala/org/apache/spark/SortShuffleSuite.scala

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,8 +24,7 @@ class SortShuffleSuite extends ShuffleSuite with BeforeAndAfterAll {
2424
// This test suite should run all tests in ShuffleSuite with sort-based shuffle.
2525

2626
override def beforeAll() {
27-
System.setProperty("spark.shuffle.manager",
28-
"org.apache.spark.shuffle.sort.SortShuffleManager")
27+
System.setProperty("spark.shuffle.manager", "sort")
2928
}
3029

3130
override def afterAll() {

core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ class ExternalAppendOnlyMapSuite extends FunSuite with LocalSparkContext {
4242
conf.set("spark.serializer.objectStreamReset", "1")
4343
conf.set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
4444
conf.set("spark.shuffle.spill.compress", codec.isDefined.toString)
45+
conf.set("spark.shuffle.compress", codec.isDefined.toString)
4546
codec.foreach { c => conf.set("spark.io.compression.codec", c) }
4647
// Ensure that we actually have multiple batches per spill file
4748
conf.set("spark.shuffle.spill.batchSize", "10")

docs/configuration.md

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -293,12 +293,11 @@ Apart from these, the following properties are also available, and may be useful
293293
</tr>
294294
<tr>
295295
<td><code>spark.shuffle.manager</code></td>
296-
<td>HASH</td>
296+
<td>sort</td>
297297
<td>
298-
Implementation to use for shuffling data. A hash-based shuffle manager is the default, but
299-
starting in Spark 1.1 there is an experimental sort-based shuffle manager that is more
300-
memory-efficient in environments with small executors, such as YARN. To use that, change
301-
this value to <code>SORT</code>.
298+
Implementation to use for shuffling data. There are two implementations available:
299+
<code>sort</code> and <code>hash</code>. Sort-based shuffle is more memory-efficient and is
300+
the default option starting in 1.2.
302301
</td>
303302
</tr>
304303
<tr>

0 commit comments

Comments
 (0)