
Commit 0307db0

aarondav authored and pwendell committed
SPARK-1099: Introduce local[*] mode to infer number of cores

This is the default mode for running spark-shell and pyspark, intended to allow users running spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.

Author: Aaron Davidson <[email protected]>

Closes apache#182 from aarondav/110 and squashes the following commits:

a88294c [Aaron Davidson] Rebased changes for new spark-shell
a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
1 parent 2a2ca48 commit 0307db0
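
For context on what the new default means for application code, here is a minimal sketch (not part of this commit) of how the three local master strings differ when constructing a SparkContext; the object name and sample job are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object LocalModesDemo {
  def main(args: Array[String]): Unit = {
    // "local"    -> exactly one worker thread (the old shell default)
    // "local[4]" -> exactly four worker threads
    // "local[*]" -> one worker thread per logical core (the new shell default)
    val conf = new SparkConf().setMaster("local[*]").setAppName("LocalModesDemo")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 1000).count())  // 1000, computed across the local threads
    sc.stop()
  }
}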

File tree

7 files changed: +25 -12 lines changed

bin/spark-shell

Lines changed: 2 additions & 2 deletions
@@ -34,7 +34,7 @@ set -o posix
 FWDIR="$(cd `dirname $0`/..; pwd)"

 SPARK_REPL_OPTS="${SPARK_REPL_OPTS:-""}"
-DEFAULT_MASTER="local"
+DEFAULT_MASTER="local[*]"
 MASTER=${MASTER:-""}

 info_log=0
@@ -64,7 +64,7 @@ ${txtbld}OPTIONS${txtrst}:
       is followed by m for megabytes or g for gigabytes, e.g. "1g".
   -dm --driver-memory : The memory used by the Spark Shell, the number is followed
       by m for megabytes or g for gigabytes, e.g. "1g".
-  -m --master : A full string that describes the Spark Master, defaults to "local"
+  -m --master : A full string that describes the Spark Master, defaults to "local[*]"
       e.g. "spark://localhost:7077".
   --log-conf : Enables logging of the supplied SparkConf as INFO at start of the
       Spark Context.

core/src/main/scala/org/apache/spark/SparkContext.scala

Lines changed: 6 additions & 3 deletions
@@ -1285,8 +1285,8 @@ object SparkContext extends Logging {

   /** Creates a task scheduler based on a given master URL. Extracted for testing. */
   private def createTaskScheduler(sc: SparkContext, master: String): TaskScheduler = {
-    // Regular expression used for local[N] master format
-    val LOCAL_N_REGEX = """local\[([0-9]+)\]""".r
+    // Regular expression used for local[N] and local[*] master formats
+    val LOCAL_N_REGEX = """local\[([0-9\*]+)\]""".r
     // Regular expression for local[N, maxRetries], used in tests with failing tasks
     val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r
     // Regular expression for simulating a Spark cluster of [N, cores, memory] locally
@@ -1309,8 +1309,11 @@ object SparkContext extends Logging {
         scheduler

       case LOCAL_N_REGEX(threads) =>
+        def localCpuCount = Runtime.getRuntime.availableProcessors()
+        // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
+        val threadCount = if (threads == "*") localCpuCount else threads.toInt
         val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
-        val backend = new LocalBackend(scheduler, threads.toInt)
+        val backend = new LocalBackend(scheduler, threadCount)
         scheduler.initialize(backend)
         scheduler
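
To make the new branch easier to follow in isolation, here is a minimal standalone sketch of the same thread-count resolution; the regex and the treatment of "*" mirror the patch above, while the object name and the printed examples are illustrative only.

object MasterStringSketch {
  // Same pattern as the patched LOCAL_N_REGEX: accepts local[N] as well as local[*]
  private val LocalN = """local\[([0-9\*]+)\]""".r

  def resolveThreads(master: String): Option[Int] = master match {
    case LocalN(threads) =>
      // "*" means one thread per logical core the JVM reports; otherwise use N exactly
      Some(if (threads == "*") Runtime.getRuntime.availableProcessors() else threads.toInt)
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(resolveThreads("local[*]"))           // Some(<logical core count>)
    println(resolveThreads("local[4]"))           // Some(4)
    println(resolveThreads("spark://host:7077"))  // None
  }
}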

core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala

Lines changed: 8 additions & 0 deletions
@@ -51,6 +51,14 @@ class SparkContextSchedulerCreationSuite
     }
   }

+  test("local-*") {
+    val sched = createTaskScheduler("local[*]")
+    sched.backend match {
+      case s: LocalBackend => assert(s.totalCores === Runtime.getRuntime.availableProcessors())
+      case _ => fail()
+    }
+  }
+
   test("local-n") {
     val sched = createTaskScheduler("local[5]")
     assert(sched.maxTaskFailures === 1)
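
Outside the test suite, a quick way to observe the same inference from user code is to compare the JVM's processor count with the context's default parallelism. The sketch below assumes the usual local-mode behaviour that sc.defaultParallelism falls back to the backend's thread count when spark.default.parallelism is not set; the object name is illustrative only.

import org.apache.spark.SparkContext

object LocalStarCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "LocalStarCheck")
    val cores = Runtime.getRuntime.availableProcessors()
    // With local[*], these two numbers should normally agree
    println(s"logical cores = $cores, sc.defaultParallelism = ${sc.defaultParallelism}")
    sc.stop()
  }
}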

docs/python-programming-guide.md

Lines changed: 4 additions & 3 deletions
@@ -82,15 +82,16 @@ The Python shell can be used explore data interactively and is a simple way to l
 >>> help(pyspark) # Show all pyspark functions
 {% endhighlight %}

-By default, the `bin/pyspark` shell creates SparkContext that runs applications locally on a single core.
-To connect to a non-local cluster, or use multiple cores, set the `MASTER` environment variable.
+By default, the `bin/pyspark` shell creates SparkContext that runs applications locally on all of
+your machine's logical cores.
+To connect to a non-local cluster, or to specify a number of cores, set the `MASTER` environment variable.
 For example, to use the `bin/pyspark` shell with a [standalone Spark cluster](spark-standalone.html):

 {% highlight bash %}
 $ MASTER=spark://IP:PORT ./bin/pyspark
 {% endhighlight %}

-Or, to use four cores on the local machine:
+Or, to use exactly four cores on the local machine:

 {% highlight bash %}
 $ MASTER=local[4] ./bin/pyspark

docs/scala-programming-guide.md

Lines changed: 3 additions & 2 deletions
@@ -54,7 +54,7 @@ object for more advanced configuration.

 The `master` parameter is a string specifying a [Spark or Mesos cluster URL](#master-urls) to connect to, or a special "local" string to run in local mode, as described below. `appName` is a name for your application, which will be shown in the cluster web UI. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.

-In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable, and you can add JARs to the classpath with the `ADD_JARS` variable. For example, to run `bin/spark-shell` on four cores, use
+In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable, and you can add JARs to the classpath with the `ADD_JARS` variable. For example, to run `bin/spark-shell` on exactly four cores, use

 {% highlight bash %}
 $ MASTER=local[4] ./bin/spark-shell
@@ -74,6 +74,7 @@ The master URL passed to Spark can be in one of the following formats:
 <tr><th>Master URL</th><th>Meaning</th></tr>
 <tr><td> local </td><td> Run Spark locally with one worker thread (i.e. no parallelism at all). </td></tr>
 <tr><td> local[K] </td><td> Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
+<tr><td> local[*] </td><td> Run Spark locally with as many worker threads as logical cores on your machine.</td></tr>
 </td></tr>
 <tr><td> spark://HOST:PORT </td><td> Connect to the given <a href="spark-standalone.html">Spark standalone
 cluster</a> master. The port must be whichever one your master is configured to use, which is 7077 by default.
@@ -84,7 +85,7 @@ The master URL passed to Spark can be in one of the following formats:
 </td></tr>
 </table>

-If no master URL is specified, the spark shell defaults to "local".
+If no master URL is specified, the spark shell defaults to "local[*]".

 For running on YARN, Spark launches an instance of the standalone deploy cluster within YARN; see [running on YARN](running-on-yarn.html) for details.

python/pyspark/shell.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 # this is the equivalent of ADD_JARS
 add_files = os.environ.get("ADD_FILES").split(',') if os.environ.get("ADD_FILES") != None else None

-sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
+sc = SparkContext(os.environ.get("MASTER", "local[*]"), "PySparkShell", pyFiles=add_files)

 print """Welcome to
       ____              __

repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala

Lines changed: 1 addition & 1 deletion
@@ -963,7 +963,7 @@ class SparkILoop(in0: Option[BufferedReader], protected val out: JPrintWriter,
       case Some(m) => m
       case None => {
         val prop = System.getenv("MASTER")
-        if (prop != null) prop else "local"
+        if (prop != null) prop else "local[*]"
       }
     }
     master
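
Taken together with the bin/spark-shell and shell.py changes above, the shell's master now effectively resolves as: an explicitly supplied master first, then the MASTER environment variable, then the new default of "local[*]". A small sketch of that precedence follows; the object and helper names are made up for illustration and are not part of the commit.

object ShellMasterResolution {
  // Precedence sketch: an explicitly supplied master wins, then the MASTER
  // environment variable, and finally the new default of local[*].
  def resolveMaster(explicit: Option[String]): String =
    explicit.getOrElse {
      val env = System.getenv("MASTER")
      if (env != null) env else "local[*]"
    }

  def main(args: Array[String]): Unit = {
    println(resolveMaster(Some("spark://localhost:7077")))  // spark://localhost:7077
    println(resolveMaster(None))                            // MASTER if set, otherwise local[*]
  }
}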
