SPARK-2058: Overriding config from SPARK_HOME with SPARK_CONF_DIR #997
Changes from all commits:

* 5ede2c0
* 96e3fdd
* 69d337e
* 99a4341
* 5357369
* d2d1543
* 186c975
In `SparkSubmit`:

```diff
@@ -47,7 +47,7 @@ object SparkSubmit {
   private val PYSPARK_SHELL = "pyspark-shell"

   def main(args: Array[String]) {
-    val appArgs = new SparkSubmitArguments(args)
+    val appArgs = new SparkSubmitArguments(args, sys.env)
     if (appArgs.verbose) {
       printStream.println(appArgs)
     }
```

Contributor (on the changed line): No need to pass …
In `SparkSubmitArguments`:

```diff
@@ -30,7 +30,7 @@ import org.apache.spark.util.Utils
 /**
  * Parses and encapsulates arguments from the spark-submit script.
  */
-private[spark] class SparkSubmitArguments(args: Seq[String]) {
+private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, String] = sys.env) {
   var master: String = null
   var deployMode: String = null
   var executorMemory: String = null
```

Contributor: No need to accept an extra argument here. This will always read from the JVM directly anyway.

Contributor: I see, you use this for tests. Then it makes sense to add a comment here to explain that.
```diff
@@ -83,9 +83,12 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
     // Use common defaults file, if not specified by user
     if (propertiesFile == null) {
-      sys.env.get("SPARK_HOME").foreach { sparkHome =>
-        val sep = File.separator
-        val defaultPath = s"${sparkHome}${sep}conf${sep}spark-defaults.conf"
+      val sep = File.separator
+      val sparkHomeConfig = env.get("SPARK_HOME").map(sparkHome => s"${sparkHome}${sep}conf")
+
+      // give preference to user defined conf over the one in spark home
+      env.get("SPARK_CONF_DIR").orElse(sparkHomeConfig).foreach { configPath =>
+        val defaultPath = s"${configPath}${sep}spark-defaults.conf"
         val file = new File(defaultPath)
         if (file.exists()) {
           propertiesFile = file.getAbsolutePath
```
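The precedence rule here hinges on `Option.orElse`: the first defined option wins. A standalone sketch of the resolution order, with paths invented for illustration:

```scala
// SPARK_CONF_DIR, when set, takes precedence; SPARK_HOME/conf is the fallback.
val env = Map("SPARK_HOME" -> "/opt/spark")            // SPARK_CONF_DIR unset here

val sparkHomeConfig = env.get("SPARK_HOME").map(home => s"$home/conf")
val configDir = env.get("SPARK_CONF_DIR").orElse(sparkHomeConfig)

println(configDir)                                     // Some(/opt/spark/conf)
println(Map("SPARK_CONF_DIR" -> "/etc/spark")
  .get("SPARK_CONF_DIR").orElse(sparkHomeConfig))      // Some(/etc/spark)
```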
```diff
@@ -161,7 +164,7 @@ private[spark] class SparkSubmitArguments(args: Seq[String]) {
   }

   if (master.startsWith("yarn")) {
-    val hasHadoopEnv = sys.env.contains("HADOOP_CONF_DIR") || sys.env.contains("YARN_CONF_DIR")
+    val hasHadoopEnv = env.contains("HADOOP_CONF_DIR") || env.contains("YARN_CONF_DIR")
     if (!hasHadoopEnv && !Utils.isTesting) {
       throw new Exception(s"When running with master '$master' " +
         "either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.")
```
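With `env` injectable, this guard can also be exercised in isolation. A hedged sketch; whether construction itself triggers the check is an assumption about this PR's control flow:

```scala
import scala.util.Try

// With an empty injected environment, neither HADOOP_CONF_DIR nor
// YARN_CONF_DIR is visible, so a yarn master should be rejected
// (except under Utils.isTesting, which suppresses the check).
val attempt = Try(
  new SparkSubmitArguments(Seq("--master", "yarn-cluster", "app.jar"), Map.empty))
println(attempt.isFailure)  // expected: true, outside of test runs
```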
In the configuration docs:

```diff
@@ -839,3 +839,16 @@ compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface.
 Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a
 `log4j.properties` file in the `conf` directory. One way to start is to copy the existing
 `log4j.properties.template` located there.
+
+# Overriding configuration
+
+In some cases you might want to provide all configuration from another place than the default SPARK_HOME/conf dir.
+For example if you are using the prepackaged version of Spark or if you are building it your self but want to be
+independent from your cluster configuration (managed by an automation tool).
+
+In that scenario you can define the SPARK_CONF_DIR variable pointing to an alternate directory containing you configuration.
+Spark will then use it for the following configurations:
+
+* spark-defaults.conf and spark-env.sh will be loaded only from the SPARK_CONF_DIR
+* log4j.properties, fairscheduler.xml and metrics.properties if present will be loaded from SPARK_CONF_DIR,
+  but if missing, the ones from SPARK_HOME/conf will be used.
```

Contributor (on the heading): How about "Overriding configuration directory"?

Contributor (on "your self"): yourself*

Contributor (on the paragraph): This whole paragraph is a little verbose. I think it's sufficient to say something like: "To specify a different configuration directory other than the default `SPARK_HOME/conf`, you can set SPARK_CONF_DIR. Spark will look for the following configuration files in this directory:"
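The two bullets describe two lookup policies: exclusive lookup for spark-defaults.conf and spark-env.sh, and lookup with fallback for the rest. A sketch of that logic; `resolve` is a hypothetical helper with invented paths, not part of this PR:

```scala
import java.io.File

// Hypothetical helper illustrating the two lookup policies described above.
def resolve(name: String, confDir: String, sparkHome: String,
            fallback: Boolean): Option[File] = {
  val primary = new File(confDir, name)
  if (primary.exists()) {
    Some(primary)
  } else if (fallback) {
    // log4j.properties, fairscheduler.xml, and metrics.properties fall back
    // to SPARK_HOME/conf when absent from SPARK_CONF_DIR.
    Option(new File(new File(sparkHome, "conf"), name)).filter(_.exists())
  } else {
    // spark-defaults.conf and spark-env.sh are read only from SPARK_CONF_DIR.
    None
  }
}

resolve("spark-defaults.conf", "/etc/spark/conf", "/opt/spark", fallback = false)
resolve("log4j.properties",    "/etc/spark/conf", "/opt/spark", fallback = true)
```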
Reviewer: Does the configuration dir need to be added to the classpath? The code, at least the part you're modifying below, doesn't seem to require that.

Author: Indeed it's not used in SparkSubmit, but this is done in order to provide the user-defined config to the other components, such as logging, the scheduler, and metrics.

Contributor: Also, I think in standalone mode the worker needs to use the config, so the classpath is useful.