
Conversation

@jerryshao
Contributor

This PR is based on the work of @roji to support running Spark scripts from symlinks. Thanks for the great work, @roji. Would you mind taking a look at this PR? Thanks a lot.

Distributions like HDP normally expose the Spark executables as symlinks placed on the PATH, but Spark's current scripts do not recursively resolve a symlink to its real path, so Spark fails to execute when launched through a symlink. This PR tries to solve the issue by finding the absolute path behind the symlink.

Unlike what PR #2386 implemented, this does not use readlink -f, because -f is not supported on Mac; instead the path is resolved manually in a loop.

I've tested on Mac and Linux (CentOS); it looks fine.

This PR does not fix the scripts under the sbin folder; I'm not sure whether those need to be fixed as well.

Please help review; any comment is greatly appreciated.
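For reference, the manual loop the description mentions can be sketched like this (a hypothetical sketch, not the exact PR code; the function name `resolve_script` and the depth cap of 32 are assumptions):

```shell
# Hypothetical sketch of loop-based symlink resolution (not the exact PR code).
# Follows symlinks one hop at a time, handling relative link targets, with a
# depth cap to guard against cycles -- all without needing readlink -f.
resolve_script() {
  target="$1"
  depth=0
  while [ -L "$target" ] && [ "$depth" -lt 32 ]; do
    link="$(readlink "$target")"
    case "$link" in
      /*) target="$link" ;;                       # absolute link target
      *)  target="$(dirname "$target")/$link" ;;  # relative to the symlink's dir
    esac
    depth=$((depth + 1))
  done
  # Normalize the directory component to an absolute path
  echo "$(cd "$(dirname "$target")" && pwd)/$(basename "$target")"
}
```

The depth cap matters because a cyclic symlink chain would otherwise loop forever; `readlink -f` on GNU coreutils performs the same cycle detection internally.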

bin/beeline Outdated
Member

Hm, is there no way to refactor these and source the function once? It would be a good excuse to create such a common script here if one doesn't already exist.

Contributor Author

I think if we move this function to a new script, there is still the problem of how to find the real path of that new script.

Contributor Author

I will keep thinking about a way to remove this annoying duplication, and also do more corner-case testing of this function.

Member

I see, I would imagine it could be next to the symlink and symlinked as well... but is the point that even sourcing isn't going to follow the symlink? I thought the point was not that the symlinks don't work as links to files, but that they don't help you find where Spark is installed.

Contributor Author

Hi @srowen, yes, the actual problem is finding out where Spark is installed when it is launched through a symlink.

Sourcing can follow the symlink, but the problem is how to find the path of the sourced file/symlink.

Say we make spark-shell a symlink and add it to /usr/bin: when we execute spark-shell from any directory, how do we identify the path of the sourced file? If the sourced file is not linked into /usr/bin/xx, we still need to find out where Spark is installed.
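A small demonstration of the failure being described (a sketch with made-up paths; a temp directory stands in for /opt and /usr/bin):

```shell
# Shows why the classic "$(cd "$(dirname "$0")"/..; pwd)" idiom finds the
# wrong home when the script is invoked through a symlink.
tmp="$(mktemp -d)"
mkdir -p "$tmp/opt/spark/bin" "$tmp/usr/bin"

cat > "$tmp/opt/spark/bin/spark-shell" <<'EOF'
#!/bin/sh
# The classic idiom: assumes $0 is the real script location.
echo "$(cd "$(dirname "$0")"/..; pwd)"
EOF
chmod +x "$tmp/opt/spark/bin/spark-shell"
ln -s "$tmp/opt/spark/bin/spark-shell" "$tmp/usr/bin/spark-shell"

"$tmp/opt/spark/bin/spark-shell"   # prints the real Spark home (.../opt/spark)
"$tmp/usr/bin/spark-shell"         # prints the symlink's parent (.../usr) -- wrong
```

Invoked directly, `$0` is the real script path; invoked via the symlink, `$0` is the symlink path, so the derived home points at the symlink's parent directory instead of the installation.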

Member

I mean the file would be in /usr/bin too, yes. I understand it's a little ugly, yeah.
Hm, I wonder if this is really the right approach to finding where Spark is installed. Shouldn't this be covered by SPARK_HOME? That's how other software works. The title of the JIRA is actually misleading, since the symlinks themselves work fine.

Contributor Author

If we assume SPARK_HOME must be set, then this problem is easily solved. But we don't make that assumption, so we still need some script logic to find out where Spark is installed, and the problem comes back again.

BTW, projects like Hadoop have the same issue in locating hadoop-config.sh.

@srowen
Member

srowen commented Sep 9, 2015

Other than that if it doesn't change existing behavior and lets symlinks stand a chance of working, seems good.

@SparkQA

SparkQA commented Sep 9, 2015

Test build #42204 has finished for PR 8669 at commit 69bde5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@patrungel

Applying the same fix to sbin executables would be great. In fact, the whole fix would be incomplete without it.

@jerryshao
Contributor Author

Hi @patrungel, I've also fixed this issue for the sbin executables; mind taking a look?

@SparkQA

SparkQA commented Sep 10, 2015

Test build #42245 has finished for PR 8669 at commit d6183b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@patrungel

Hi, @jerryshao
I did a quick test with alternatives (against sbin/start-history-server.sh); the fix works for me as expected. Thanks a ton.

@aniro

aniro commented Oct 8, 2015

Can the following be fixed as well?

--- a/bin/load-spark-env.sh    2015-10-08 14:51:20.874478368 +0000
+++ b/bin/load-spark-env.sh    2015-10-08 15:04:02.109674611 +0000
@@ -47,10 +47,7 @@
 if [ -z "$SPARK_ENV_LOADED" ]; then
   export SPARK_ENV_LOADED=1

-  # Returns the parent of the directory this script lives in.
-  parent_dir="$(cd "`dirname "$0"`"/..; pwd)"
-
-  user_conf_dir="${SPARK_CONF_DIR:-"$parent_dir"/conf}"
+  user_conf_dir="${SPARK_CONF_DIR:-"$FWDIR"/conf}"

   if [ -f "${user_conf_dir}/spark-env.sh" ]; then
     # Promote all variable declarations to environment (exported) variables

@JoshRosen
Contributor

Woah, this PR is really big! @jerryshao, do you think that you could move this function into a common "helper library" file then source / include that file into the other scripts? That would help to reduce duplication here.

@srowen
Member

srowen commented Oct 17, 2015

@JoshRosen yes, agreed. See the discussion above about the small chicken-and-egg problem concerning the location of the helper script. It may still be a better solution than this much duplication; or else, declare that SPARK_HOME must be set.

Contributor

Can this function be replaced by:

DIR="$(dirname "$(readlink -f "$0")")"

I see you're recursively resolving the symlink and putting a limit in the depth of recursion, but it sounds like readlink -f does just that.

Contributor

Nevermind, I just saw the comment about Macs not supporting that. There's no alternative on Macs? (If no, that sucks.)

Contributor Author

Hi @vanzin, the behavior of readlink -f is different on Mac; there is a discussion of this on Stack Overflow. Another solution is to install greadlink through brew.
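A common portability shim for this (an illustration of the brew/greadlink workaround mentioned above, not something this PR adopts; the `resolve_dir` helper name is made up):

```shell
# Prefer GNU greadlink (installed on macOS via `brew install coreutils`)
# when present; fall back to the system readlink otherwise. Note that the
# BSD readlink shipped with macOS lacks the -f flag entirely.
if command -v greadlink >/dev/null 2>&1; then
  READLINK="greadlink"
else
  READLINK="readlink"
fi

# On systems where "$READLINK" -f works (GNU coreutils), this resolves a
# symlinked script to its real directory in one call:
resolve_dir() {
  dirname "$("$READLINK" -f "$1")"
}
```

The drawback, and presumably why the PR avoids it, is that it silently degrades on a stock Mac where neither greadlink nor `readlink -f` is available.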

@srowen
Member

srowen commented Oct 22, 2015

I think this is a whole lot of change, and the purpose really is just to avoid setting SPARK_HOME when Spark is symlinked. But that's pretty standard procedure, right? I think we should resolve this as won't-fix.

@patrungel

The change aims to make the Spark binaries launch properly when symlinked, not just to avoid setting SPARK_HOME.
As of current master, the SPARK_HOME value is overwritten in the launchers by

export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"

which is generically wrong and does not work for a symlinked set-up.

This behaviour prevents proper installations/packaging (i.e. with alternatives).

So, if 'resolve this as wont-fix' refers to the issue, I beg to differ (that discussion should rather be moved to the ticket, then).
If it refers just to the change suggested, I don't protest it.

@srowen
Member

srowen commented Oct 22, 2015

Gotcha, is the way forward really to not overwrite SPARK_HOME if it's set? Because most of the work here seems to be about discovering the home otherwise. The symlinks themselves are fine, just not quite suitable for discovering the real location of the target.

@jerryshao
Contributor Author

Hi @srowen, I admit the current solution is quite ugly, but as you mentioned, it is a chicken-and-egg problem and it is hard to remove the duplication. Another solution is to rely on SPARK_HOME to support symlinks, but that still requires changing the current executables, since we don't currently honor SPARK_HOME. Take beeline as an example: even if SPARK_HOME is set, we still could not execute it from a symlink.

So if setting SPARK_HOME is a compromise solution you all agree on, I can update the code.

What's your opinion?
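The compromise pattern being discussed can be sketched as follows (the derived path is exactly the idiom that goes wrong through a symlink, which is why a user-provided value wins):

```shell
# Honor a user-provided SPARK_HOME; only derive it from the script's own
# location when unset. A user launching via a symlink sets SPARK_HOME
# manually, bypassing the (wrong-through-a-symlink) derivation.
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "$(dirname "$0")"/..; pwd)"
fi
echo "$SPARK_HOME"
```

This trades automatic symlink resolution for a simple, duplication-free convention.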

@jerryshao
Contributor Author

If SPARK_HOME is the standard way, we should update the scripts to honor it. It's not a good idea to close this issue as won't-fix, because lots of deployments and packages actually require symlinks.

roji and others added 5 commits October 26, 2015 15:34
The current scripts (e.g. pyspark) fail to run when they are
executed via symlinks. A common Linux scenario would be to have Spark
installed somewhere (e.g. /opt) and have a symlink to it in /usr/bin.

Fixed the scripts to traverse symlinks until reaching the actual binary.
@jerryshao
Contributor Author

Hi @srowen @patrungel @JoshRosen and @vanzin, I changed this patch to honor SPARK_HOME to solve the symlink issue: if users want to run the executables from a symlink, they should set SPARK_HOME manually. This removes lots of duplication while keeping the ability to run from symlinks. Please help review; any comment is greatly appreciated.

@jerryshao
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 27, 2015

Test build #44400 has finished for PR 8669 at commit c796397.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * "$
      * "$
      * exec "$
      * exec "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * "$

bin/beeline Outdated
Member

Yes, I very much like how this has standardized everything to use SPARK_HOME instead of FWDIR, and doesn't overwrite the value if already set. (Nit: all of the occurrences of this line have a 4-space indent instead of 2)

LGTM; does anyone see a reason this isn't a good idea? I suppose now SPARK_HOME, if set, has an effect everywhere, but it looks like the desired effect. Docs also make reference to SPARK_HOME as if it has this effect.

Contributor Author

Thanks @srowen for your comments, I will change to the 2-space indent.

@SparkQA

SparkQA commented Oct 27, 2015

Test build #44408 has finished for PR 8669 at commit 41511b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * "$
      * "$
      * exec "$
      * exec "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * "$

@SparkQA

SparkQA commented Oct 27, 2015

Test build #44423 has finished for PR 8669 at commit 28fd58d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * "$
      * "$
      * exec "$
      * exec "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * "$

@SparkQA

SparkQA commented Nov 4, 2015

Test build #44988 has finished for PR 8669 at commit 16ea884.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * "$
      * "$
      * exec "$
      * exec "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * nohup nice -n "$SPARK_NICENESS" "$
      * "$

Member

I have trouble parsing what this script was doing before, but the desired outcome seems clear. The only change here is that SPARK_PREFIX is no longer set and exported, but I believe you've correctly removed usages of it in favor of SPARK_HOME. Good cleanup and standardization IMHO.

Contributor Author

This chunk of code tried to handle symlinks and the relative path of this file; since we now honor SPARK_HOME to solve the issue, from my understanding the original code is no longer necessary.

@srowen
Member

srowen commented Nov 4, 2015

LGTM. There's a lot going on here but it's all really standardizing on one approach to defining the Spark base directory. I scanned all the lines and it looks correct and carefully applied, so I feel good about merging this to master.

@asfgit asfgit closed this in 8aff36e Nov 4, 2015
@marmbrus
Contributor

It has been pointed out that this commit might be the cause of a regression that is possibly holding up the 1.6 release (https://issues.apache.org/jira/browse/SPARK-12345). It would be great if those involved could take a look.

@jerryshao
Contributor Author

Thanks @marmbrus for the notice, I will take a look at the issue.

@srowen
Member

srowen commented Dec 16, 2015

It's a good lead; I'm looking too. The effect of this change should be to respect SPARK_HOME if it's set to find Spark scripts and files. It's not clear why this would cause a remote machine to use the home dir set on the driver, unless something is shipping the SPARK_HOME value to the remote machine. I'll comment on the JIRA.

@dragos
Contributor

dragos commented Dec 16, 2015

The MesosDispatchScheduler ships a number of environment variables.

@jerryshao
Contributor Author

@dragos Did you specify SPARK_HOME manually in your environment? By default, no one would have SPARK_HOME set.

@srowen
Member

srowen commented Dec 16, 2015

@jerryshao though the scripts will now set SPARK_HOME for you if not already set.
Should the Mesos code ship this variable across machines though? I'd have thought it's properly local to a machine only.

If this is right, one of these two things could be a way forward: don't export SPARK_HOME or don't send it.

@jerryshao
Contributor Author

Ohh, I see. That may well be the problem. One fix is to remove the export in all the scripts, using SPARK_HOME only as a shell variable.

@jerryshao
Contributor Author

@dragos, if spark.mesos.executor.home is set to the directory where Spark is installed on the slave node, does your problem still exist? Also, from my understanding, if it is due to a SPARK_HOME inconsistency between driver and executor, client mode should have the same problem, right?

@dragos
Contributor

dragos commented Dec 16, 2015

I'll check that. I don't have Spark installed on every slave node in my test setup.

@jerryshao
Contributor Author

One more question @dragos , did you specify spark.executor.uri or SPARK_EXECUTOR_URI?

@dragos
Contributor

dragos commented Dec 16, 2015

I believe I tried both.

@dragos
Contributor

dragos commented Dec 16, 2015

@jerryshao I observed the same failure using spark.mesos.executor.home.

EDIT: That's probably because spark.mesos.executor.home is not shipped by the Mesos dispatcher, so it falls back to SPARK_HOME

@jerryshao
Contributor Author

I see. Can you please confirm Spark is installed at the path given by spark.mesos.executor.home?

@dragos
Contributor

dragos commented Dec 16, 2015

@jerryshao Yes, it is.

@jerryshao
Contributor Author

It is a little weird. If spark.mesos.executor.home is correctly set, then from the code, Mesos should be able to find the right path to the executables:

    val executorSparkHome = conf.getOption("spark.mesos.executor.home")
      .orElse(sc.getSparkHome())
      .getOrElse {
        throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
      }
    val runScript = new File(executorSparkHome, "./bin/spark-class").getCanonicalPath
    command.setValue(
      "%s \"%s\" org.apache.spark.executor.CoarseGrainedExecutorBackend"
        .format(prefixEnv, runScript) +
      s" --driver-url $driverURL" +
      s" --executor-id ${offer.getSlaveId.getValue}" +
      s" --hostname ${offer.getHostname}" +
      s" --cores $numCores" +
      s" --app-id $appId")

@jerryshao
Contributor Author

Anyway I think we should not export SPARK_HOME in all the scripts, what do you think @srowen ?

@dragos
Contributor

dragos commented Dec 16, 2015

I admit it is very surprising. Here's what I see:

Driver page on the Mesos dispatcher (note that spark.executor.uri is set, but also SPARK_HOME -- which is wrong, since it's an absolute path on my machine):

[screenshot, 2015-12-16 10:22:26]

And here is the log. Note that the executor is fetched correctly, but the submit script spark-class is calculated relative to SPARK_HOME (inside spark-submit!)

I1216 09:18:44.974624   464 fetcher.cpp:369] Fetching URI '/var/spark/spark-1.6.0-bin-hadoop2.6.tgz'
I1216 09:18:44.974628   464 fetcher.cpp:243] Fetching directly into the sandbox directory
I1216 09:18:44.974639   464 fetcher.cpp:180] Fetching URI '/var/spark/spark-1.6.0-bin-hadoop2.6.tgz'
I1216 09:18:44.974652   464 fetcher.cpp:160] Copying resource with command:cp '/var/spark/spark-1.6.0-bin-hadoop2.6.tgz' '/tmp/mesos/slaves/ac7d8645-030b-4c99-a47f-2e02379d8c41-S1/frameworks/ac7d8645-030b-4c99-a47f-2e02379d8c41-0002/executors/driver-20151216101839-0001/runs/ab128426-95fa-4a3e-82f6-9c5b4baba816/spark-1.6.0-bin-hadoop2.6.tgz'
I1216 09:18:46.351866   464 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/ac7d8645-030b-4c99-a47f-2e02379d8c41-S1/frameworks/ac7d8645-030b-4c99-a47f-2e02379d8c41-0002/executors/driver-20151216101839-0001/runs/ab128426-95fa-4a3e-82f6-9c5b4baba816' -xf '/tmp/mesos/slaves/ac7d8645-030b-4c99-a47f-2e02379d8c41-S1/frameworks/ac7d8645-030b-4c99-a47f-2e02379d8c41-0002/executors/driver-20151216101839-0001/runs/ab128426-95fa-4a3e-82f6-9c5b4baba816/spark-1.6.0-bin-hadoop2.6.tgz'
I1216 09:18:48.625231   464 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/ac7d8645-030b-4c99-a47f-2e02379d8c41-S1/frameworks/ac7d8645-030b-4c99-a47f-2e02379d8c41-0002/executors/driver-20151216101839-0001/runs/ab128426-95fa-4a3e-82f6-9c5b4baba816/spark-1.6.0-bin-hadoop2.6.tgz' into '/tmp/mesos/slaves/ac7d8645-030b-4c99-a47f-2e02379d8c41-S1/frameworks/ac7d8645-030b-4c99-a47f-2e02379d8c41-0002/executors/driver-20151216101839-0001/runs/ab128426-95fa-4a3e-82f6-9c5b4baba816'
I1216 09:18:48.625273   464 fetcher.cpp:446] Fetched '/var/spark/spark-1.6.0-bin-hadoop2.6.tgz' to '/tmp/mesos/slaves/ac7d8645-030b-4c99-a47f-2e02379d8c41-S1/frameworks/ac7d8645-030b-4c99-a47f-2e02379d8c41-0002/executors/driver-20151216101839-0001/runs/ab128426-95fa-4a3e-82f6-9c5b4baba816/spark-1.6.0-bin-hadoop2.6.tgz'
I1216 09:18:48.666476   539 logging.cpp:172] INFO level logging started!
I1216 09:18:48.667392   539 exec.cpp:134] Version: 0.25.0
I1216 09:18:48.669401   545 exec.cpp:208] Executor registered on slave ac7d8645-030b-4c99-a47f-2e02379d8c41-S1
bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory

@skonto
Contributor

skonto commented Dec 16, 2015

If executorUri.isDefined, then this code won't be reached (this is what we have as well):

    val executorSparkHome = conf.getOption("spark.mesos.executor.home")
      .orElse(sc.getSparkHome())
      .getOrElse {
        throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
      }

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala

@jerryshao
Contributor Author

Yes, I agree with @skonto. Why does it still honor SPARK_HOME? A little strange.

@skonto
Contributor

skonto commented Dec 16, 2015

It's because the Mesos Dispatcher passes that env variable to the driver at this line:
builder.setEnvironment(envBuilder.build())

In 1.6 the spark-submit script has changed...
https://github.com/apache/spark/blob/master/bin/spark-submit
These lines keep SPARK_HOME from being changed to the local path on the Mesos slave, because it has already been set:
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
And the driver is started with the spark-submit command anyway...

@jerryshao
Contributor Author

Ahhh, I see. The problem is that we export SPARK_HOME in all the shell scripts.

@srowen
Member

srowen commented Dec 16, 2015

I'm not sure it's that. It's that SPARK_HOME is copied by the Mesos infrastructure to other machines; I think this is inherently local. Still, not exporting SPARK_HOME would be a quick way to test whether this theory is true.
