
Conversation

@ghost commented Apr 4, 2014

[SPARK-1403] I investigated why Spark 0.9.0 loads fine on Mesos while Spark 1.0.0 fails. What I found was that in SparkEnv.scala, while creating the SparkEnv object, the current thread's classloader is null. But in 0.9.0, at the same place, it is set to org.apache.spark.repl.ExecutorClassLoader. I saw that 7edbea4 moved it to its current place. I moved it back and saw that 1.0.0 started working fine on Mesos.

I just created a minimal patch that allows me to run spark on mesos correctly. It seems like SecurityManager's creation needs to be taken into account for a correct fix. Also moving the creation of the serializer out of SparkEnv might be a part of the right solution. PTAL.

@AmplabJenkins

Can one of the admins verify this patch?

@ghost (Author) commented Apr 4, 2014

After looking at the code a bit more, I see that the code around setContextClassLoader does not use SecurityManager, as far as I can see. createClassLoader is creating a File object. addReplClassLoaderIfNeeded is dynamically loading a class file. I don't see any use of SecurityManager in these two methods. I am not sure if it gets used implicitly.

@pwendell (Contributor) commented Apr 4, 2014

This patch also adds back a line that we earlier removed:

 Thread.currentThread.setContextClassLoader(replClassLoader)

Does your fix require that line to work? If so, what is the error if you remove it?

@pwendell (Contributor) commented Apr 4, 2014

@ueshin removed it in SPARK-1210:
#15

I proposed a more conservative change in this comment:
#15 (comment)

But we ultimately went ahead and just removed it because we couldn't see a case where it was necessary...

@ghost (Author) commented Apr 4, 2014

I am testing out if removing the last line is ok.

@ghost (Author) commented Apr 4, 2014

I can confirm that the third line is needed. Without that line I see the same failure as earlier.

@pwendell (Contributor) commented Apr 4, 2014

Which exact failure is it? Could you post the stack trace?

@ueshin (Member) commented Apr 4, 2014

Hi,
I think MesosExecutorBackend should have its own context class loader, and it should be set to the class loader of the MesosExecutorBackend class before it creates the Executor object.

I mistakenly thought that MesosExecutorBackend has its own context class loader, but it doesn't. While creating SparkEnv in the Executor's constructor, Thread.currentThread.getContextClassLoader returns null because MesosExecutorBackend doesn't set a context class loader. The Class.forName() method with null as its 3rd argument tries to load the class from the bootstrap class loader, which doesn't know the class org.apache.spark.serializer.JavaSerializer.
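To illustrate that behavior (a minimal sketch, runnable in a Scala REPL; java.lang.String and scala.Option are just examples of a core class versus an application-classpath class):

```scala
// Passing null as the loader to Class.forName delegates to the bootstrap
// class loader, which only knows core Java classes.
val fromBootstrap = Class.forName("java.lang.String", true, null)

// A class that lives on the application classpath (Spark, Scala, etc.) is
// invisible to the bootstrap loader and throws ClassNotFoundException.
val appClassFails =
  try { Class.forName("scala.Option", true, null); false }
  catch { case _: ClassNotFoundException => true }

println(s"bootstrap found: ${fromBootstrap.getName}, app class failed: $appClassFails")
```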

@pwendell (Contributor) commented Apr 4, 2014

@ueshin - ah cool. Do you mind giving a code snippet of what that would look like? Then @manku-timma can see if it fixes the problem. It's probably a better solution...

@ghost (Author) commented Apr 4, 2014

The actual error is below. Line numbers might be a bit off due to some of my code.

java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:249)
        at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:173)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:184)
        at org.apache.spark.executor.Executor.<init>(Executor.scala:115)
        at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)
Exception in thread "Thread-0"

@ghost (Author) commented Apr 4, 2014

From my limited understanding, Executor is used by the mesos backend and the standalone backend. If mesos backend has its own class loader, will that be enough? Does the standalone backend already have its own class loader?

@ueshin (Member) commented Apr 4, 2014

Put the following line at the beginning of the registered method (at line 53):

Thread.currentThread.setContextClassLoader(getClass.getClassLoader)

@ueshin (Member) commented Apr 4, 2014

@manku-timma Yes, the standalone backend org.apache.spark.executor.CoarseGrainedExecutorBackend and the local-mode backend org.apache.spark.scheduler.local.LocalActor have their context class loaders set by the ActorSystem.

@ueshin (Member) commented Apr 4, 2014

BTW, we might have to restore the context class loader of MesosExecutorBackend to the original (bootstrap) class loader afterwards.
We should restore it if there are 2 or more threads handling MesosExecutorBackend callbacks, but I don't know how many threads there are.

@ghost (Author) commented Apr 4, 2014

@ueshin, your one line change works for me.

Let me know if you want me to test any of the restore code.

Just for my understanding, the flow after your one-line fix:

  • MesosExecutorBackend starts off
  • MesosExecutorBackend sets classloader
  • MesosExecutorBackend creates Executor
  • Executor creates SparkEnv
  • Executor sets the classloader to the http-based one (so that later classes can be downloaded from the master)

@ueshin (Member) commented Apr 4, 2014

@manku-timma, I just meant that we should restore the class loader in the registered method, like:

val cl = Thread.currentThread.getContextClassLoader
try {
  Thread.currentThread.setContextClassLoader(getClass.getClassLoader)
  logInfo("Registered with Mesos as executor ID " + executorInfo.getExecutorId.getValue)
  this.driver = driver
  val properties = Utils.deserialize[Array[(String, String)]](executorInfo.getData.toByteArray)
  executor = new Executor(
    executorInfo.getExecutorId.getValue,
    slaveInfo.getHostname,
    properties)
} finally {
  Thread.currentThread.setContextClassLoader(cl)
}

I think this is better than before.

The http based class loader for TaskRunner is correctly set at this point.

The flow you wrote would be:

  • when registered method is called
    • MesosExecutorBackend starts off
    • MesosExecutorBackend sets classloader
    • MesosExecutorBackend creates Executor
    • Executor creates SparkEnv
    • MesosExecutorBackend restores the classloader
  • when launchTask method is called
    • MesosExecutorBackend calls Executor.launchTask method
    • Executor creates TaskRunner
    • Executor submits the task runner
  • when the task runner is run
    • TaskRunner sets classloader to the http based one (so that later classes can be downloaded from the master)

@ueshin (Member) commented on the diff:

You need one more line here

Thread.currentThread.setContextClassLoader(getClass.getClassLoader)

@ghost (Author) commented Apr 5, 2014

Oops. Added that line.

I am facing this error in the current git tree:

[error] /home/vagrant/spark2/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetRelation.scala:106: value getGlobal is not a member of object java.util.logging.Logger
[error]       logger.setParent(Logger.getGlobal)
[error]                               ^
[error] one error found

@ghost (Author) commented Apr 5, 2014

Let me know if there is any other change I need to make. I have tested after merging from master and things look fine. This is good to be merged from my end.

@ueshin (Member) commented Apr 5, 2014

LGTM on the class loader issue.
But I'm not sure about the last fix, i.e. whether the Java 7 API may be used or not.

@ghost (Author) commented Apr 6, 2014

I see that PR 334 made the Java 6 change, so I reverted mine.

@pwendell (Contributor) commented Apr 6, 2014

Hey @ueshin @manku-timma - as a simpler fix, would it be possible to just change the way SparkEnv captures the class loader? I think it was probably just an oversight when that was added. If there is not a context class loader, then it should just load the bootstrap class loader:

val classLoader = Option(Thread.currentThread.getContextClassLoader).getOrElse(getClass.getClassLoader)

@ghost (Author) commented Apr 6, 2014

@pwendell: I tested your fix to SparkEnv.scala (after reverting my earlier change). It does not work. SparkEnv's loader turns out to be sun.misc.Launcher$AppClassLoader@12360be0 and it fails with the following error.

java.lang.ClassNotFoundException: org/apache/spark/io/LZFCompressionCodec
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:249)
        at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:45)
        at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:41)
        at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:110)
        at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcast.scala:71)
        at org.apache.spark.broadcast.BroadcastManager.initialize(Broadcast.scala:82)
        at org.apache.spark.broadcast.BroadcastManager.<init>(Broadcast.scala:69)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:195)
        at org.apache.spark.executor.Executor.<init>(Executor.scala:105)
        at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)

My patch to SparkEnv.scala:

@@ -142,7 +142,10 @@ object SparkEnv extends Logging {
       conf.set("spark.driver.port",  boundPort.toString)
     }

-    val classLoader = Thread.currentThread.getContextClassLoader
+    val classLoader = Option(Thread.currentThread.getContextClassLoader)
+      .getOrElse(getClass.getClassLoader)

@pwendell (Contributor) commented Apr 6, 2014

I guess what I don't understand about the current fix is: where is the context class loader getting changed from the bootstrap class loader in the first place? The current approach is to capture the class loader of MesosExecutorDriver, set it as the context class loader, and then set it back. Isn't this also just the sun.misc.Launcher class loader?

@pwendell (Contributor) commented Apr 6, 2014

Another way of putting my question. Currently there is a line:

Thread.currentThread.setContextClassLoader(getClass.getClassLoader)

What is the actual classloader being set here... isn't it just the sun.misc.Launcher one?

@ghost (Author) commented Apr 7, 2014

@pwendell: You are right. Actually sun.misc.Launcher$AppClassLoader@12360be0 is the classloader even in the earlier code.

Looks like classes directly loaded by SparkEnv get to use the right classloader since it is passed as an argument. But classes loaded at the second level (like in BroadcastManager above) still use the null classloader and fail.
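In other words (a minimal sketch; loadDirect and loadViaContext are hypothetical names, only illustrating the two lookup paths):

```scala
// A loader passed explicitly down the call chain only helps the call sites
// that actually receive it as an argument.
def loadDirect(name: String, cl: ClassLoader): Class[_] =
  Class.forName(name, true, cl)

// Code further down the stack that independently consults the thread's
// context class loader still sees null, falls back to the bootstrap
// loader, and cannot find Spark classes.
def loadViaContext(name: String): Class[_] =
  Class.forName(name, true, Thread.currentThread.getContextClassLoader)
```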

@pwendell (Contributor) commented Apr 7, 2014

@ueshin you said before that "Class.forName() method with null for 3rd argument tries to load class from bootstrap class loader, which doesn't know the class org.apache.spark.serializer.JavaSerializer."

But I think in this case we'd expect the bootstrap classloader to know about JavaSerializer (this should be on the classpath when the executor starts), right? I'm still not sure why it would fail in this case. I don't see why MesosExecutorDriver could be on the java classpath but JavaSerializer isn't.

@manku-timma I looked more, and the reason this doesn't work is that other parts of the code don't directly use the classLoader from the executor. I can look more tomorrow and see how we can best clean this up. The current approach works, but it's a bit of a hack; there might be a nicer way.

@pwendell (Contributor) commented Apr 7, 2014

Ah I see, @ueshin you are right. It's the bootstrap class loader and it won't have any spark definitions. I was mixing this up with the system class loader.

./bin/spark-shell
scala> Class.forName("org.apache.spark.serializer.JavaSerializer")
res7: Class[_] = class org.apache.spark.serializer.JavaSerializer

scala> Class.forName("org.apache.spark.serializer.JavaSerializer", true, null)
java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:11)
    at $iwC$$iwC$$iwC.<init>(<console>:16)
    at $iwC$$iwC.<init>(<console>:18)
    at $iwC.<init>(<console>:20)

We should definitely clean this up. The behavior we want in every case is to use the context class loader if present, and otherwise the classloader that loads Spark classes (e.g. the system classloader).
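That desired behavior can be sketched in one helper (a sketch only; sparkClassLoader and loadClass are hypothetical names, not existing Spark methods):

```scala
// Prefer the thread's context class loader; if none is set (as in the
// Mesos executor case), fall back to the loader that loaded this class,
// which by construction knows all Spark classes.
def sparkClassLoader(): ClassLoader =
  Option(Thread.currentThread.getContextClassLoader)
    .getOrElse(getClass.getClassLoader)

// Every reflective load would then go through the helper instead of
// passing null (bootstrap) to Class.forName.
def loadClass(name: String): Class[_] =
  Class.forName(name, true, sparkClassLoader())
```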

@ueshin (Member) commented Apr 7, 2014

@pwendell Yes, the bootstrap class loader knows only the core Java APIs; the Spark classes (specified by the -cp java command-line argument) are loaded by the system class loader (which would be sun.misc.Launcher$AppClassLoader with the Oracle JDK).
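The distinction is easy to observe (a minimal sketch; on HotSpot, classes from the bootstrap loader report getClassLoader as null):

```scala
// Core Java classes come from the bootstrap loader, which is represented
// as null; application-classpath classes report the system class loader.
val coreLoader = classOf[String].getClassLoader          // null => bootstrap
val appLoader  = classOf[Option[_]].getClassLoader       // system class loader

println(s"String loader: $coreLoader, Option loader: $appLoader")
```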

@ghost (Author) commented Apr 7, 2014

So the current fix looks fine?

@ghost (Author) commented Apr 9, 2014

@pwendell: Do you plan to pick this up for 1.0? Is there anything more I need to do?

@pwendell (Contributor) commented

Jenkins, retest this please.

I'm fine to have this as a workaround for 1.0. I'll make a JIRA describing the broader issue and we can fix it for 1.1.

@pwendell (Contributor) commented

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14078/

@pwendell (Contributor) commented

Thanks for the fix - merged into master and 1.0

@asfgit closed this in ca11919 Apr 13, 2014
asfgit pushed a commit that referenced this pull request Apr 13, 2014
….9.0

[SPARK-1403] I investigated why Spark 0.9.0 loads fine on Mesos while Spark 1.0.0 fails. What I found was that in SparkEnv.scala, while creating the SparkEnv object, the current thread's classloader is null. But in 0.9.0, at the same place, it is set to org.apache.spark.repl.ExecutorClassLoader. I saw that 7edbea4 moved it to its current place. I moved it back and saw that 1.0.0 started working fine on Mesos.

I just created a minimal patch that allows me to run spark on mesos correctly. It seems like SecurityManager's creation needs to be taken into account for a correct fix. Also moving the creation of the serializer out of SparkEnv might be a part of the right solution. PTAL.

Author: Bharath Bhushan <[email protected]>

Closes #322 from manku-timma/spark-1403 and squashes the following commits:

606c2b9 [Bharath Bhushan] Merge remote-tracking branch 'upstream/master' into spark-1403
ec8f870 [Bharath Bhushan] revert the logger change for java 6 compatibility as PR 334 is doing it
728beca [Bharath Bhushan] Merge remote-tracking branch 'upstream/master' into spark-1403
044027d [Bharath Bhushan] fix compile error
6f260a4 [Bharath Bhushan] Merge remote-tracking branch 'upstream/master' into spark-1403
b3a053f [Bharath Bhushan] Merge remote-tracking branch 'upstream/master' into spark-1403
04b9662 [Bharath Bhushan] add missing line
4803c19 [Bharath Bhushan] Merge remote-tracking branch 'upstream/master' into spark-1403
f3c9a14 [Bharath Bhushan] Merge remote-tracking branch 'upstream/master' into spark-1403
42d3d6a [Bharath Bhushan] used code fragment from @ueshin to fix the problem in a better way
89109d7 [Bharath Bhushan] move the class loader creation back to where it was in 0.9.0
(cherry picked from commit ca11919)

Signed-off-by: Patrick Wendell <[email protected]>
