Closed
@@ -635,7 +635,12 @@ private[spark] class Client(
distribute(args.primaryPyFile, appMasterOnly = true)
}

-    pySparkArchives.foreach { f => distribute(f) }
+    pySparkArchives.foreach { f =>
Member:
Does it work when Spark is not installed on other nodes? IIRC, we can run the application in a cluster where Spark is not installed, because the jars are shipped together in YARN cluster mode.

Likewise, PySpark was able to run. From my very cursory look, this change is going to break that case because it will no longer distribute the local PySpark archive. Can you confirm, @shanyu and @tgravescs?

Contributor:

This is the case where someone explicitly put local: on the URL, so it's expected to be on every machine. YARN distributes everything that is file:, or downloads it if it's hdfs:.

Member:
Okay, thanks for the clarification. LGTM

+      val uri = Utils.resolveURI(f)
+      if (uri.getScheme != Utils.LOCAL_SCHEME) {
+        distribute(f)
+      }
+    }

// The python files list needs to be treated especially. All files that are not an
// archive need to be placed in a subdirectory that will be added to PYTHONPATH.
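The scheme check in the diff above can be sketched as a standalone snippet. This is a minimal sketch, not Spark's actual code: `LocalSchemeCheck`, its `resolveURI`, and the `LOCAL_SCHEME` constant are hypothetical stand-ins for Spark's `Utils` helpers, shown only to illustrate why a `local:` URI is skipped.

```scala
import java.net.URI

// Hypothetical stand-in for the Utils helpers used in Client.scala.
object LocalSchemeCheck {
  // Assumption: mirrors Utils.LOCAL_SCHEME ("local" means the file is
  // expected to already exist at that path on every node).
  val LOCAL_SCHEME = "local"

  // Simplified version of Utils.resolveURI: a bare path with no scheme
  // is treated as a file: URI.
  def resolveURI(path: String): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri else new URI("file", null, path, null)
  }

  // Only non-local: URIs need to be distributed by YARN.
  def shouldDistribute(path: String): Boolean =
    resolveURI(path).getScheme != LOCAL_SCHEME

  def main(args: Array[String]): Unit = {
    println(shouldDistribute("local:/opt/spark/python/lib/pyspark.zip")) // false
    println(shouldDistribute("hdfs:///archives/pyspark.zip"))            // true
    println(shouldDistribute("/tmp/pyspark.zip"))                        // true
  }
}
```

As the contributor notes above, this matches YARN's general contract: `file:` URIs are uploaded and distributed, `hdfs:` URIs are downloaded on the nodes, and `local:` URIs are the user's promise that the file is already present everywhere.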