Commit 20fc6fa

viirya authored and dongjoon-hyun committed
[SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySpark applications
### What changes were proposed in this pull request?

This PR (SPARK-31308) proposes adding Python dependencies even when the application is not a Python application.

### Why are the changes needed?

Currently, `SparkSubmit` merges the `pyFiles` argument into the `files` argument only for Python applications. As noted in #21420, "for some Spark applications, though they're a java program, they require not only jar dependencies, but also python dependencies." We therefore need to add `pyFiles` to `files` even when the application is not a Python application.

### Does this PR introduce any user-facing change?

Yes. After this change, for non-PySpark applications, the Python files specified by `pyFiles` are also added to `files`, as they are for PySpark applications.

### How was this patch tested?

Manually tested in a Jupyter notebook, or by running `spark-submit` with `--verbose`.

```
Spark config:
...
(spark.files,file:/Users/dongjoon/PRS/SPARK-PR-28077/a.py)
(spark.submit.deployMode,client)
(spark.master,local[*])
```

Closes #28077 from viirya/pyfile.

Lead-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 5ec1814 commit 20fc6fa

1 file changed: +6 -4 lines changed

core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

Lines changed: 6 additions & 4 deletions
@@ -474,10 +474,12 @@ private[spark] class SparkSubmit extends Logging {
         args.mainClass = "org.apache.spark.deploy.PythonRunner"
         args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
       }
-      if (clusterManager != YARN) {
-        // The YARN backend handles python files differently, so don't merge the lists.
-        args.files = mergeFileLists(args.files, args.pyFiles)
-      }
+    }
+
+    // Non-PySpark applications can need Python dependencies.
+    if (deployMode == CLIENT && clusterManager != YARN) {
+      // The YARN backend handles python files differently, so don't merge the lists.
+      args.files = mergeFileLists(args.files, args.pyFiles)
     }
 
     if (localPyFiles != null) {
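
To make the new guard easier to reason about, here is a minimal, self-contained Scala sketch of the condition introduced by this hunk. It is not Spark's code. The `DeployMode` and `ClusterManager` types, the `effectiveFiles` helper, and this simplified `mergeFileLists` (comma-separated lists, with blank or null entries ignored) are hypothetical stand-ins written for illustration only. The real `SparkSubmit` uses its own constants and its own private `mergeFileLists`.

```scala
object PyFilesMergeSketch {
  // Hypothetical stand-ins for the deploy mode and cluster manager constants.
  sealed trait DeployMode
  case object Client extends DeployMode
  case object Cluster extends DeployMode

  sealed trait ClusterManager
  case object Yarn extends ClusterManager
  case object Standalone extends ClusterManager
  case object Local extends ClusterManager

  // Simplified stand-in (assumption): joins comma-separated lists,
  // skipping null or blank inputs, and returns null when nothing remains.
  def mergeFileLists(lists: String*): String = {
    val merged = lists
      .filter(s => s != null && s.trim.nonEmpty)
      .flatMap(s => s.split(",").toSeq.map(_.trim).filter(_.nonEmpty))
    if (merged.nonEmpty) merged.mkString(",") else null
  }

  // Mirrors the guard in the hunk: merge pyFiles into files only in
  // client mode and only when the cluster manager is not YARN.
  // Whether the application is a Python application is not consulted.
  def effectiveFiles(
      files: String,
      pyFiles: String,
      deployMode: DeployMode,
      clusterManager: ClusterManager): String = {
    if (deployMode == Client && clusterManager != Yarn) {
      mergeFileLists(files, pyFiles)
    } else {
      files
    }
  }

  def main(args: Array[String]): Unit = {
    // Client mode, non-YARN: the lists are merged.
    println(effectiveFiles("conf.json", "a.py,b.py", Client, Local))
    // YARN: files is left untouched.
    println(effectiveFiles("conf.json", "a.py,b.py", Client, Yarn))
    // Cluster mode: files is left untouched.
    println(effectiveFiles("conf.json", "a.py,b.py", Cluster, Standalone))
  }
}
```

Note that the guard in the diff still requires `deployMode == CLIENT`, so under this sketch the merge does not happen in cluster mode or on YARN, regardless of the application's language.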
