Conversation

@viirya (Member) commented Mar 31, 2020

What changes were proposed in this pull request?

This PR (SPARK-31308) proposes to add Python dependencies to `files` even when the application is not a Python application.

Why are the changes needed?

Currently, in SparkSubmit, the `pyFiles` argument is merged into the `files` argument only for Python applications. As noted in #21420, "for some Spark applications, though they're a java program, they require not only jar dependencies, but also python dependencies." So we should add `pyFiles` to `files` even when the application is not a Python application.
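The merge itself amounts to a comma-joined union of the two file lists. A minimal sketch of the idea in Python (the helper name and the paths are illustrative; the actual SparkSubmit code is Scala and differs in detail):

```python
def merge_file_lists(*lists):
    """Join non-empty, comma-separated file lists into a single
    comma-separated string, skipping None/empty entries."""
    return ",".join(s for s in lists if s)

# Illustrative values: a non-Python app that also ships Python deps.
files = "file:/tmp/data.txt"
py_files = "file:/tmp/a.py,file:/tmp/b.py"

# After this change, pyFiles is folded into files for all app types.
print(merge_file_lists(files, py_files))
# file:/tmp/data.txt,file:/tmp/a.py,file:/tmp/b.py
```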

Does this PR introduce any user-facing change?

Yes. After this change, for non-PySpark applications, the Python files specified by `pyFiles` are also added to `files`, as is already done for PySpark applications.

How was this patch tested?

Manually tested in a Jupyter notebook and via `spark-submit` with `--verbose`:

```
Spark config:
...
(spark.files,file:/Users/dongjoon/PRS/SPARK-PR-28077/a.py)
(spark.submit.deployMode,client)
(spark.master,local[*])
```

@viirya (Member, Author) commented Mar 31, 2020

Maybe we need a unit test too. But let me wait for some comments first.

@HyukjinKwon (Member)

I think it's fine. cc @vanzin and @jerryshao

@dongjoon-hyun (Member)

Hi, @viirya. This PR title looks too broad. Could you be more specific by excluding the scope of SPARK-24377?

[SPARK-24377][Spark Submit] make --py-files work in non pyspark application

@SparkQA commented Mar 31, 2020

Test build #120622 has finished for PR 28077 at commit a892907.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-31308][PySpark] Make Python dependencies available for Non-PySpark applications [SPARK-31308][PySpark] Merging pyFiles to files argument for Non-PySpark applications Mar 31, 2020
@SparkQA commented Mar 31, 2020

Test build #120628 has finished for PR 28077 at commit 8b5edef.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member, Author) commented Mar 31, 2020

retest this please

@SparkQA commented Mar 31, 2020

Test build #120631 has finished for PR 28077 at commit 8b5edef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Merged to master. Thank you, @viirya and @HyukjinKwon .

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020

Closes apache#28077 from viirya/pyfile.

Lead-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@viirya viirya deleted the pyfile branch December 27, 2023 18:23