@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Dec 28, 2023

What changes were proposed in this pull request?

This PR is a sort of follow-up to #44504 but addresses a separate issue. It proposes to check:

  • whether the Python executable exists when looking up available Python Data Sources.
  • whether the PySpark source and Py4J files exist, for the case where users don't have them on their machine (and don't use PySpark).
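
A minimal sketch of the two checks listed above, for illustration only. It is not the code in this PR: the class and method names (`PythonAvailability`, `executableExists`, `allFilesExist`) and the file locations in `main` are hypothetical, and the sketch assumes the executable is looked up by its exact file name (so on Windows a name such as `python.exe` would have to be passed explicitly).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark.
final class PythonAvailability {

    // True if the given path parts name a regular file that the JVM may execute.
    private static boolean isExecutableFile(String first, String... more) {
        try {
            Path p = Paths.get(first, more);
            return Files.isRegularFile(p) && Files.isExecutable(p);
        } catch (InvalidPathException e) {
            // A malformed entry simply counts as "not found".
            return false;
        }
    }

    // Checks `executable` directly if it looks like a path; otherwise searches
    // every directory listed in `pathEnv` (the value of the PATH variable).
    public static boolean executableExists(String executable, String pathEnv) {
        if (executable.contains(File.separator) || executable.contains("/")) {
            return isExecutableFile(executable);
        }
        if (pathEnv == null || pathEnv.isEmpty()) {
            return false;
        }
        for (String dir : pathEnv.split(Pattern.quote(File.pathSeparator))) {
            if (!dir.isEmpty() && isExecutableFile(dir, executable)) {
                return true;
            }
        }
        return false;
    }

    // True only if every listed file or directory exists.
    public static boolean allFilesExist(List<String> paths) {
        for (String p : paths) {
            if (!new File(p).exists()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("python3 found: "
            + executableExists("python3", System.getenv("PATH")));
        // Hypothetical locations of the PySpark source and the Py4J archive.
        List<String> libs = Arrays.asList("python/pyspark", "python/lib/py4j-src.zip");
        System.out.println("PySpark files present: " + allFilesExist(libs));
    }
}
```

With checks of this kind, a caller could skip the Python Data Source lookup whenever either one returns false instead of failing.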

Why are the changes needed?

On some OSes such as Windows, or in minimized Docker containers, Python may not be installed, and the lookup then fails even when users only want to use Scala. We should check for the Python executable and skip the lookup if it does not exist.

Does this PR introduce any user-facing change?

No, because the main change has not been released yet.

How was this patch tested?

Manually tested.

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon
Member Author

cc @ueshin 🙏

@panbingkun
Contributor

panbingkun commented Dec 29, 2023

I have reverted #44504 (CommitID: 229a4eaf547e5c263c749bd53f7f9a89f4a9bea9).
Based on the current run results, the failure of the "Run Spark on Kubernetes Integration test" job in GA (GitHub Actions) is related to it.

https://github.com/apache/spark/pull/44530/files
https://github.com/panbingkun/spark/actions/runs/7353125339/job/20018716583

@HyukjinKwon
Member Author

HyukjinKwon commented Dec 29, 2023

Thanks. Let me fix it here. I know the reason.

@HyukjinKwon
Member Author

HyukjinKwon commented Dec 30, 2023

@LuciferYang @dongjoon-hyun @zhengruifeng if anyone is online, can you merge this one please? I will have to be away from keyboard today, and this technically fixes the build.

@zhengruifeng
Contributor

merged to master

@HyukjinKwon
Member Author

Thx thx

@LuciferYang
Contributor

late LGTM, thanks @HyukjinKwon for fixing this

HyukjinKwon added a commit that referenced this pull request Jan 4, 2024
…file separator to correctly check PySpark library existence

### What changes were proposed in this pull request?

This PR is a follow-up of #44519 that fixes a mistake in how the paths are split. It should use `File.pathSeparator`.
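
To illustrate the difference between the two separators, here is a minimal sketch, not the code changed by this PR. It assumes the library locations arrive as one classpath-style string; the class name `PathListSplit`, the method `splitPathList`, and the example entries are hypothetical. The JDK constant `java.io.File.pathSeparator` delimits entries in such a list (`:` on Unix-like systems, `;` on Windows), while `java.io.File.separator` delimits the components inside a single path.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark.
final class PathListSplit {

    // Splits a classpath-style list into its entries, dropping empty ones.
    public static List<String> splitPathList(String pathList) {
        List<String> entries = new ArrayList<>();
        for (String entry : pathList.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical entries, built with the platform's file separator.
        String source = String.join(File.separator, "opt", "spark", "python");
        String py4j = String.join(File.separator,
            "opt", "spark", "python", "lib", "py4j-src.zip");
        String pathList = source + File.pathSeparator + py4j;

        // Splitting on the path separator yields the two entries.
        System.out.println(splitPathList(pathList));

        // Splitting on the file separator instead yields path fragments,
        // none of which names a file that an existence check could find.
        System.out.println(Arrays.asList(pathList.split(Pattern.quote(File.separator))));
    }
}
```

An existence check run over the fragments from the second split would look for names that are not real files, which is the kind of mistake the follow-up describes.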

### Why are the changes needed?

It works in testing mode, but it does not work in production mode.

### Does this PR introduce _any_ user-facing change?

No, because the main change has not been released.

### How was this patch tested?

Manually as described in "How was this patch tested?" at #44504.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44590 from HyukjinKwon/SPARK-46530-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-46530 branch January 15, 2024 00:47