[SPARK-47683][PYTHON][BUILD] Decouple PySpark core API to pyspark.core package #45053
Closed
Conversation
Force-pushed: d21d1e3 → 3716df3
Force-pushed: 1c3b61e → e6cd1b4
Force-pushed: 095528b → 9ca2054
Member (Author):
cc @zhengruifeng @grundprinzip @ueshin @hvanhovell @itholic @WeichenXu123 @mengxr @allisonwang-db @xinrong-meng @gatorsmile @cloud-fan This is ready for a look (before merging, though, we should wait one more day for the SPIP to pass).
zhengruifeng approved these changes (Apr 3, 2024)
itholic approved these changes (Apr 3, 2024)
xinrong-meng approved these changes (Apr 3, 2024)
ueshin reviewed (Apr 3, 2024)
Force-pushed: 5500bd7 → 4919bea
Member (Author):
I restored the references for our internal API. Explicitly private attributes starting …
Member (Author):
Merged to master.
HyukjinKwon added a commit that referenced this pull request (May 2, 2024):
…spark-connect` package

### What changes were proposed in this pull request?

This PR is a followup of #45053, which includes `lib/py4j*zip` in the package. Currently it is being picked up by https://github.com/apache/spark/blob/master/python/MANIFEST.in#L26. For other files, we do not create a `deps` directory in `setup.py` for `pyspark-connect`, so they are not included, but `lib` is.

### Why are the changes needed?

To exclude unrelated files.

### Does this PR introduce _any_ user-facing change?

No, the main change has not been released yet.

### How was this patch tested?

Manually packaged, and checked the contents via `vi`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46331 from HyukjinKwon/SPARK-47683-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
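For context, here is a minimal, hypothetical sketch of how a setuptools build can keep JVM artifacts (py4j zips, jars) out of a pure-Python distribution. All names, versions, and package lists below are illustrative only; the actual `setup.py` and `MANIFEST.in` in apache/spark are more involved and differ from this.

```python
# Hypothetical setup.py sketch for a pure-Python "pyspark-connect" build.
# Illustrative only -- not the actual apache/spark packaging code.
from setuptools import setup

setup(
    name="pyspark-connect",
    version="4.0.0.dev0",          # placeholder version
    packages=[
        "pyspark",
        "pyspark.sql",
        "pyspark.sql.connect",     # the pure-Python Spark Connect client
    ],
    # Deliberately no package_data entry covering lib/py4j*.zip and no
    # jars/ directory, so the py4j bridge and JVM artifacts are not bundled.
    install_requires=["pandas", "pyarrow", "grpcio"],  # indicative deps only
)
```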
### What changes were proposed in this pull request?
This PR proposes to release a separate `pyspark-connect` package; see also SPIP: Pure Python Package in PyPI (Spark Connect). Today's PySpark package is roughly a single distribution that bundles all modules together with the jars. There will be two packages available, `pyspark` and `pyspark-connect`:

- `pyspark`: Same as today's PySpark, but the Core module is factored out to `pyspark.core.*`; the user-facing interface stays the same at `pyspark.*` (see the sketch below).
- `pyspark-connect`: The package after excluding modules that do not support Spark Connect, and also excluding jars (that is, `ml` is kept, but without jars).
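To illustrate the intent, here is a hedged sketch of what "user-facing interface stays the same" means for existing code. The internal module path `pyspark.core.context` reflects this PR's refactoring, but user code should not rely on internal paths:

```python
# User code keeps importing from the top-level pyspark namespace as before.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
print(sc.parallelize(range(10)).sum())  # 45 -- plain RDD usage, unchanged

# Internally, the implementation now lives under pyspark.core.* (e.g.,
# pyspark.core.context) and is re-exported from the pyspark namespace.
```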
### Why are the changes needed?
To provide a pure Python library that does not depend on JVM.
See also SPIP: Pure Python Package in PyPI (Spark Connect).
### Does this PR introduce _any_ user-facing change?

Yes, users can install the pure Python library via `pip install pyspark-connect`.

### How was this patch tested?
Manually tested the basic set of tests against a Spark Connect server started with:

    ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`

The full tests will be added separately and set up as a scheduled job in CI.
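As a hedged illustration of the pure-Python path, a client installed via `pip install pyspark-connect` can talk to the server started above (assuming it listens on the Spark Connect default port, 15002):

```python
# Connect to a running Spark Connect server; no local JVM or jars required.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.range(5).show()  # the query executes on the server
spark.stop()
```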
### Was this patch authored or co-authored using generative AI tooling?
No.