[SPARK-40993][SPARK-41705][CONNECT] Move Spark Connect documentation and script to dev/ and Python documentation #39338
Changes from all commits
1b01072
06ad457
5f20a43
38a5224
c64155f
fd3f4ac
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,29 +1,28 @@ | ||
| # Spark Connect - Developer Documentation | ||
| # Spark Connect | ||
|
|
||
| **Spark Connect is a strictly experimental feature and under heavy development. | ||
| All APIs should be considered volatile and should not be used in production.** | ||
|
|
||
| This module contains the implementation of Spark Connect which is a logical plan | ||
| facade for the implementation in Spark. Spark Connect is directly integrated into the build | ||
| of Spark. To enable it, you only need to activate the driver plugin for Spark Connect. | ||
| of Spark. | ||
|
|
||
| The documentation linked here is specifically for developers of Spark Connect and not | ||
| directly intended to be end-user documentation. | ||
|
|
||
| ## Development Topics | ||
|
|
||
| ## Getting Started | ||
| ### Guidelines for new clients | ||
|
|
||
| ### Build | ||
| When contributing a new client please be aware that we strive to have a common | ||
| user experience across all languages. Please follow the below guidelines: | ||
|
|
||
| ```bash | ||
| ./build/mvn -Phive clean package | ||
| ``` | ||
| * [Connection string configuration](docs/client-connection-string.md) | ||
| * [Adding new messages](docs/adding-proto-messages.md) in the Spark Connect protocol. | ||
|
|
||
| or | ||
| ### Python client development | ||
|
|
||
| ```bash | ||
| ./build/sbt -Phive clean package | ||
| ``` | ||
| Python-specific development guidelines are located in [python/docs/source/development/testing.rst](https://github.com/apache/spark/blob/master/python/docs/source/development/testing.rst), which is published under the [Development tab](https://spark.apache.org/docs/latest/api/python/development/index.html) of the PySpark documentation. | ||
|
|
||
| ### Build with user-defined `protoc` and `protoc-gen-grpc-java` | ||
|
|
||
|
|
@@ -48,56 +47,3 @@ export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe | |
| The user-defined `protoc` and `protoc-gen-grpc-java` binary files can be built from source in the user's own compilation environment. | ||
| For the compilation steps, please refer to [protobuf](https://github.com/protocolbuffers/protobuf) and [grpc-java](https://github.com/grpc/grpc-java). | ||
|
|
||
|
|
||
| ### Run Spark Shell | ||
|
|
||
| To run Spark Connect you locally built: | ||
|
|
||
| ```bash | ||
| # Scala shell | ||
| ./bin/spark-shell \ | ||
| --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \ | ||
| --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin | ||
|
|
||
| # PySpark shell | ||
| ./bin/pyspark \ | ||
| --jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \ | ||
| --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin | ||
| ``` | ||
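The `--jars` option above expects a comma-separated list, which the commands build with `ls ... | paste -sd ',' -`. The same joining step can be sketched in plain bash as follows. This is a minimal sketch, not part of the PR: `join_by_comma` is a hypothetical helper name, and it assumes the jar paths contain no commas.

```bash
# join_by_comma ARG...: print all arguments joined with commas.
# Hypothetical helper for illustration; not part of the Spark repository.
join_by_comma() {
  # "$*" joins the positional parameters using the first character of IFS.
  local IFS=','
  printf '%s\n' "$*"
}
```

For example, passing the shell-expanded jar paths as arguments yields a value that could be handed to `--jars` in place of the backtick expression.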
|
|
||
| To use the release version of Spark Connect: | ||
|
|
||
| ```bash | ||
| ./bin/spark-shell \ | ||
| --packages org.apache.spark:spark-connect_2.12:3.4.0 \ | ||
| --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin | ||
| ``` | ||
|
|
||
| ### Run Tests | ||
|
|
||
| ```bash | ||
| # Run a single Python class. | ||
| ./python/run-tests --testnames 'pyspark.sql.tests.connect.test_connect_basic' | ||
| ``` | ||
|
|
||
| ```bash | ||
| # Run all Spark Connect Python tests as a module. | ||
| ./python/run-tests --module pyspark-connect --parallelism 1 | ||
| ``` | ||
|
|
||
|
|
||
| ## Development Topics | ||
|
|
||
| ### Generate proto generated files for the Python client | ||
| 1. Install `buf version 1.11.0`: https://docs.buf.build/installation | ||
|
Review comment (Member, Author): Moved to Environment Setup.
||
| 2. Run `pip install grpcio==1.48.1 protobuf==3.19.5 mypy-protobuf==3.3.0 googleapis-common-protos==1.56.4 grpcio-status==1.48.1` | ||
|
Review comment (Member, Author): This is already documented in https://spark.apache.org/docs/latest/api/python/development/contributing.html#environment-setup
||
| 3. Run `./connector/connect/dev/generate_protos.sh` | ||
| 4. Optional: check the result with `./dev/check-codegen-python.py` | ||
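Step 2 above pins exact package versions, so it can help to look up a pin by package name before comparing it with what is installed locally. The following is a minimal sketch under stated assumptions: `PINS` and `pinned_version` are hypothetical names introduced here, and the pin list is copied verbatim from step 2.

```bash
# Pins copied from step 2 of the list above.
PINS="grpcio==1.48.1 protobuf==3.19.5 mypy-protobuf==3.3.0 googleapis-common-protos==1.56.4 grpcio-status==1.48.1"

# pinned_version NAME: print the version pinned for NAME.
# Returns non-zero if NAME is not in the pin list.
# Hypothetical helper for illustration; not part of the Spark repository.
pinned_version() {
  local name="$1" pin
  for pin in $PINS; do
    # Text before "==" is the package name, text after it is the version.
    if [ "${pin%%==*}" = "$name" ]; then
      printf '%s\n' "${pin#*==}"
      return 0
    fi
  done
  return 1
}
```

The printed version can then be compared by hand with the `Version:` line reported by `pip show` for the same package.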
|
|
||
| ### Guidelines for new clients | ||
|
|
||
| When contributing a new client please be aware that we strive to have a common | ||
| user experience across all languages. Please follow the below guidelines: | ||
|
|
||
| * [Connection string configuration](docs/client-connection-string.md) | ||
| * [Adding new messages](docs/adding-proto-messages.md) in the Spark Connect protocol. | ||
Review comment: Move to python/docs/source/development/testing.rst