-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String #38470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Overall LGTM Is the spark/python/pyspark/sql/connect/client.py Line 107 in 8f6b185
|
|
Good point, I will incorporate that into the doc. |
| @@ -0,0 +1,110 @@ | |||
| # Connecting to Spark Connect using Clients | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage documentation should better be in https://github.com/apache/spark/tree/master/docs as an md file so it can be properly documented.
|
Maybe it's better to have a JIRA. BTW, wonder if we have an e2e example for users can copy and paste to try. (e.g., like most of docs in https://spark.apache.org/docs/latest/index.html). Another decision to make is if we should document it in PySpark docs (https://spark.apache.org/docs/latest/api/python/getting_started/index.html) or Spark main page (https://spark.apache.org/docs/latest/index.html) |
|
@HyukjinKwon I will add a Jira this is just the starting point to align where we want to go. My idea would be that once this is merged I will create a pr for the python client to follow this proposal and then we can update this doc with an e2e example. But the main goal here is developer documentation not user doc. For example, scala must then implement the same model. |
|
For developer documentation, it might better be placed under sources as a comment e.g., Or. maybe README.md at the top level of the compoenent https://github.com/apache/spark/blob/master/hadoop-cloud/README.md or https://github.com/apache/spark/blob/master/connector/docker/README.md |
|
What about we link to it from the top level Readme in the component? The reason why it's not in the code is because it's client language agnostic. |
|
Maybe putting it to the top model level ( |
|
@HyukjinKwon so my proposal is the following:
WDYT? |
|
Can one of the admins verify this patch? |
|
Merged to master. |
…hon Client ### What changes were proposed in this pull request? This PR implements the connection string for Spark Connect clients according to the documentation added in #38470. With this patch it becomes possible to connect to a Spark Connect endpoint using ``` spark = SparkRemoteSession(user_id="martin", connection_string="sc://hostname/;use_ssl=true;token=abcd") spark.read.table("test").limit(10).toPandas() ``` The connection string is properly parsed and filtered. This allows to dynamically configure SSL and bearer token authentication. All remaining parameters are converted into GRPC Metadata pairs and submitted as part of the request. ### Why are the changes needed? User experience. ### Does this PR introduce _any_ user-facing change? No, experimental API. ### How was this patch tested? UT Closes #38485 from grundprinzip/SPARK-41001. Lead-authored-by: Martin Grund <[email protected]> Co-authored-by: Martin Grund <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…String ### What changes were proposed in this pull request? This patch adds documentation to describe how clients should implement handling connecting to the Spark Connect endpoint. GRPC as a protocol is well documented and has many options. However, this does not make it easy for users to reason about how to correctly configure GRPC to make it work for their use cases. To overcome the issue, this document defines a client connection string that needs to be parsed by the different language clients and can be used to properly configure the GRPC client. ### Why are the changes needed? Documentation and design specification for clients implementing the Spark Connect protocol. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Doc only. Closes apache#38470 from grundprinzip/client-connection. Lead-authored-by: Martin Grund <[email protected]> Co-authored-by: Martin Grund <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…hon Client ### What changes were proposed in this pull request? This PR implements the connection string for Spark Connect clients according to the documentation added in apache#38470. With this patch it becomes possible to connect to a Spark Connect endpoint using ``` spark = SparkRemoteSession(user_id="martin", connection_string="sc://hostname/;use_ssl=true;token=abcd") spark.read.table("test").limit(10).toPandas() ``` The connection string is properly parsed and filtered. This allows to dynamically configure SSL and bearer token authentication. All remaining parameters are converted into GRPC Metadata pairs and submitted as part of the request. ### Why are the changes needed? User experience. ### Does this PR introduce _any_ user-facing change? No, experimental API. ### How was this patch tested? UT Closes apache#38485 from grundprinzip/SPARK-41001. Lead-authored-by: Martin Grund <[email protected]> Co-authored-by: Martin Grund <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
This patch adds documentation to describe how clients should implement handling
connecting to the Spark Connect endpoint. GRPC as a protocol is well documented and has
many options. However, this does not make it easy for users to reason about how to correctly
configure GRPC to make it work for their use cases.
To overcome the issue, this document defines a client connection string that needs to be parsed
by the different language clients and can be used to properly configure the GRPC client.
Why are the changes needed?
Documentation and design specification for clients implementing the Spark Connect protocol.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Doc only.