[SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client#38485

Closed
grundprinzip wants to merge 9 commits into apache:master from grundprinzip:SPARK-41001

@grundprinzip (Contributor) commented Nov 2, 2022

### What changes were proposed in this pull request?

This PR implements the connection string for Spark Connect clients according to the documentation added in #38470.

With this patch, it becomes possible to connect to a Spark Connect endpoint using:

```python
spark = SparkRemoteSession(user_id="martin", connection_string="sc://hostname/;use_ssl=true;token=abcd")
spark.read.table("test").limit(10).toPandas()
```

The connection string is parsed, and the parameters the client recognizes are filtered out of it. This makes it possible to configure SSL and bearer token authentication dynamically. All remaining parameters are converted into gRPC metadata pairs and submitted as part of the request.
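The parse-and-filter step can be sketched with the standard library alone. This is a minimal, hypothetical helper (`parse_connection_string` is not the PySpark client's API), assuming that `use_ssl` and `token` are the only keys the client consumes itself and that 15002 is the default port when the string omits one. Every other `key=value` pair is returned as a list of `(key, value)` tuples, which is the shape gRPC Python accepts for per-call metadata.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

# Assumption: default Spark Connect port used when the string has no port.
DEFAULT_PORT = 15002
# Assumption: keys consumed by the client itself; the rest become gRPC metadata.
RESERVED_KEYS = {"use_ssl", "token"}


def parse_connection_string(
    url: str,
) -> Tuple[str, int, Dict[str, str], List[Tuple[str, str]]]:
    """Split an sc:// connection string into host, port, options and metadata.

    Hypothetical helper for illustration only.
    """
    parsed = urlsplit(url)
    if parsed.scheme != "sc":
        raise ValueError(f"expected scheme 'sc', got {parsed.scheme!r}")
    host = parsed.hostname
    if not host:
        raise ValueError("connection string has no host")
    port = parsed.port or DEFAULT_PORT

    options: Dict[str, str] = {}
    metadata: List[Tuple[str, str]] = []
    # The path looks like "/;key=value;key=value"; split on ';' and skip empties.
    for part in parsed.path.lstrip("/").split(";"):
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"parameter {part!r} is not of the form key=value")
        if key in RESERVED_KEYS:
            options[key] = value  # configures SSL / bearer token auth
        else:
            metadata.append((key, value))  # forwarded as gRPC metadata
    return host, port, options, metadata
```

For `"sc://hostname/;use_ssl=true;token=abcd"` the sketch returns host `"hostname"`, port `15002`, options `{"use_ssl": "true", "token": "abcd"}`, and an empty metadata list. `urlsplit` is used because it never splits off `;` path parameters, so the `;key=value` pairs always stay in `path` regardless of scheme.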

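### Why are the changes needed?

User experience: SSL, bearer token authentication, and additional request metadata can all be configured through a single connection string.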

### Does this PR introduce _any_ user-facing change?

No; this is an experimental API.

### How was this patch tested?

Unit tests.

@HyukjinKwon (Member) left a comment


Looks fine from a cursory look. cc @zhengruifeng too

@AmplabJenkins

Can one of the admins verify this patch?

grundprinzip and others added 2 commits November 4, 2022 06:05
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
@grundprinzip (Contributor, Author)

Accepted the suggestion and fixed a doc example with a missing quote.

@HyukjinKwon (Member)

Merged to master.

```python
        self.assertEqual(len(expectResult), len(actualResult))


class ChannelBuilderTests(ReusedPySparkTestCase):
```
@dongjoon-hyun (Member) commented Dec 6, 2022

This should be skipped based on `should_test_connect`, like `SparkConnectSQLTestCase` in this file:

```python
@unittest.skipIf(not should_test_connect, connect_requirement_message)
class SparkConnectSQLTestCase(ReusedPySparkTestCase):
```

I made a PR for that.
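The reviewer's suggestion amounts to putting the same decorator on `ChannelBuilderTests`. The sketch below is self-contained and only illustrative: `should_test_connect` and `connect_requirement_message` are hypothetical stand-ins computed from whether the `grpc` module is importable (PySpark's own flags may check more dependencies), and `unittest.TestCase` replaces `ReusedPySparkTestCase` so that it runs without a Spark installation.

```python
import importlib.util
import unittest

# Hypothetical stand-ins for the flags used by the Spark Connect tests:
# here the tests are enabled only when the grpc package can be imported.
should_test_connect = importlib.util.find_spec("grpc") is not None
connect_requirement_message = "" if should_test_connect else "grpc is not installed"


# Skipping at class level marks every test method in the class as skipped,
# mirroring the decorator already used on SparkConnectSQLTestCase.
@unittest.skipIf(not should_test_connect, connect_requirement_message)
class ChannelBuilderTests(unittest.TestCase):  # ReusedPySparkTestCase in the real file
    def test_requirement_available(self):
        # Only reached when the skip condition above is false.
        self.assertTrue(should_test_connect)
```

With the decorator in place, an environment without the Connect dependencies reports the class as skipped instead of failing on import or setup.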

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…hon Client

### What changes were proposed in this pull request?

This PR implements the connection string for Spark Connect clients according to the documentation added in apache#38470.

With this patch it becomes possible to connect to a Spark Connect endpoint using

```
spark = SparkRemoteSession(user_id="martin", connection_string="sc://hostname/;use_ssl=true;token=abcd")
spark.read.table("test").limit(10).toPandas()
```

The connection string is properly parsed and filtered. This makes it possible to dynamically configure SSL and bearer token authentication. All remaining parameters are converted into gRPC metadata pairs and submitted as part of the request.

### Why are the changes needed?
User experience.

### Does this PR introduce _any_ user-facing change?
No, experimental API.

### How was this patch tested?
UT

Closes apache#38485 from grundprinzip/SPARK-41001.

Lead-authored-by: Martin Grund <martin.grund@databricks.com>
Co-authored-by: Martin Grund <grundprinzip@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

5 participants