[SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect #40228
Hm, shall we add it to the Python API too?
I remember this wasn't added because of some concerns from @hvanhovell (maybe I am remembering this wrongly?). In any event, this is an important API for ML to use. cc @WeichenXu123 FYI
In this PR I have added the Python version. Can you take a look?
It seems @hvanhovell and @cloud-fan had some concerns about sameSemantics and semanticHash.
I think the hash argument still stands. However, I think this is also a matter of setting the right expectations here and updating the docs accordingly.
@WeichenXu123 it would be good to understand your use case.
Maybe just add an `assert` instead of an `if` with a plain `Exception`?
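For readers who have not seen the diff, the two styles being contrasted look roughly like the sketch below. The function names, the `plan` argument, and the error message are hypothetical and are not taken from the PR; only the `if`/`raise` versus `assert` contrast comes from the comment above.

```python
def check_with_exception(plan):
    # Style the reviewer is questioning: explicit branch, plain Exception.
    if plan is None:
        raise Exception("Cannot compare a DataFrame without a plan.")
    return plan


def check_with_assert(plan):
    # Suggested style: an internal invariant expressed as an assert.
    # Note that assert statements are skipped when Python runs with -O.
    assert plan is not None, "Cannot compare a DataFrame without a plan."
    return plan
```

An `assert` fits conditions that should never happen unless there is a bug in the library itself, while an explicit exception is usually the safer choice for validating user input.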
Shall we add @DeveloperApi for this one?
@amaliujia if you have time, let's also get this one over the line.
@hvanhovell I just addressed actionable comments
Update this comment? The comparison won't be fast. Maybe add a note here to explain that this executes an RPC.
I see. gRPC might still be fast, just not as fast as before. I removed "very fast" and explained in the comment that it will execute an RPC call.
hvanhovell left a comment
LGTM - one small comment.
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line 1494, in test_same_semantics_error
    with QuietTest(self.sc):
AttributeError: 'DataFrameParityTests' object has no attribute 'sc'

I think you cannot enable this test since we don't have SparkContext, but you can remove the TODO and update it to

    @unittest.skip("Spark Connect does not have SparkContext but the tests depend on it.")
    def test_same_semantics_error(self):
        super().test_same_semantics_error()
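The pattern suggested above, overriding an inherited test only to mark it as skipped, can be tried without Spark. The sketch below is a stand-in: `DataFrameTestsMixin` and `run_parity_tests` are hypothetical names invented for illustration, and the test body only imitates the real test's dependence on `self.sc`.

```python
import unittest


class DataFrameTestsMixin:
    """Hypothetical shared test mixin; the real one lives in PySpark's test suite."""

    def test_same_semantics_error(self):
        # Stand-in body: like the real test, it needs `self.sc`, which a
        # Spark Connect test class does not provide.
        self.assertIsNotNone(self.sc)


class DataFrameParityTests(DataFrameTestsMixin, unittest.TestCase):
    # Override the inherited test so the runner reports it as skipped
    # instead of failing with AttributeError on `self.sc`.
    @unittest.skip("No SparkContext is available in this sketch.")
    def test_same_semantics_error(self):
        super().test_same_semantics_error()


def run_parity_tests():
    """Run the parity tests and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DataFrameParityTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Keeping the override with a `super()` call, rather than deleting the test, leaves a visible marker in the parity suite that can be re-enabled by removing the decorator.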
I see. Thanks!
@amaliujia can you update the PR?
@hvanhovell waiting for CI
### What changes were proposed in this pull request?
Support SameSemantics in Spark Connect.

### Why are the changes needed?
API coverage

### Does this PR introduce _any_ user-facing change?
SameSemantics API calls from users returns result now than throwing an exception.

### How was this patch tested?
UT

Closes #40228 from amaliujia/sameSemantics.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 6053483)
Signed-off-by: Ruifeng Zheng <[email protected]>
merged into master/branch-3.4
What changes were proposed in this pull request?
Support SameSemantics in Spark Connect.
Why are the changes needed?
API coverage
Does this PR introduce any user-facing change?
SameSemantics API calls from users now return a result rather than throwing an exception.
How was this patch tested?
UT
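
As a rough illustration of the user-facing change, the sketch below calls `DataFrame.sameSemantics` on DataFrames built from a session passed in by the caller. The helper names `compare_plans` and `main` and the `sc://localhost` address are assumptions for this example, not part of the PR; running `main()` requires PySpark 3.4 or later and a reachable Spark Connect server.

```python
def compare_plans(spark):
    """Return two sameSemantics results for DataFrames built on `spark`."""
    df1 = spark.range(10)
    df2 = spark.range(10)
    # Same projection on both sides: the plans are expected to match.
    same = df1.withColumn("col1", df1.id * 2).sameSemantics(
        df2.withColumn("col1", df2.id * 2)
    )
    # Different expression on the right-hand side: expected not to match.
    different = df1.withColumn("col1", df1.id * 2).sameSemantics(
        df2.withColumn("col1", df2.id + 2)
    )
    return same, different


def main():
    # Assumption: a Spark Connect server is listening at sc://localhost.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
    print(compare_plans(spark))
```

As noted in the review, each such call on a Spark Connect session executes an RPC to the server, so it is not as cheap as the local comparison.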