Skip to content

Conversation

@amaliujia
Copy link
Contributor

What changes were proposed in this pull request?

Support SameSemantics in Spark Connect.

Why are the changes needed?

API coverage

Does this PR introduce any user-facing change?

SameSemantics API calls from users returns result now than throwing an exception.

How was this patch tested?

UT

@amaliujia
Copy link
Contributor Author

@hvanhovell

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, shall we add it into Python API too if we should?

I remember this wasn't added for some concerns from @hvanhovell (maybe I am remembering this wrongly?). This is important API for ML to use in any event. cc @WeichenXu123 FYI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR I have added the python version. Can you take a look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems @hvanhovell and @cloud-fan had some concerns in sameSemantics and semanticHash

#38742 (comment)

#38742 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the hash argument still stands. However I think this is also a matter of setting the right expectations here, and to update the docs accordingly.

@WeichenXu123 it would be good to understand your usecase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just add an assert instead of if with plain Exception?

@HyukjinKwon HyukjinKwon changed the title [SPARK-41874][CONNECT] Support SameSemantics in Spark Connect [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect Mar 1, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add @DeveloperApi for this one?

@hvanhovell
Copy link
Contributor

@amaliujia if you have time, let's also get this one over the line.

@amaliujia
Copy link
Contributor Author

@hvanhovell I just addressed actionable comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment? The comparison won't be fast. Maybe add a note here to explain that this executes a RPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. gRPC might still fast, but just not as fast as before. I removed very fast and explain that it will execute a RPC call in comment

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - one small comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line 1494, in test_same_semantics_error
    with QuietTest(self.sc):
AttributeError: 'DataFrameParityTests' object has no attribute 'sc'

I think you can not enable this test since we don't have SparkContext, but you can remove the TODO and update it to

    @unittest.skip("Spark Connect does not SparkContext but the tests depend on them.")
    def test_same_semantics_error(self):
        super().test_same_semantics_error()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks!

@hvanhovell
Copy link
Contributor

@amaliujia can you update the PR?

@amaliujia
Copy link
Contributor Author

@hvanhovell waiting for CI

zhengruifeng pushed a commit that referenced this pull request Mar 6, 2023
### What changes were proposed in this pull request?

Support SameSemantics in Spark Connect.

### Why are the changes needed?

API coverage

### Does this PR introduce _any_ user-facing change?

SameSemantics API calls from users returns result now than throwing an exception.

### How was this patch tested?

UT

Closes #40228 from amaliujia/sameSemantics.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 6053483)
Signed-off-by: Ruifeng Zheng <[email protected]>
@zhengruifeng
Copy link
Contributor

merged into master/branch-3.4

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
### What changes were proposed in this pull request?

Support SameSemantics in Spark Connect.

### Why are the changes needed?

API coverage

### Does this PR introduce _any_ user-facing change?

SameSemantics API calls from users returns result now than throwing an exception.

### How was this patch tested?

UT

Closes apache#40228 from amaliujia/sameSemantics.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 6053483)
Signed-off-by: Ruifeng Zheng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants