[SPARK-41874][CONNECT][PYTHON] Implement `DataFrame.sameSemantics` #39429

techaddict · 2023-01-06T10:03:59Z

What changes were proposed in this pull request?

Implement `DataFrame.sameSemantics` in the Spark Connect Python client.

Why are the changes needed?

API coverage: bring `DataFrame.sameSemantics` to the Spark Connect Python client.

Does this PR introduce any user-facing change?

Yes. `DataFrame.sameSemantics` becomes available in the Spark Connect Python client.

How was this patch tested?

New unit tests.

techaddict · 2023-01-06T16:35:07Z

cc: @HyukjinKwon @zhengruifeng

zhengruifeng · 2023-01-07T01:33:34Z

@techaddict Thank you for working on it.

We had some discussion on `sameSemantics` and `semanticHash` in #38742 (comment).

I think this one and #39427 are controversial, and both are developer APIs (see spark/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala, lines 3941 to 3967 in 470beda):

    /**
     * Returns `true` when the logical query plans inside both [[Dataset]]s are equal and
     * therefore return same results.
     *
     * @note The equality comparison here is simplified by tolerating the cosmetic differences
     *       such as attribute names.
     * @note This API can compare both [[Dataset]]s very fast but can still return `false` on
     *       the [[Dataset]] that return the same results, for instance, from different plans. Such
     *       false negative semantic can be useful when caching as an example.
     * @since 3.1.0
     */
    @DeveloperApi
    def sameSemantics(other: Dataset[T]): Boolean = {
      queryExecution.analyzed.sameResult(other.queryExecution.analyzed)
    }

    /**
     * Returns a `hashCode` of the logical query plan against this [[Dataset]].
     *
     * @note Unlike the standard `hashCode`, the hash is calculated against the query plan
     *       simplified by tolerating the cosmetic differences such as attribute names.
     * @since 3.1.0
     */
    @DeveloperApi
    def semanticHash(): Int = {
      queryExecution.analyzed.semanticHash()
    }

I think we may not want to add them.
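As a rough illustration of the contract described in the quoted Scaladoc of `sameSemantics` and `semanticHash` (plans are compared after tolerating cosmetic differences such as attribute names, and false negatives are allowed when different plans happen to return the same results), here is a minimal Python sketch. It is a toy model under stated assumptions, not Spark's implementation: the plan node classes `Scan`, `Filter` and `Project` and the helpers `canonicalize`, `same_semantics` and `semantic_hash` are hypothetical names invented for this example. Attribute names are treated as cosmetic labels and replaced with column ordinals during canonicalization; Spark, as far as we know, instead canonicalizes its analyzed logical plan (for example by normalizing expression IDs).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical toy plan nodes; these are NOT Spark's LogicalPlan classes.
@dataclass(frozen=True)
class Scan:
    table: str
    output: Tuple[str, ...]  # attribute names, treated as cosmetic labels


@dataclass(frozen=True)
class Filter:
    column: str      # attribute compared
    min_value: int   # keeps rows where column > min_value
    child: "Plan"


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]  # attributes kept, by name
    child: "Plan"


Plan = Union[Scan, Filter, Project]


def canonicalize(plan: Plan) -> Tuple[tuple, List[str]]:
    """Return (name-free canonical form, output attribute names)."""
    if isinstance(plan, Scan):
        return ("scan", plan.table, len(plan.output)), list(plan.output)
    if isinstance(plan, Filter):
        child, names = canonicalize(plan.child)
        # Refer to the filtered column by ordinal instead of by name.
        return ("filter", names.index(plan.column), plan.min_value, child), names
    if isinstance(plan, Project):
        child, names = canonicalize(plan.child)
        ordinals = tuple(names.index(c) for c in plan.columns)
        return ("project", ordinals, child), list(plan.columns)
    raise TypeError(f"unknown plan node: {plan!r}")


def same_semantics(a: Plan, b: Plan) -> bool:
    # Structural comparison of canonical forms: fast, but may give false negatives.
    return canonicalize(a)[0] == canonicalize(b)[0]


def semantic_hash(plan: Plan) -> int:
    # Hash the same canonical form, so same_semantics(a, b) implies equal hashes.
    return hash(canonicalize(plan)[0])


# Same query, different attribute names: considered semantically equal.
p1 = Project(("id",), Filter("id", 10, Scan("t", ("id", "v"))))
p2 = Project(("key",), Filter("key", 10, Scan("t", ("key", "value"))))
# Same rows as p1 (id > 5 and id > 10), but a different plan: a false negative.
p3 = Project(("id",), Filter("id", 10, Filter("id", 5, Scan("t", ("id", "v")))))
# Genuinely different query: projects the second column instead of the first.
p4 = Project(("v",), Filter("id", 10, Scan("t", ("id", "v"))))

print(same_semantics(p1, p2))                  # True
print(semantic_hash(p1) == semantic_hash(p2))  # True
print(same_semantics(p1, p3))                  # False
print(same_semantics(p1, p4))                  # False
```

Because `semantic_hash` hashes the same canonical form that `same_semantics` compares, semantically equal plans always hash equally, which is the relationship the two Scala methods appear designed to have. Python randomizes `str` hashes per process, so these hash values are only comparable within a single run.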

[SPARK-41874][CONNECT][PYTHON] Implement DataFrame.sameSemantics (8f2d8ba)

github-actions bot added CONNECT CORE PYTHON SQL labels Jan 6, 2023

techaddict added 3 commits January 6, 2023 02:39

formatting (ac2e98a)

Update relations_pb2.pyi (95affef)

Merge branch 'master' into SPARK-41874 (e8c90a0)

HyukjinKwon approved these changes Jan 7, 2023


zhengruifeng mentioned this pull request Jan 7, 2023

[SPARK-41922][CONNECT][PYTHON] Implement DataFrame.semanticHash #39427 (closed)

techaddict closed this Jan 7, 2023


3 participants
