@zhengruifeng
Contributor

What changes were proposed in this pull request?

Make df.stat.{cov, corr} consistent with sql functions

Why are the changes needed?

It is weird to have two implementations in SQL.

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing UTs

@github-actions github-actions bot added the SQL label Oct 27, 2022
Member

@HyukjinKwon HyukjinKwon left a comment

LGTM, but to confirm: are they exactly the same, or are there corner-case behaviour differences? If there are, we should note them in the migration guide.

@zhengruifeng
Contributor Author

The null handling seems to be different; let me update this PR.

@zhengruifeng
Contributor Author

zhengruifeng commented Oct 28, 2022

@HyukjinKwon as far as I know, they are the same now.

The original tests didn't cover null handling or empty datasets, so I added a new UT to make sure there is no behavior change.
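To illustrate why null handling matters when two covariance implementations are swapped, here is a minimal pure-Python sketch. It does not use Spark and does not claim to reproduce either Spark code path; the helper names (`sample_cov`, `drop_null_pairs`, `nulls_as_zero`) and the two null policies are assumptions chosen only to show that the same input can yield different results, and that an empty input needs an explicit decision.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]


def sample_cov(pairs: List[Tuple[float, float]]) -> Optional[float]:
    """Sample covariance (n - 1 denominator); None when fewer than 2 rows."""
    n = len(pairs)
    if n < 2:
        # Empty or single-row input: what to return here is a design decision.
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def drop_null_pairs(pairs: List[Pair]) -> List[Tuple[float, float]]:
    """Hypothetical policy A: ignore any row where either value is null."""
    return [(x, y) for x, y in pairs if x is not None and y is not None]


def nulls_as_zero(pairs: List[Pair]) -> List[Tuple[float, float]]:
    """Hypothetical policy B: treat a null value as 0.0."""
    return [
        (0.0 if x is None else x, 0.0 if y is None else y) for x, y in pairs
    ]


# Illustrative data with one null in the first column.
rows: List[Pair] = [(1.0, 2.0), (2.0, 4.0), (None, 6.0), (4.0, 8.0)]

cov_dropping = sample_cov(drop_null_pairs(rows))
cov_zeroing = sample_cov(nulls_as_zero(rows))
cov_empty = sample_cov([])
```

The two policies give different covariances for the same rows, which is the kind of divergence a unit test covering nulls and empty datasets is meant to catch.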

@zhengruifeng zhengruifeng changed the title from "[SPARK-40933][SQL] Make df.stat.{cov, corr} consistent with sql functions" to "[SPARK-40933][SQL] Reimplement df.stat.{cov, corr} by built-in sql functions" Oct 28, 2022
@zhengruifeng zhengruifeng changed the title from "[SPARK-40933][SQL] Reimplement df.stat.{cov, corr} by built-in sql functions" to "[SPARK-40933][SQL] Reimplement df.stat.{cov, corr} with built-in sql functions" Oct 28, 2022
@HyukjinKwon
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the sql_stat_corr_cov branch October 31, 2022 02:22
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…functions

Closes apache#38411 from zhengruifeng/sql_stat_corr_cov.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>