Conversation

@josepablocam

This patch implements a 2-sample, 2-sided Kolmogorov-Smirnov test. As in the 1-sample implementation, we try to minimize the number of shuffles needed for the computation. The user provides two RDD[Double]s, and the Statistics.ksTest function tests the null hypothesis that both samples come from the same probability distribution.

This patch also includes the 1-sample test so that reviewers can see the broader context of the change. That portion, along with its tests, is being reviewed separately at https://issues.apache.org/jira/browse/SPARK-8598.
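
For readers unfamiliar with the statistic, here is a minimal local sketch of the 2-sample KS statistic D, the largest absolute gap between the two empirical CDFs. It is plain Python on in-memory lists, not the patch's Scala/RDD code, and it makes no attempt to reproduce the shuffle-reducing strategy. The function name `ks_two_sample_statistic` is hypothetical. The tag-pool-sort approach mirrors the idea of labelling each value with the sample it came from, which the review comments further down touch on.

```python
from typing import Iterable


def ks_two_sample_statistic(sample1: Iterable[float], sample2: Iterable[float]) -> float:
    """Return D = max |F1(x) - F2(x)| over the pooled observations.

    Illustrative only: a local, single-machine sketch, not the Spark patch.
    """
    xs1, xs2 = list(sample1), list(sample2)
    n1, n2 = len(xs1), len(xs2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Tag each value with its origin (True = sample 1), pool, and sort by value.
    tagged = sorted([(x, True) for x in xs1] + [(x, False) for x in xs2])

    seen1 = seen2 = 0  # observations from each sample at or below the current value
    d = 0.0
    i = 0
    while i < len(tagged):
        value = tagged[i][0]
        # Consume every observation tied at this value before comparing the ECDFs.
        while i < len(tagged) and tagged[i][0] == value:
            if tagged[i][1]:
                seen1 += 1
            else:
                seen2 += 1
            i += 1
        d = max(d, abs(seen1 / n1 - seen2 / n2))
    return d
```

Turning D into a p-value requires the Kolmogorov distribution and is left out of this sketch.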

@srowen
Member

srowen commented Jul 7, 2015

So this isn't to be merged in its current form? Put [WIP] in the title. This should probably just be reviewed later if/when the other PR is merged.

@josepablocam changed the title from "[SPARK-8674] [MLlib] Implementation of a 2 sample Kolmogorov Smirnov Test" to "[SPARK-8674] [MLlib] [WIP] Implementation of a 2 sample Kolmogorov Smirnov Test" on Jul 7, 2015
@sryza
Contributor

sryza commented Jul 13, 2015

Hey @josepablocam can you rebase this on current master?

@josepablocam
Author

@sryza yes, will do.

@josepablocam changed the title from "[SPARK-8674] [MLlib] [WIP] Implementation of a 2 sample Kolmogorov Smirnov Test" to "[SPARK-8674] [MLlib] Implementation of a 2 sample Kolmogorov Smirnov Test" on Jul 21, 2015
@mengxr
Contributor

mengxr commented Jul 30, 2015

@sryza Do you want to make another pass? Please sign off if you think this is ready:)

Contributor

Call this unionedData

Author

changed

Member

Would it be clearer to write true and false instead of isSample1 and !isSample1? I don't have a strong opinion.
Can the map functions be like .map((_, isSample1)) for tidiness or does that syntax not work here?
Finally I wonder if .union is clearer than ++ here? I don't have a strong opinion either, just am somehow used to method invocations on RDDs and ++-like syntax for Scala collections.

@sryza
Contributor

sryza commented Jul 31, 2015

@mengxr @josepablocam oops thought it was still a WIP for some reason. Just took a pass. It looks mostly done - I just had a bunch of nits and a test request.

@josepablocam
Author

Mmm. I seem to be having some issues building and testing on my laptop. It keeps failing when building Catalyst. I'll try this first thing in the morning at work and push if it passes tests.

Member

.sortByKey? I think the mapPartitions doesn't need braces, just parens, but that's tiny.

…to account for aliasing of commons' KS test
Contributor

Nit: I'd make the above one "Sample follows the theoretical distribution" or make the bottom one "Both samples follow same distribution".

Author

fixed. Modified the first. Thanks

…ical functions are package private and can be tested more directly
@sryza
Contributor

sryza commented Aug 4, 2015

jenkins, test this please

@sryza
Contributor

sryza commented Aug 4, 2015

This LGTM pending jenkins

@mengxr
Contributor

mengxr commented Aug 5, 2015

test this please

@SparkQA

SparkQA commented Aug 5, 2015

Test build #39872 has finished for PR 7075 at commit 16ba96e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@josepablocam
Author

Ugh, I did not update the PySpark tests after we slightly cleaned up the grammar in the 2-sample test. I will make the 2-sample KS test's hypothesis statement match the grammar of the 1-sample one. Sorry about this!

… unit test failure by reversing prior grammar change
@sryza
Contributor

sryza commented Aug 17, 2015

@mengxr is it too late to get this in to 1.5?

@josepablocam are you able to resolve merge conflicts?

@josepablocam
Author

@sryza fixed merge conflicts

@SparkQA

SparkQA commented May 17, 2016

Test build #58715 has finished for PR 7075 at commit feacda0.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2016

Test build #69329 has finished for PR 7075 at commit feacda0.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

@josepablocam, it looks like the conflicts were not resolved cleanly. Would you resolve them?

@josepablocam
Author

josepablocam commented Jun 19, 2017 via email

@gatorsmile
Member

We are closing this due to inactivity. Please reopen it if you want to push it forward. Thanks!

@asfgit closed this in b32bd00 on Jun 27, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
## What changes were proposed in this pull request?

This PR proposes to close stale PRs, mostly the same kinds of cases as in apache#18017.

I believe the author of apache#14807 removed his account.

Closes apache#7075
Closes apache#8927
Closes apache#9202
Closes apache#9366
Closes apache#10861
Closes apache#11420
Closes apache#12356
Closes apache#13028
Closes apache#13506
Closes apache#14191
Closes apache#14198
Closes apache#14330
Closes apache#14807
Closes apache#15839
Closes apache#16225
Closes apache#16685
Closes apache#16692
Closes apache#16995
Closes apache#17181
Closes apache#17211
Closes apache#17235
Closes apache#17237
Closes apache#17248
Closes apache#17341
Closes apache#17708
Closes apache#17716
Closes apache#17721
Closes apache#17937

Added:
Closes apache#14739
Closes apache#17139
Closes apache#17445
Closes apache#18042
Closes apache#18359

Added:
Closes apache#16450
Closes apache#16525
Closes apache#17738

Added:
Closes apache#16458
Closes apache#16508
Closes apache#17714

Added:
Closes apache#17830
Closes apache#14742

## How was this patch tested?

N/A

Author: hyukjinkwon

Closes apache#18417 from HyukjinKwon/close-stale-pr.