Use hypothesis tests for testing distributions instead of matching moments of the distribution #31
Comments
References for statistical tests of significance:
Existing libraries that already perform A/B testing:
I have created something basic here. @agarie @MohawkJohn @clbustos, can you please have a look at this and let me know whether we can try something similar (depending on how well it is able to predict things)?
This would be really, really good, but also a lot of work. If you do have the time to work on implementing this, please go for it! I looked into your gist, and it seems OK. We'll probably have to make some minor style adjustments, but that shouldn't be an issue once the main problem is solved.
Thanks. I'm currently finalizing the binomial RNG, since the first-principles one is too slow to be practical beyond a small sample size. I'll start on this once I finish binomial. Would this be better implemented as a separate module under lib (maybe other tests can be added there later), or just added to spec_helper.rb?
Add it to |
Just a random thought: this should be included in statsample later, because it is a statistical test after all. The Kolmogorov-Smirnov and homogeneity chi-square tests are already there.
Thanks for pointing me to that. I found that there is already an implementation in place in statsample/test/chisquare.rb. I think we can replace the mean tests if the goodness-of-fit tests give better performance for a similar sample size. To decide whether to make the replacement, we can measure performance as follows:
I see some problems for using statsample to test
Let me know what you guys think.
Edit: Updated the title to reflect the idea of using statistical hypothesis tests in general, not just the chi-squared test.
Pearson's chi-squared test is a more reliable method of ascertaining whether a sequence of numbers belongs to a distribution or follows a pattern. A test that only checks for the correct mean and variance is easy to fool: dummy values can be inserted so that the sample moments match those of any target distribution.
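To make the point concrete, here is a minimal sketch in plain Ruby (no gems). The helper names `chi_squared_uniform_statistic`, `sample_mean` and `sample_variance` are made up for this illustration and are not part of the distribution gem or statsample. It builds a two-valued dummy sample whose mean and variance match Uniform(0, 1), then compares its Pearson statistic with that of a real uniform sample.

```ruby
# Hypothetical helper: Pearson chi-squared statistic of a sample against
# Uniform(0, 1), using `bins` equal-width bins on [0, 1].
def chi_squared_uniform_statistic(samples, bins: 10)
  counts = Array.new(bins, 0)
  samples.each do |x|
    index = [(x * bins).floor, bins - 1].min # keep x == 1.0 in the last bin
    counts[index] += 1
  end
  expected = samples.size.to_f / bins
  counts.sum { |observed| (observed - expected)**2 / expected }
end

def sample_mean(xs)
  xs.sum / xs.size.to_f
end

# Population-style variance (divides by n), which is enough for this demo.
def sample_variance(xs)
  m = sample_mean(xs)
  xs.sum { |x| (x - m)**2 } / xs.size.to_f
end

# Uniform(0, 1) has mean 1/2 and variance 1/12.
sd = Math.sqrt(1.0 / 12)

# Dummy sample: only two distinct values, mean - sd and mean + sd.
# Its mean and variance match Uniform(0, 1), but it is clearly not uniform.
dummy = Array.new(1000) { |i| i.even? ? 0.5 - sd : 0.5 + sd }

# Reference sample from Ruby's built-in generator, seeded for repeatability.
rng = Random.new(42)
real = Array.new(1000) { rng.rand }

[['dummy', dummy], ['real', real]].each do |label, xs|
  puts format('%-5s mean=%.4f variance=%.4f chi2=%.1f',
              label, sample_mean(xs), sample_variance(xs),
              chi_squared_uniform_statistic(xs))
end
```

With 10 bins the dummy sample puts 500 values in each of two bins and none in the other eight, so its statistic works out to 4000, far above the 5% critical value of roughly 16.92 for 9 degrees of freedom. A moment-matching spec would accept the same sample.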
However, we should not ignore that the mean and variance must be reproduced correctly. The suggestion here is to use Pearson's chi-squared test and refactor the test cases into the following structure:
pearson_chi_squared(candidate: Distribution::Uniform.rng(0.1, 1), target: :uniform, samples: 1000)
returns the significance level of the test as a double.

metadata_for(candidate: Distribution::Normal.rng(0.1, 1), target: :normal, confidence: 0.99, samples: 100)
returns {mean: 0.1, variance: 0.96, skewness: 0.15 ... }, where entry i is moment i of the sequence.

Let me know what you think about this. Right now I feel a lot of test cases are repeated. This issue would of course require that all the methods in README.md are already implemented, so that results can be compared.
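As a rough sketch of how the two proposed helpers could look, here is a gem-free Ruby version. Several things are assumptions on my part rather than anything settled in this thread: the candidate is any callable that returns one sample per call, `target` is looked up in a hypothetical `TARGET_CDFS` table holding only the standard Uniform(0, 1) and Normal(0, 1) CDFs, binning uses equal-probability bins, and the p-value uses the Wilson-Hilferty normal approximation instead of an exact chi-squared CDF. The moments helper is named `sample_moments` and leaves out `target` and `confidence`, since how `metadata_for` would use them is not specified above.

```ruby
# Hypothetical lookup of target CDFs (standard parameters only, an assumption).
TARGET_CDFS = {
  uniform: ->(x) { x.clamp(0.0, 1.0) },
  normal: ->(x) { 0.5 * Math.erfc(-x / Math.sqrt(2)) }
}.freeze

# Approximate upper-tail probability of a chi-squared statistic with `df`
# degrees of freedom, via the Wilson-Hilferty cube-root transformation.
def chi_squared_upper_tail(statistic, df)
  return 1.0 if statistic <= 0

  spread = 2.0 / (9 * df)
  z = ((statistic.to_f / df)**(1.0 / 3) - (1 - spread)) / Math.sqrt(spread)
  0.5 * Math.erfc(z / Math.sqrt(2))
end

# Sketch of the proposed helper: draws `samples` values from `candidate`,
# maps each through the target CDF, bins the results into equal-probability
# bins, and returns the approximate p-value of Pearson's test.
def pearson_chi_squared(candidate:, target:, samples:, bins: 10)
  cdf = TARGET_CDFS.fetch(target)
  counts = Array.new(bins, 0)
  samples.times do
    u = cdf.call(candidate.call)
    counts[[(u * bins).floor, bins - 1].min] += 1
  end
  expected = samples.to_f / bins
  statistic = counts.sum { |observed| (observed - expected)**2 / expected }
  chi_squared_upper_tail(statistic, bins - 1)
end

# Sketch of the moments part of the proposed metadata_for helper.
def sample_moments(candidate:, samples:)
  xs = Array.new(samples) { candidate.call }
  mean = xs.sum / samples.to_f
  central = ->(k) { xs.sum { |x| (x - mean)**k } / samples.to_f }
  variance = central.call(2)
  { mean: mean, variance: variance, skewness: central.call(3) / variance**1.5 }
end

rng = Random.new(2024)
uniform = -> { rng.rand }    # stand-in for a correct uniform generator
skewed = -> { rng.rand**2 }  # deliberately wrong generator

puts pearson_chi_squared(candidate: uniform, target: :uniform, samples: 1000)
puts pearson_chi_squared(candidate: skewed, target: :uniform, samples: 1000)
p sample_moments(candidate: uniform, samples: 1000)
```

A spec could then assert that the returned p-value stays above a chosen threshold for a correct generator, while still checking the mean and variance separately. The threshold and the number of bins would need tuning per distribution.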