Proposal: Include 'combinatorial' argument to t-test functions #9860
Comments
That changes the function signature, which breaks backwards compatibility, so that's a no-go. The way to do it would be
It would be a bit weird to return a p-value from this function and then tell the user that they need to modify it (or not use 0.05/0.95). So either we support it properly, with the return values needed to let the user easily do the right thing without inflating type I errors, or we just point them to
Hi @rgommers. Yes, it is ugly: you effectively halve the required inputs (one array instead of a and b) and compensate for that by passing a new argument set to True. Furthermore, when combinatorial is True, a can no longer be a single sample array, which makes the function's arguments even more confusing. I couldn't think of a better way to do it, though, short of defining a new method. I'll look elsewhere in the library for hints on how best to handle this use case (mutually antagonistic argument sets). With regard to ANOVA: yes, that would be the proper way to test whether there are any differences between the groups, but it doesn't tell you what those differences are, which requires further investigation (an ANOVA would potentially be the first step).
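The two-step workflow described above (omnibus ANOVA first, pairwise follow-up only if it rejects) can be sketched with the existing scipy.stats.f_oneway and scipy.stats.ttest_ind functions. The data are random and purely illustrative, the 0.05 threshold is an assumption, and the pairwise p-values are left unadjusted here. Recent SciPy versions also provide scipy.stats.tukey_hsd, a post-hoc test designed for this follow-up step.

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Five illustrative groups of 25 observations; groups 4 and 5 have a shifted mean.
groups = rng.normal(loc=[[0.0], [0.0], [0.0], [1.0], [1.0]], scale=1.0, size=(5, 25))

# Step 1: the one-way ANOVA only answers "does at least one group mean differ?"
anova = stats.f_oneway(*groups)

# Step 2: only if it rejects, run pairwise t-tests to see which pairs differ.
pairwise_p = {}
if anova.pvalue < 0.05:
    for i, j in itertools.combinations(range(len(groups)), 2):
        pairwise_p[(i, j)] = stats.ttest_ind(groups[i], groups[j]).pvalue

print(f"ANOVA: F = {anova.statistic:.2f}, p = {anova.pvalue:.3g}")
for (i, j), p in pairwise_p.items():
    print(f"group {i + 1} vs group {j + 1}: unadjusted p = {p:.3g}")
```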
Do we need to modify the p-values (or return additional, corrected p-values) for the multiple comparisons? If you pass ndarrays as a and b, the function doesn't try to alter the p-values in that case, does it? I wouldn't assume it's the function's job to do this, since we know we need to apply a correction (usually Bonferroni) after running these multiple comparisons. In short, do you see a benefit in incorporating this methodology into SciPy (or publishing it elsewhere), or should I just keep it internal?
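Applying the correction after the fact, as suggested here, is only a few lines. The sketch below runs all pairwise ttest_ind calls and then applies the Bonferroni adjustment by hand (multiply each p-value by the number of comparisons and cap at 1). The data and the 0.05 level are illustrative assumptions.

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
groups = rng.normal(size=(5, 30))  # 5 illustrative groups, 30 observations each

# All unordered pairs (i, j) with i < j, and their raw two-sided p-values.
pairs = list(itertools.combinations(range(len(groups)), 2))
raw_p = np.array([stats.ttest_ind(groups[i], groups[j]).pvalue for i, j in pairs])

# Bonferroni: multiply by the number of comparisons m, capped at 1.
# Equivalent to comparing each raw p-value with alpha / m.
m = len(pairs)
adjusted_p = np.minimum(raw_p * m, 1.0)

alpha = 0.05
for (i, j), p_raw, p_adj in zip(pairs, raw_p, adjusted_p):
    decision = "reject H0" if p_adj < alpha else "keep H0"
    print(f"{i + 1} vs {j + 1}: raw p = {p_raw:.4f}, Bonferroni p = {p_adj:.4f} ({decision})")
```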
The combination of the keyword
Agreed!
Proposal
Add a new argument, combinatorial, so that all combinations of the vectors in the input ndarray are compared with one another: stats.ttest_ind(ndarray, combinatorial=True, ...)
If the ndarray contains 5 vectors (np.array([array1, array2, ..., array5])), the function would calculate 1 vs 2, 1 vs 3, 1 vs 4, ..., 2 vs 3, 2 vs 4, ..., 4 vs 5.
Current Limitations
Currently, the ttest_ind function requires two array-like arguments, a and b, and compares the first with the second:
or, with ndarrays:
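The issue's own snippets are not reproduced here; as an illustration of the two existing calling conventions (not the author's code), the sketch below uses random data. With 2-D inputs and axis=1, row k of a is tested only against row k of b, so there is still no way to compare row 0 of a with row 1 of a.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two 1-D samples: one comparison, one statistic and one p-value.
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.5, 1.0, size=30)
res_1d = stats.ttest_ind(a, b)

# Two 2-D arrays tested along axis=1: row k of A against row k of B only,
# so 3 rows give exactly 3 comparisons.
A = rng.normal(0.0, 1.0, size=(3, 30))
B = rng.normal(0.5, 1.0, size=(3, 30))
res_2d = stats.ttest_ind(A, B, axis=1)

print(res_1d.statistic, res_1d.pvalue)
print(res_2d.statistic.shape)  # one statistic per row pair
```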
Because of the structure of the arguments, only one comparison (a against b) can be made per call, so comparing multiple groups (e.g. 1 vs 2, 2 vs 3 and 1 vs 3) requires generating the combinations yourself, for example with the itertools library:
or this way:
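The itertools approach mentioned above looks roughly like the following sketch (random, illustrative data; not the author's original snippet). itertools.combinations yields each unordered pair once, in the order 1 vs 2, 1 vs 3, ..., 4 vs 5, giving 5 * 4 / 2 = 10 calls to ttest_ind.

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Five groups of 20 observations each, stacked as the rows of one ndarray.
groups = rng.normal(loc=[[0.0], [0.2], [0.4], [0.6], [0.8]], scale=1.0, size=(5, 20))

# One ttest_ind call per unordered pair (i, j) with i < j, looped in Python.
results = {}
for i, j in itertools.combinations(range(len(groups)), 2):
    res = stats.ttest_ind(groups[i], groups[j])
    results[(i, j)] = (res.statistic, res.pvalue)

for (i, j), (t, p) in results.items():
    print(f"group {i + 1} vs group {j + 1}: t = {t:.3f}, p = {p:.3f}")
```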
However, it's not immediately obvious how to construct the input variables for these combinations, and as the dimensions grow, this approach requires constructing very large input ndarrays containing large amounts of duplicated data. It therefore seems sensible for the arguments to support the general combinatorial case.
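To make the duplication concrete, here is one way (a sketch, not taken from the issue) to build those large inputs so that a single vectorised ttest_ind call covers every pair. np.triu_indices with k=1 lists the pairs in the same order as itertools.combinations, and fancy indexing copies each group into every row where it takes part.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
groups = rng.normal(size=(5, 20))  # 5 groups, 20 observations each

# Row indices for every pair (i, j) with i < j: the strict upper triangle.
i_idx, j_idx = np.triu_indices(len(groups), k=1)

# Fancy indexing makes copies: each group appears in 4 of the 10 pairs, so
# A and B together hold 2 * 10 * 20 = 400 values instead of the original 100.
A = groups[i_idx]  # shape (10, 20)
B = groups[j_idx]  # shape (10, 20)

# One call tests row k of A against row k of B for all 10 pairs.
res = stats.ttest_ind(A, B, axis=1)
print(res.statistic)
```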
Internally, I wrote a series of functions for the different t-tests that do support this input, using array broadcasting and thus removing the need for Python loops outside the function or for constructing large input ndarrays:
Internal function:
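The author's internal function is not shown in the issue. As an independent illustration of the broadcasting idea, the sketch below computes Student's t-test (pooled variance, matching ttest_ind's default equal_var=True, two-sided) for every pair of rows at once. pairwise_ttest_ind is a hypothetical name, not SciPy API, and the sketch assumes equal group sizes, no NaNs and non-zero variances.

```python
import numpy as np
from scipy import stats


def pairwise_ttest_ind(groups):
    """All-pairs Student's t-test via broadcasting (hypothetical helper, not SciPy API).

    groups: 2-D array of shape (k, n); each row is one group of n observations.
    Returns (t, p) as (k, k) arrays where entry [i, j] compares group i with group j.
    """
    groups = np.asarray(groups, dtype=float)
    _, n = groups.shape
    means = groups.mean(axis=1)             # shape (k,)
    variances = groups.var(axis=1, ddof=1)  # shape (k,), unbiased sample variances

    # With equal sizes, the pooled variance of a pair is the mean of the two
    # sample variances; (k, 1) and (1, k) broadcast to all k * k pairs at once.
    pooled = (variances[:, None] + variances[None, :]) / 2.0
    std_err = np.sqrt(pooled * (2.0 / n))
    t = (means[:, None] - means[None, :]) / std_err

    df = 2 * n - 2                       # n_i + n_j - 2 with n_i = n_j = n
    p = 2.0 * stats.t.sf(np.abs(t), df)  # two-sided p-values
    return t, p


rng = np.random.default_rng(6)
groups = rng.normal(size=(5, 20))
t, p = pairwise_ttest_ind(groups)

# The 10 distinct comparisons live in the strict upper triangle.
i_idx, j_idx = np.triu_indices(len(groups), k=1)
print(t[i_idx, j_idx])
```

The result matrices are redundant by design (t is antisymmetric, p is symmetric, the diagonal compares a group with itself), which trades a little memory for not having to copy the raw observations.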
Benchmark
This shows that the new function is around 10 times faster than the Python-loop approach when comparing 5 groups with one another.
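The benchmark code itself isn't included in the issue. A rough way to run this kind of comparison is sketched below; it times the itertools loop against a single vectorised ttest_ind call on index-built arrays (not the author's internal function). Timings depend on the machine, the SciPy version and the group sizes, so no numbers are claimed here.

```python
import itertools
import timeit

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
groups = rng.normal(size=(5, 50))  # 5 groups, 50 observations each


def looped(groups):
    # One ttest_ind call per pair, in itertools.combinations order.
    return np.array([stats.ttest_ind(groups[i], groups[j]).statistic
                     for i, j in itertools.combinations(range(len(groups)), 2)])


def vectorised(groups):
    # Same pairs in the same order, but a single call along axis=1.
    i_idx, j_idx = np.triu_indices(len(groups), k=1)
    return stats.ttest_ind(groups[i_idx], groups[j_idx], axis=1).statistic


# Sanity check before timing: both approaches must agree.
assert np.allclose(looped(groups), vectorised(groups))

repeats = 200
for fn in (looped, vectorised):
    seconds = timeit.timeit(lambda: fn(groups), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e6:.1f} us per call")
```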
Summary
What are the implications of adding a feature such as this? Statistically, running multiple tests inflates the chance of at least one false positive (type I error) across the family of comparisons, and it could encourage p-hacking/data mining; however, I'd argue that that decision rests with the statistician, not the statistical library.
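To put a number on the effect: with k groups there are m = k(k-1)/2 pairwise tests. If each test is run at level α and the tests are assumed independent (pairwise tests that share groups are not, so this is only an approximation), the family-wise error rate for the 5 groups in this proposal at α = 0.05 is:

$$
m = \binom{k}{2} = \frac{k(k-1)}{2} = \frac{5 \cdot 4}{2} = 10,
\qquad
\mathrm{FWER} = 1 - (1 - \alpha)^{m} = 1 - 0.95^{10} \approx 0.40 .
$$

Testing each comparison at α/m = 0.005 (Bonferroni) keeps the family-wise error rate at or below 0.05 by the union bound, with or without independence.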
I'd like to know people's thoughts on a feature like this (or something similar).