Pairwise comparisons #352
RespondentIdeology = c("Very Conservative")
)
)
expected_chisq = structure(5.74120651376478, .Names = "X-squared")
Most of the changes in this file should move to test-cube-pairwise, but I wanted to make the changes more evident in the pr.
Codecov Report
@@ Coverage Diff @@
## master #352 +/- ##
==========================================
- Coverage 89.97% 89.96% -0.01%
==========================================
Files 115 116 +1
Lines 7240 7277 +37
==========================================
+ Hits 6514 6547 +33
- Misses 726 730 +4
Codecov Report
@@ Coverage Diff @@
## master #352 +/- ##
==========================================
- Coverage 90.47% 90.42% -0.06%
==========================================
Files 120 116 -4
Lines 7362 7362
==========================================
- Hits 6661 6657 -4
- Misses 701 705 +4
574b1ff to 29b65c6
.Dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
referencePvals <- structure(rep(1, 16),
.Dim = c(4L, 4L),
.Dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
These could be rewritten as
cubify(0, dims = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
and
cubify(1, dims = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
respectively (and most or all other `structure(...)` calls could be as well).
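For readers who want to check the equivalence without the test helpers loaded, here is a minimal base-R sketch. The definition of `cubify` is not shown in this thread, so `array()` is swapped in as a stand-in; it is an assumption that `cubify` builds a comparable object.

```r
# Base-R stand-in: cubify() is a test helper whose definition is not shown
# in this thread, so array() is used here instead.
dims <- list(c("a", "b", "c", "d"), c("a", "b", "c", "d"))

# The style currently used in the test file.
via_structure <- structure(rep(1, 16), .Dim = c(4L, 4L), .Dimnames = dims)

# A shorter spelling: the single value is recycled to fill the 4 x 4 array.
via_array <- array(1, dim = c(4L, 4L), dimnames = dims)

identical(via_structure, via_array)
```

Both spellings carry the same `dim` and `dimnames` attributes, so the choice between them is about readability only.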
Should they? Is `cubify` preferred for developer readability? As only an occasional contributor I wasn't clear on what `cubify` did, and `structure` was clearer to me: "this is the reference object". Can change if you like.
I find `cubify` to be a bit more readable, and it saves arguments. It's not necessary, of course, but those structures stood out to me, and I had to copy/paste them and run them to confirm they produced what I expected.
It's not relevant for these two, but for the ones that are not a single value it would also be nice if the line breaks matched the row breaks to make them easier to read. I found myself having to mentally find the diagonal and shift forward or back from there to see where I was in the matrix.
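One way to get line breaks that match row breaks, sketched in base R with made-up values (the numbers and the three-level dimension are hypothetical, not taken from the test file):

```r
dims <- list(c("a", "b", "c"), c("a", "b", "c"))

# Hypothetical symmetric values. With byrow = TRUE, each source line is one
# matrix row, so the diagonal is visible at a glance.
expected <- matrix(
    c(0.0, 1.5, 2.5,
      1.5, 0.0, 3.5,
      2.5, 3.5, 0.0),
    nrow = 3, byrow = TRUE, dimnames = dims
)

expected["a", "b"]
```

The same values passed to `structure(..., .Dim = ...)` are filled column by column, which is why the source layout and the printed matrix only line up by accident when the matrix is symmetric.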
expect_equal(
compareCols(
gender_x_ideology,
baseline = "Very liberal",
x = "Very Conservative"
),
expected_zScores
expected_chisq
Am I reading this right that we're changing the behavior of `compareCols()` from returning z-scores for each cell (like `zScores()` does) to returning the return value from `chisq.test`? Is there any reason not to include `stdres` in the allowed `value` parameters so that it can return what we have already been returning?
My suspicion from talking to a few of the offices about this is that the p-values derived from those z-scores are what people think they want when they compare one column against another (and the same in the pairwise compare-all-columns case).
Also, if we are transitioning away from z-scores to just providing the test statistic we should match that behavior elsewhere. I'm not certain we actually want to make that transition, however.
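To make the distinction concrete, here is a small base-R sketch of the two kinds of output under discussion. It uses `chisq.test()` directly on hypothetical counts; the row labels and numbers are invented, and nothing here reflects how `compareCols()` is actually implemented.

```r
# Hypothetical counts: rows are made-up respondent groups, columns are the
# two ideology categories being compared.
counts <- matrix(
    c(30, 10,
      20, 40,
      25, 25),
    nrow = 3, byrow = TRUE,
    dimnames = list(
        c("Group 1", "Group 2", "Group 3"),
        c("Very liberal", "Very Conservative")
    )
)

test <- chisq.test(counts, correct = FALSE)

# One number for the whole comparison: the test statistic and its p-value.
test$statistic
test$p.value

# One number per cell: standardized residuals, which behave like z-scores.
test$stdres

# Two-sided p-values derived from those per-cell z-scores.
cell_pvals <- 2 * pnorm(-abs(test$stdres))
cell_pvals
```

The test statistic answers "do these two columns differ at all?", while the standardized residuals point at the cells that drive the difference, which is why the two return values are not interchangeable.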
#'
#' @examples
#' \dontrun{
#' some_cube <- crunch_example_data(cat_by_cat)
#' some_cube <- crunch_example_data(cat_by_cat)
#' some_cube <- crtabs(~ educ + gender, ds)
#' Use the alternative Wishart method of forming the matrix of column- or row-wise
#' comparison Chi-squared test statistics for a categorical-by-categorical
#' contingency table.
#'
To add:
The null hypothesis is that all of the rows (or columns) are equal to each other. The test statistic matrix that is returned when requested is a measure of closeness between the pair of rows (or columns) given by their names. The p-value matrix that is returned similarly gives, for each pair of rows (or columns), the probability of finding the observed or more extreme results while the null hypothesis is true.
#'
#' Generate a matrix of pairwise comparisons of rows or columns, each against
#' the others.
#'
To add:
The null hypothesis is that, for each pair of rows (or columns), the two specific rows (or columns) in the pair are equal to each other. The test statistic matrix that is returned when requested is a measure of closeness between the pair of rows (or columns) given by their names. The p-value matrix that is returned similarly gives, for each pair of rows (or columns), the probability of finding the observed or more extreme results while the null hypothesis is true.
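The shape of the two returned matrices can be sketched in base R. This is only an illustration of the pairwise null hypothesis described above: it runs an ordinary `chisq.test()` on each two-column sub-table rather than the Wishart method, and `pairwise_col_chisq` and the counts are hypothetical names and data, not part of the package.

```r
# Hypothetical counts for a categorical-by-categorical table.
counts <- matrix(
    c(30, 10, 20,
      20, 40, 30,
      25, 25, 25),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c"))
)

# Hypothetical helper (not the package's implementation): test every pair of
# columns separately and collect the results in two square matrices.
pairwise_col_chisq <- function(tab) {
    cols <- colnames(tab)
    k <- length(cols)
    # A column compared with itself: statistic 0, p-value 1.
    stat <- matrix(0, k, k, dimnames = list(cols, cols))
    pval <- matrix(1, k, k, dimnames = list(cols, cols))
    for (i in seq_len(k - 1)) {
        for (j in (i + 1):k) {
            res <- chisq.test(tab[, c(i, j)], correct = FALSE)
            stat[i, j] <- stat[j, i] <- unname(res$statistic)
            pval[i, j] <- pval[j, i] <- res$p.value
        }
    }
    list(statistic = stat, p.value = pval)
}

result <- pairwise_col_chisq(counts)
result$statistic
result$p.value
```

Each off-diagonal entry is named by the pair of columns it compares, so `result$p.value["a", "b"]` reads as "the p-value for column a against column b".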
29b65c6 to a4adb4e
a4adb4e to ea7eb71
ea7eb71 to 10a6d89
No description provided.