group_split #97
Issue #35975561
Does the function work with an arbitrary set of variables?
tests/testthat/test-dplyr_methods.R
Outdated
expect_equal(length(fd), length(unique(df$groups)))

fd <- df |>
  group_split("vst.variable")
One question: what is vst.variable? I cannot find it in the object metadata.
It's in the rowData. I realise now that code wasn't doing what I thought it did and that it wouldn't have been helpful if it had!
It does now!
I seem to be doing something that the SCE version of unite doesn't like. Maybe I could avoid adding a column in the first place.
I refactored the code in an attempt to fix the error. The function works when I run it myself and during unit tests, but it fails when I rebuild and run R CMD check. It throws an error saying
Does dplyr allow arbitrary tidy select functionality here, e.g. contains, starts_with, etc.? If so, you might want to create a split data frame using select(...).
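For context, a minimal sketch of how tidy select helpers can drive a split in plain dplyr. The toy tibble and its column names (g1, g2, value) are made up for illustration and are not taken from this PR; wrapping the helper in across() is one way to pass a selection to group_split, not necessarily the approach intended for the SingleCellExperiment method.

```r
library(dplyr)

# Hypothetical toy data: two grouping-like columns and one value column.
df <- tibble(
  g1 = c("a", "a", "b"),
  g2 = c("x", "y", "y"),
  value = 1:3
)

# across() lets a tidy select helper choose the grouping columns.
parts <- df |> group_split(across(starts_with("g")))

# One list element per distinct (g1, g2) combination.
length(parts)
```

The same selection could also be used with select(...) afterwards to drop or keep the grouping columns in each piece.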
R/dplyr_methods.R
Outdated
filter(group_col == group_list[[i]], )

v[[i]] <- select(v[[i]], !group_col)
v[[i]] <- .data[,group_list == groups[[i]]]
Does group_col get eliminated after the splitting?
I have simplified this to make it more obvious, but the group_col object never gets added to the table itself. I have added the .keep option to drop the original columns, however.
Hello @B0ydT, any news about this PR? Let me know if you need help/more explanation.
ping
Thanks for your feedback. It was very clear and I addressed most of it. I'm not 100% sure where you want to use select, though. I'm still getting an R CMD Check error for the call to Edit: My example was
Amazing, thanks.
The function works well for simple cases. But please note that with dplyr you can do this:
tibble(a=1:10) |> group_split(a>5)
tibble(a=1:10) |> group_split(a==5)
With SingleCellExperiment, I would like to be able to do
pbmc_small |> group_split(PC_1>0)
or
pbmc_small |> group_split(groups=="g1")
This is easy to achieve while preserving your variable query as tidy select. Also, look for "special_column" in the package, and you will see how I adapt all queries to all columns displayed in the tibble representation. Ideally, each function is completely general. For example:
pbmc_small |> group_split(PC_1>0 & groups=="g1")
We are close!
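A minimal sketch of the plain dplyr behaviour referred to above, for comparison when writing the unit tests. It uses only the tibble from the example; the variable name parts is introduced here for illustration.

```r
library(dplyr)

# Split a plain tibble by a logical expression rather than a named column.
parts <- tibble(a = 1:10) |> group_split(a > 5)

# One list element per distinct value of the expression (FALSE, then TRUE).
length(parts)

# By default the computed grouping column is kept alongside the original one.
names(parts[[1]])
```

This is the "PC_1>0"-style logical column that the SingleCellExperiment method would need to reproduce to match dplyr.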
I'm not sure it's exactly the method you had in mind, but I think I've fixed it. I totally forgot those logical statements in I saw a lot of your other methods make use of the original dplyr functions. I had given up on It does not add those "PC_1>0" type columns with logical values yet, but I should be able to add those shortly.
Amazing. Please add my tests above as unit tests, and then I think we might be done!
Have added the tests. I am trying to sidestep all of the name corrections so that the names of new columns are consistent with what you'd get from the The closest I've gotten is
I can use |
I don't see the problem. This looks good to me:
pbmc_small |> group_split(PC_1>0 & groups == "g2") %>% .[[1]] |> select(groups)
tidySingleCellExperiment says: Key columns are missing. A data frame is returned for independent data analysis.
# A tibble: 75 × 1
   groups
   <chr>
 1 g2
 2 g1
 3 g2
 4 g2
 5 g2
 6 g1
 7 g1
 8 g1
 9 g1
10 g1
If it behaves well enough for the vast majority of use cases, I would say let's go with this, and we can improve it in the future. It would be good to translate this to
Congrats @B0ydT! Let me know what you think about repurposing your PR.
I've got a group_split implementation working for row and column data. I can work on group_by if you're happy with it. If not, let me know where it needs work.

I was a bit stuck overthinking some decisions but just decided to go for it, so I'm very happy to change things up. For example, I wasn't sure if it would be preferable to specify grouping by row/column or to autodetect it, as I ended up doing.
Issue #71