Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method for S4 DataFrame for left_join #80

Closed
lambdamoses opened this issue Aug 7, 2023 · 17 comments · Fixed by #98 or #99
Closed

Method for S4 DataFrame for left_join #80

lambdamoses opened this issue Aug 7, 2023 · 17 comments · Fixed by #98 or #99
Assignees

Comments

@lambdamoses
Copy link
Contributor

This way we can use colData(sce) in left join into another SCE object without having to convert to S3 tibble.

@stemangiola
Copy link
Owner

Do you have in mind (?)

single_cell_experiment |> *_join(DataFrame)

It sounds good!

@lambdamoses
Copy link
Contributor Author

Yes, that's what I mean. So we can do something like sce |> left_join(colData(sce2)).

@stemangiola
Copy link
Owner

Sure,

we would need all join flavours (full, right, left, anti), as tidySingleCellExperiment has API for all of them. But it should be pretty straightforward.

@lambdamoses
Copy link
Contributor Author

Do you want me to do a pull request? I think it should be pretty easy to implement, though I won't be able to do it soon because I'm busy with many other things recently.

@stemangiola
Copy link
Owner

Do you want me to do a pull request? I think it should be pretty easy to implement, though I won't be able to do it soon because I'm busy with many other things recently.

That would be great.

@lambdamoses
Copy link
Contributor Author

OK, super last minute for Bioc 3.18. Question: It seems that the code in those different flavors of *_join is all the same except for the dplyr function called. Do you want me to refactor using a function factory, so if you want to change the *_join functions, then you only need to change 1 place instead of 4 places?

@stemangiola
Copy link
Owner

stemangiola commented Oct 19, 2023

OK, super last minute for Bioc 3.18. Question: It seems that the code in those different flavors of *_join is all the same except for the dplyr function called. Do you want me to refactor using a function factory, so if you want to change the *_join functions, then you only need to change 1 place instead of 4 places?

Would not https://github.com/lambdamoses/tidySingleCellExperiment/blob/bcd3f5529948e9cff2f317bbf001558db9986bae/R/dplyr_methods.R#L329

take all arguments of join*?

@lambdamoses
Copy link
Contributor Author

lambdamoses commented Oct 19, 2023

Yes, it takes all arguments of *_join. The outer function controls the small parts that differ among the 4 join flavors, while the inner function implements the large part that is the same for all flavors of join. The outer function .join_factory returns the inner function, which takes all arguments of join.

Since you want S4 DataFrame for y for tidySE, do you want the refactored version?

@stemangiola
Copy link
Owner

the thing that confuses me is that left_join.SingleCellExperiment should take all the argument of dplyr left_join. but it does not in your refactored version

@lambdamoses
Copy link
Contributor Author

What do you mean? The original code says, dplyr::left_join(y, by=by, copy=copy, suffix=suffix, ...), and the arguments that show up here and ... are preserved in .join_factory's inner function. The other arguments are passed through ....

@stemangiola
Copy link
Owner

OK, I have never see this kind of declaration

left_join.SingleCellExperiment <- my_function()

but only

left_join.SingleCellExperiment <- my_function

how can you do

debug(...)

for join_functions now?

can you dsinply do

debug(left_join)

?

@lambdamoses
Copy link
Contributor Author

The RStudio debugger will still work. Keep on pressing "step in", the debugger will bring you to the inner function of the function factory.

@stemangiola
Copy link
Owner

Yes, it takes all arguments of *_join. The outer function controls the small parts that differ among the 4 join flavors, while the inner function implements the large part that is the same for all flavors of join. The outer function .join_factory returns the inner function, which takes all arguments of join.

Since you want S4 DataFrame for y for tidySE, do you want the refactored version?

OK feel free to push a PR with the refactoring, I am trusting you on this :) as I don't fully understand this high-level factoring of function makers.

@lambdamoses
Copy link
Contributor Author

I am trusting you on this :) as I don't fully understand this high-level factoring of function makers.

I ran all the unit tests and wrote new unit tests for allowing S4. This is a type of functional programming. Read more about it here: https://adv-r.hadley.nz/function-factories.html

@stemangiola stemangiola linked a pull request Oct 22, 2023 that will close this issue
@mikelove
Copy link

This is great. We should also support something for mcols() of GRanges as well.

@william-hutchison
Copy link
Collaborator

william-hutchison commented Jan 4, 2024

Hi @lambdamoses, thanks for your great contribution! Please add your details to this authorship list if you would like to be included in our upcoming publication:

https://docs.google.com/spreadsheets/d/19XqhN3xAMekCJ-esAolzoWT6fttruSEermjIsrOFcoo/edit?usp=sharing

@llrs
Copy link

llrs commented Apr 22, 2024

I came here thanks to @mikelove through Bioconductor's slack.

I want to filter my SCE according to scutter::addPerFeatureQC, I found it was not easy to do that:
sce |> scutter::addPerFeatureQC() |> rowData() |> as.data.frame() |> filter(...).
Would some other generics be of interest? I think in something like a package (tidyS4Vectors?) increasing the methods as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants