Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revise unit tests tidySingleCellExperiment #79

Merged
merged 7 commits into from
Aug 15, 2023

Conversation

HelenaLC
Copy link
Contributor

@HelenaLC HelenaLC commented Aug 5, 2023

  • revised tests (including plotting) bringing us to ~90% coverage
  • tests are a little more complex/data-driven now (e.g., no hard coded numbers, in case we change the example data)
  • also fixed a potential bug in rename() (renaming special columns, e.g., .cell, previously gave an error that the column doesn't exists, but should work as intended now)
  • also updated the example data (pbmc_small) via BiocGenerics::updateObject()) as it was outdated, causing some tests to fail (due to differences in internal metadata)

@stemangiola stemangiola self-requested a review August 6, 2023 07:22
Copy link
Owner

@stemangiola stemangiola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, sorry to be a pain. Since we are changing the test already, could I please ask to use base-R pipe?

I am trying to get rid of dplyr pipe wherever I can.

@@ -275,12 +274,13 @@ mutate.SingleCellExperiment <- function(.data, ...) {
rename.SingleCellExperiment <- function(.data, ...) {

# Check that we are not modifying a key column
cols <- tidyselect::eval_select(expr(c(...)), colData(.data) %>% as.data.frame())

df <- as_tibble(.data)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be, I believe more inefficient than getting colData directly, as it integrates also PC, UMAP etc..

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe could you help me understand the nature of this 3-line change? So I can have a bit more context.

Copy link
Contributor Author

@HelenaLC HelenaLC Aug 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so I came across this while writing unit tests. Basically, if you try with the current version something like rename(sce, new_name=.cell), it will pass without an informative warning, but then fails because .cell is not an actual column.

I think it works for other functions, but not here, because the syntax is rename(..., new=old) (unlike most other verbs, e.g., mutate(..., new=value), we want to block the OLD columns here, I think?), so cols from using eval_select() will correspond to the left-hand side names, not the ones being changed (right-hand side).

What made it clear to me was adding a print(cols) in line 279 and see what's happening vs. what happens with using eval_rename() instead (which returns indices)... Of course, would be great to find a way without using as_tibble(), but I couldn't come up with one :/ Though if we want to limit cost here, we could just do, e.g., as_tibble(.data[, 1]).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am not sure if maybe you head in mind doing both, i.e., not letting someone rename a column to .cell, but also not trying to rename .cell? I that case, we'd need a combination of both eval_select() and something else like the above, but more elegantly perhaps.

Copy link
Owner

@stemangiola stemangiola Aug 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

So .cell is an immutable column.

A solution can be

  • colData() |> as_tibble(rownames = EXISTING_INTERNAL_FUNCTION_THAT_ENCODES_CELL_COLNAME)
  • rename
  • drop .cell column so it's untouched
  • warning ("tidySingleCellExperiment says: ...") If .cell is in the query (there could be multiple columns in the query)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am not sure if maybe you head in mind doing both, i.e., not letting someone rename a column to .cell, but also not trying to rename .cell?

Pretty much this.

@stemangiola
Copy link
Owner

  • revised tests (including plotting) bringing us to ~90% coverage
  • tests are a little more complex/data-driven now (e.g., no hard coded numbers, in case we change the example data)
  • also fixed a potential bug in rename() (renaming special columns, e.g., .cell, previously gave an error that the column doesn't exists, but should work as intended now)
  • also updated the example data (pbmc_small) via BiocGenerics::updateObject()) as it was outdated, causing some tests to fail (due to differences in internal metadata)

Thanks @HelenaLC ! That's great. I left 3 comments with some little changes or explanations.

Love the PR.

@stemangiola stemangiola added bug Something isn't working enhancement New feature or request labels Aug 6, 2023
@HelenaLC
Copy link
Contributor Author

HelenaLC commented Aug 6, 2023

Also, sorry to be a pain. Since we are changing the test already, could I please ask to use base-R pipe?

I am trying to get rid of dplyr pipe wherever I can.

Yes, absolutely! I actually prefer not using any pipes internally, but didn't want to fiddle too much with the style. But agreed, I'll use base R pipes instead of %>%.

(Side note: perhaps we could have a brief chat re linting. I am mainly focusing on Bioc-guidelines plus some simplifications, and I am leaning towards more base R vs. tidy-style, which I think would make things cleaner (and briefer) in many (not all) cases, at least internally...)

@stemangiola
Copy link
Owner

stemangiola commented Aug 6, 2023

Also, sorry to be a pain. Since we are changing the test already, could I please ask to use base-R pipe?
I am trying to get rid of dplyr pipe wherever I can.

Yes, absolutely! I actually prefer not using any pipes internally, but didn't want to fiddle too much with the style. But agreed, I'll use base R pipes instead of %>%.

Base pipes after R evaluation they disappear. It is just a visual-coding aid. So makes no difference for R.

(Side note: perhaps we could have a brief chat re linting. I am mainly focusing on Bioc-guidelines plus some simplifications, and I am leaning towards more base R vs. tidy-style, which I think would make things cleaner (and briefer) in many (not all) cases, at least internally...)

Yes thta's a more complex topic. We could discuss about that. It is good to respect standards, but we are practically contributing to changing them. And it is becoming evident to many. When I think about standards, I try to imagine what they will be in 5-10 years. Things are moving fast :)

Personal opinion: I think pipes make the code more readable in virtually all cases.

@HelenaLC
Copy link
Contributor Author

HelenaLC commented Aug 6, 2023

Also, sorry to be a pain. Since we are changing the test already, could I please ask to use base-R pipe?
I am trying to get rid of dplyr pipe wherever I can.

Yes, absolutely! I actually prefer not using any pipes internally, but didn't want to fiddle too much with the style. But agreed, I'll use base R pipes instead of %>%.

Base pipes after R evaluation they disappear. It is just a visual-coding aid. So makes no difference for R.

(Side note: perhaps we could have a brief chat re linting. I am mainly focusing on Bioc-guidelines plus some simplifications, and I am leaning towards more base R vs. tidy-style, which I think would make things cleaner (and briefer) in many (not all) cases, at least internally...)

Yes thta's a more complex topic. We could discuss about that. It is good to respect standards, but we are practically contributing to changing them. And it is becoming evident to many. When I think about standards, I try to imagine what they will be in 5-10 years. Things are moving fast :)

Personal opinion: I think pipes make the code more readable in virtually all cases.

Absolutely. I did not mean to speak against pipes in general (illusional or not). Currently, I am trying my best to keep them for workflow-like operations, where they do make things easier to write and read. And only not use them when it really is a lot easier to follow (in my opinion), e.g., I struggled more to understand things like ... |> colnames() |> duplicated() |> which() |> length() |> gt(0) or ... |> duplicated() |> any() |> '!'() as opposed to !any(duplicated(colnames(...))), which literally reads "not any duplicated colnames". All that to say: Yes, I agree that they are easier to read in many cases, but perhaps not all (and also more cumbersome to debug at times...).

Independent of personal opinions / style-preferences, my case against using dplyr/tidyr inside packages (unless it does make things a LOT easier) is mainly because they are quite volatile at times. I have had to make quite frequent packages updates because of some syntax changes, deprecations etc. throwing warnings. So find it best to avoid them for simple things when possible without much magic.

Aaanyways, hoping to finish a first iteration first (towards a clean-ish BiocCheck) and we can see from there and change things that are not appreciated. Like I said, I am not against pipes (use them a lot!) and am keeping them in the vast majoirty of cases (like ~90%). Mainly trying to assure consistent use of <-/=, spacing, line-width, not using @ for accession, etc.

*One example re the above, when() has been deprecated in favor of base R if (...) ... else(...) ... :/

@stemangiola
Copy link
Owner

is mainly because they are quite volatile at times. I have had to make quite frequent packages updates because of some syntax changes, deprecations etc

true

@stemangiola stemangiola linked an issue Aug 9, 2023 that may be closed by this pull request
@HelenaLC
Copy link
Contributor Author

Just wondering before I finalize the next one, what's happening with this PR? Is there something you'd like to be changed explicitly (and perhaps kept for another PR) before merging?

@stemangiola
Copy link
Owner

I see.

So .cell is an immutable column.

A solution can be

  • colData() |> as_tibble(rownames = EXISTING_INTERNAL_FUNCTION_THAT_ENCODES_CELL_COLNAME)
  • rename
  • drop .cell column so it's untouched
  • warning ("tidySingleCellExperiment says: ...") If .cell is in the query (there could be multiple columns in the query)

Hello @HelenaLC, is the above still pending? I think this is the last part missing, to be modified fro your PR. All the rest look fantastic.

@HelenaLC
Copy link
Contributor Author

Alright. I implemented a fix that I think does what it should, i.e., special columns (.cell, PCs etc.) cannot be renamed NOR be used as target names; it is a little more complicated than what was there before, but does the trick.

@stemangiola
Copy link
Owner

Alright. I implemented a fix that I think does what it should, i.e., special columns (.cell, PCs etc.) cannot be renamed NOR be used as target names; it is a little more complicated than what was there before, but does the trick.

Great, thanks for that! We were not getting the expected error, so I slightly changed the logic.

Also, in the future, we can check expect_error specifying the expected error message.

Thanks for this PR!

@stemangiola stemangiola merged commit d05b3fd into stemangiola:master Aug 15, 2023
@stemangiola stemangiola changed the title revise unit tests revise unit tests tidySingleCellExperiment Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve unit testing of tidySingleCellExperiment
3 participants