Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

comply to Bioconductor review #8

Closed
19 of 22 tasks
stemangiola opened this issue Oct 5, 2020 · 14 comments · Fixed by #9 or #10
Closed
19 of 22 tasks

comply to Bioconductor review #8

stemangiola opened this issue Oct 5, 2020 · 14 comments · Fixed by #9 or #10

Comments

@stemangiola
Copy link
Owner

stemangiola commented Oct 5, 2020

DESCRIPTION

  • I think the package should be re-named for maximum clarity as TidySingleCellExperiment -- the appreviation SCE could mean anything. The capitalization follows Bioconductor convention.

We are well in-line as I don't like acronyms either. As note, sce is the common name for SingleCellExperiment object e.g. in this official source, is really commonly accepted by the community.

In order to respect both Bioconductor tradition and tidyverse standards (afterall this is a equal-rights marriage) the name tidySingleCellExperiment can be a good solution. (I understand the S4 point of view, but this package is dedicated to end-users, so the elegance in design and vocabulary has them as our priority)

Having said that, at the BioC-Asia2020 we will gather the opinion of the community with a pool at our workshop.

  • Title: field opening quote not closed; is it necessary

  • Description: Not sure what an 'invisible layer' is as a programming concept?

  • Imports: not all packages listed here are actually used directly? Remove packages that provide functionality that is not directly referenced.

TESTS

  • consider organising test files so that they have the same structure as the R/ files -- each tests/testthat/test-* is testing code defined in the corresponding R/* file. This organization makes it easier to navigate the code.

  • hmm, I find the coding style here very difficult to read, and wonder why it does not follow tidyverse guidelines? https://style.tidyverse.org/pipes.html#whitespace .

  • Likewise with tests, it would seem that nesting pipes within expect_() is antithetic to readability -- instead, create a temporary variable from a piped expresssion, and use the variable in expect_() statement?

I opted for full tydyverse style without the need to create temporary variables

  • test-dplyr.R:16 It is never appropriate to use direct slot access; use accessors instead.

R (comments are on specfic lines of code, but apply throughout)

  • data.R I was surprised that these data sets were not documented here, where they are defined? The documentation in man/pbmc_small.Rd, for instance, is inadequate -- provide a description of what the object represents, how it was derived, how it is used in the package.

  • dplyr_methods.R:58 Redefining the generic is not appropriate. Instead, define S3 methods that implement functionality for the specific classes you wish to provide methods for, and export the methods. Since you expect the user to use the dplyr generics, place dplyr in the Depends: field of the DESCRIPTION file.

I worked on it, and I agree. Respecting this standard is on out to-do list for tidySCE, tidySE and tidybulk. However I was not able to make things work under this standard this branch (either the export from roxigen disappears, or the method.class does not get exported). I ask to give us a release cycle to fix this. This although is a better approach does not affect negatively when using any package.

  • dplyr_methods.R:157 I find the mix of 'base' and tidy functionality confusing; why use cbind() rather than the tidy equivalent? In the next line,

I used cbind to cbind two SingleCellExperiment objects

  • dplyr_methods.R:189 Never use direct slot access @, instead use accessors colData(). It's strange to see as.data.frame() in this line, rather than as_tibble(), and likewise creating the second argument as a DataFrame()???

tts[[1]] is a SingleCellExperiment; colData(tts[[1]]) is a DataFrame that cannot be converted to tibble directly; DataFrame is used to update colData from SCE

  • dplyr_methods.R:224 This formatting

message("tidySCE says: A data frame is returned for
independent data analysis.")
results in quasi-arbitrary formatting for output, where the white space introduced for programmatic convenience is echoed literally to the user

tidySCE says: A data frame is returned for
independent data analysis.
Rely on message()'s internal use of paste0() or, if precise formatting is important, use strwrap()

message(
"tidySCE says: A data frame is returned for ",
"independent data analysis."
)
or (presummable in a helper function for re-use across your code)

txt <- paste(
"tidySCE says: A data frame is returned for ",
"independent data analysis."
)
message(paste(strwrap(txt, exdent = 4), collapse = "\n"))

  • dplyr_methods.R:622 long lines of piped code and nested conditional tests are unreadable. Reformat, e.g.,

tst <- intersect(
cols %>%
names(),
get_special_columns(.data) %>%
c(get_needed_columns())) %>%
length() %>%
gt(0)
)
if (tst) {

  • (why do you write c(get_needed_columns())) instead of get_needed_columns()?)

this is a concatenation "a" %>% c("b")

  • In the stop() message immediately after, reconsider use of stop(sprintf()) and instead rely on stop()'s internal use of paste0()

columns <-
get_special_columns(.data) %>%
c(get_needed_columns()) %>%
paste(collapse=", ")
stop(
"tidySCE says: you are trying to rename a column that is view only ",
columns, " ",
"(it is not present in the colData). If you want to mutate a view-only",
"column, make a copy and mutate that one.",
)

  • For such long messages adopting a strwrap() discipline as outlined above would seem to be highly desireable.

For the moment we have adopted the stop paste0 function suggested above. In the future this will be an area of improvement.

  • dplyr_methods.R:775 this code is repeated several times. Write a function instead to remove code duplication.

the code is similar but not identical

  • dplyr_methods.R:1146 it should not be necessary to preface functions with their package names; I don't think this helps readability of the code.

I understan the concern, however since I am overwriting some tidyverse function this helps some inconsistencies. We plan to improve the code clarity as the pakage matures

  • methods.R:3 it's strange to see a class definition at the top of a file called 'methods.R'; a more common organization would give the file the name of the class -- tidySCE.R. The Bioconductor / S4 convention (this is an S4 class, after all...) would name this class TidySCE, and perhaps further recognizing the imporance of being clear to the user and that the user will seldom have to type out the word in it's entirety (e.g., because of autocompletion in RStudio) it would be better to name the class explicitly TidySingleCellExperiment.

We agree, we wait to create this file once the package name has been settled (first point in the review list).

  • methods.R:106 highly nested calls like this are just too complicated for the user to parse. Adopt a more procedural programming style with less nesting.

MAN

  • not sure what the figures/ directory is doing here?

It's used for the figures in the README, as here: https://r-pkgs.org/whole-game.html

@stemangiola
Copy link
Owner Author

stemangiola commented Oct 7, 2020

Hello @mblue9 I have complied (or replied) to most of the Martin's comments both for SCE, SE, and Seurat packages.

If you feel and have time to take case of the

  • data.R or
  • MAN

issue for those three packaes would be great. No pressure though, if you are busy no problem at all.

@mblue9
Copy link
Collaborator

mblue9 commented Oct 7, 2020

Ok I can take a look tonight or tomorrow if that's ok.

For man that figures directory is needed for the github README I think?

For data - provide a description of 1) what the object represents,2) how it was derived, 3 )how it is used in the package
I could do 1) and 3) but I don't know how it was derived do you have info on that?

@stemangiola
Copy link
Owner Author

stemangiola commented Oct 8, 2020

For man that figures directory is needed for the github README I think?

yes is there any more preferrable location that the community suggests?

but I don't know how it was derived do you have info on that?

I will give it to you today. I have to remember :) I think is derived from seurat or signlecellexperiment tutorials

found https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html

@stemangiola
Copy link
Owner Author

Hello @mblue9 just really a question, I don't want to put pressure as I know you are busy. Do you have a time line for those three points we discussed.

If you are too busy I will sort them out no problem.

I was thinking about resubmitting this weekend.

@mblue9
Copy link
Collaborator

mblue9 commented Oct 9, 2020

Sorry I’ve been tied up with other work I can do it tomorrow if that’s not too late?

@stemangiola
Copy link
Owner Author

No prob. Tomorrow is great. I will focus on the other missing points then.

Thanks!

@mblue9
Copy link
Collaborator

mblue9 commented Oct 10, 2020

Working on this now

For the man/figures

Storing images in man/figures is mentioned in CRAN here (pasted below): https://cran.r-project.org/submit.html

2.7 Figures
To include figures in help pages, use the \figure markup. There are three forms.

The two commonly used simple forms are \figure{filename} and \figure{filename}{alternate text}. This will include a copy of the figure in either HTML or LaTeX output. In text output, the alternate text will be displayed instead. (When the second argument is omitted, the filename will be used.) Both the filename and the alternate text will be parsed verbatim, and should not include special characters that are significant in HTML or LaTeX.

The expert form is \figure{filename}{options: string}. (The word ‘options:’ must be typed exactly as shown and followed by at least one space.) In this form, the string is copied into the HTML img tag as attributes following the src attribute, or into the second argument of the \Figure macro in LaTeX, which by default is used as options to an \includegraphics call. As it is unlikely that any single string would suffice for both display modes, the expert form would normally be wrapped in conditionals. It is up to the author to make sure that legal HTML/LaTeX is used. For example, to include a logo in both HTML (using the simple form) and LaTeX (using the expert form), the following could be used:

\if{html}{\figure{Rlogo.svg}{options: width=100 alt="R logo"}}
\if{latex}{\figure{Rlogo.pdf}{options: width=0.5in}}
The files containing the figures should be stored in the directory man/figures. Files with extensions .jpg, .jpeg, .pdf, .png and .svg from that directory will be copied to the help/figures directory at install time. (Figures in PDF format will not display in most HTML browsers, but might be the best choice in reference manuals.) Specify the filename relative to man/figures in the \figure directive.

It's used for figures in the README by the R community here: https://r-pkgs.org/whole-game.html

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

and mentioned by Hadley here: r-lib/pkgdown#280 (comment)

Is one of these refs good enough?

This was linked to pull requests Oct 10, 2020
@stemangiola
Copy link
Owner Author

Is one of these refs good enough?

That's amazing. Feel free to add this text below Martin's comment and you can tick the box as done ;)

@mblue9
Copy link
Collaborator

mblue9 commented Oct 10, 2020

Ok I will write something under that comment. I will also test the removing the :: now.

By the way the way for your response above

I opted for full tydyverse style without the need to create temporary variables

Are you sure it's following tidyverse style? as there are recommendations for when it's better to not use the pipe e.g. for readability/communication:

https://r4ds.had.co.nz/pipes.html#when-not-to-use-the-pipe

Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names. That will make debugging easier, because you can more easily check the intermediate results, and it makes it easier to understand your code, because the variable names can help communicate intent.

@stemangiola
Copy link
Owner Author

Your pipes are longer than (say) ten steps.

Wow, I would have said 1000 at least

@stemangiola
Copy link
Owner Author

By the way the way for your response above

For this question Martin's is referring to the tests. There the number of lines is quite small.

@stemangiola
Copy link
Owner Author

@mblue9 please let me know when you feel you are ready on your side. Today I will share this issue with Martin's for tidySCE and tidySE.

@mblue9
Copy link
Collaborator

mblue9 commented Oct 11, 2020

I added a comment for man/figures above. pasilla data.R in tidySE needs info I could do that later as have to head out now. Or feel free to add it if you don't want to wait.

@stemangiola
Copy link
Owner Author

I added a comment for man/figures above. pasilla data.R in tidySE needs info I could do that later as have to head out now. Or feel free to add it if you don't want to wait.

I have added it, and replied to the Bioconductor reviewers. Thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants