Clarify infer strandedness from current subsampling + infer step #1095

Closed
ewallace opened this issue Oct 17, 2023 · 8 comments · Fixed by #1307

Comments

@ewallace

ewallace commented Oct 17, 2023

Description of feature

The current nf-core/rnaseq (3.12.0) has initial steps that infer strandedness by first subsampling the reads with fq and then running Salmon on the subsample. This is an optional step, and it has led to some confusion because it's not actually subsampling all the reads.

In an nf-core Slack discussion, @drpatelh suggested:

Maybe subsample + infer need to be part of the same station. I think we chose to do it this way because it would have meant introducing more lines and curves to the map which would make it even more confusing. Can you create an issue for this please.

The suggestion is to combine into one station / one module or workflow step. That would clean up the metro diagram and avoid the confusion.

This could be called

  • "Infer strandedness (fq, Salmon)" in the metro diagram
  • "Auto-infer strandedness by subsampling and pseudoalignment (fq, Salmon)" in the list of steps.
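
For readers unfamiliar with what "infer strandedness" means here, the following is a minimal Python sketch of the general idea: count how many subsampled reads are consistent with each strand orientation, then classify the library from the fractions. It is not the pipeline's or Salmon's actual code. The function name `infer_strandedness` and both threshold values are assumptions chosen for illustration only.

```python
def infer_strandedness(
    forward: int,
    reverse: int,
    stranded_threshold: float = 0.8,    # assumed cutoff, illustration only
    unstranded_threshold: float = 0.1,  # assumed cutoff, illustration only
) -> str:
    """Classify a library from counts of reads matching each orientation.

    `forward` and `reverse` are hypothetical inputs: the number of
    subsampled reads consistent with a forward-stranded or a
    reverse-stranded protocol, as a pseudoaligner might report them.
    """
    total = forward + reverse
    if total == 0:
        # No informative reads, so no call can be made.
        return "undetermined"

    forward_fraction = forward / total
    reverse_fraction = reverse / total

    # A clear majority in one orientation indicates a stranded library.
    if forward_fraction >= stranded_threshold:
        return "forward"
    if reverse_fraction >= stranded_threshold:
        return "reverse"

    # Roughly equal fractions indicate an unstranded library.
    if abs(forward_fraction - reverse_fraction) <= unstranded_threshold:
        return "unstranded"

    # Anything in between is ambiguous.
    return "undetermined"
```

Because the call depends only on fractions, a modest subsample is usually enough to make it, which is why subsampling and inference naturally belong to one step.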
@MatthiasZepper
Member

To complicate matters even further, the most recent release of BBTools (39.03) now also contains a new tool to infer strandedness, called checkstrand.sh.

I have not done any comprehensive evaluation, but it has a samplerate=1.0 parameter and can also stop early after a fixed number of reads (reads=-1). Since it is a one-stop shop written by a reputable author, I believe chances are that it is much faster than the current subworkflow?

@drpatelh drpatelh added this to the 3.15.0 milestone May 13, 2024
maxulysse added a commit to maxulysse/nf-core_rnaseq that referenced this issue May 29, 2024
@pinin4fjords pinin4fjords linked a pull request May 30, 2024 that will close this issue
11 tasks
@pinin4fjords
Member

@ewallace - does #1307 fix things for you?

@ewallace
Author

@pinin4fjords thanks, yes, that looks ideal! Very clear.

@ewallace
Author

The new subway map is labeled (Salmon, fq). I agree that Salmon is more important than fq, but fq (the subsampling) runs before Salmon, so you may wish to switch the order in which they are written on the subway map, depending on your goals.

@pinin4fjords
Member

ping @maxulysse !

@maxulysse
Member

done in #1307

@maxulysse
Member

yeah, I saw the comment and modified my PR in accordance, and then said I've done it

@pinin4fjords
Member

yeah, I saw the comment and modified my PR in accordance, and then said I've done it

You were too fast for me, I didn't think you'd already have addressed the comment. All good now :-)
