Update README according to https://github.com/nf-core/tools/issues/2186 #946
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -12,25 +12,10 @@ | |||||||||
|
||||||||||
## Introduction | ||||||||||
|
||||||||||
**nf-core/rnaseq** is a bioinformatics pipeline that can be used to analyse RNA sequencing data obtained from organisms with a reference genome and annotation. | ||||||||||
|
||||||||||
On release, automated continuous integration tests run the pipeline on a [full-sized dataset](https://github.com/nf-core/test-datasets/tree/rnaseq#full-test-dataset-origin) obtained from the ENCODE Project Consortium on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from running the full-sized tests individually for each `--aligner` option can be viewed on the [nf-core website](https://nf-co.re/rnaseq/results) e.g. the results for running the pipeline with `--aligner star_salmon` will be in a folder called `aligner_star_salmon` and so on. | ||||||||||
|
||||||||||
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community! | ||||||||||
|
||||||||||
## Online videos | ||||||||||
|
||||||||||
A short talk about the history, current status and functionality on offer in this pipeline was given by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) on [8th February 2022](https://nf-co.re/events/2022/bytesize-32-nf-core-rnaseq) as part of the nf-core/bytesize series. | ||||||||||
|
||||||||||
You can find numerous talks on the [nf-core events page](https://nf-co.re/events) on various topics including writing pipelines/modules in Nextflow DSL2, using nf-core tooling, running nf-core pipelines as well as more generic content like contributing to GitHub. Please check them out! | ||||||||||
|
||||||||||
## Pipeline summary | ||||||||||
**nf-core/rnaseq** is a bioinformatics pipeline that can be used to analyse RNA sequencing data obtained from organisms with a reference genome and annotation. It takes a samplesheet and FASTQ files as input, performs quality control (QC), trimming and (pseudo-)alignment, and produces a gene expression matrix and extensive QC report. | ||||||||||
|
||||||||||
![nf-core/rnaseq metro map](docs/images/nf-core-rnaseq_metro_map_grey.png) | ||||||||||
|
||||||||||
> **Note** | ||||||||||
> The SRA download functionality has been removed from the pipeline (`>=3.2`) and ported to an independent workflow called [nf-core/fetchngs](https://nf-co.re/fetchngs). You can provide `--nf_core_pipeline rnaseq` when running nf-core/fetchngs to download and auto-create a samplesheet containing publicly available samples that can be accepted directly as input by this pipeline. | ||||||||||
|
||||||||||
1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html)) | ||||||||||
Added a couple of sections because everything looks bunched up otherwise.
I'm not sure we need a section both above and below the tube map... the tube map would fit well under both "introduction" and "pipeline overview"
I agree with Gregor, going that route for the template update for the moment. Feel free to reject the PR, @drpatelh :D
||||||||||
2. Sub-sample FastQ files and auto-infer strandedness ([`fq`](https://github.com/stjude-rust-labs/fq), [`Salmon`](https://combine-lab.github.io/salmon/)) | ||||||||||
3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) | ||||||||||
|
@@ -56,44 +41,52 @@ You can find numerous talks on the [nf-core events page](https://nf-co.re/events | |||||||||
15. Pseudo-alignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/); _optional_) | ||||||||||
16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/)) | ||||||||||
|
||||||||||
> **Note** | ||||||||||
> The SRA download functionality has been removed from the pipeline (`>=3.2`) and ported to an independent workflow called [nf-core/fetchngs](https://nf-co.re/fetchngs). You can provide `--nf_core_pipeline rnaseq` when running nf-core/fetchngs to download and auto-create a samplesheet containing publicly available samples that can be accepted directly as input by this pipeline. | ||||||||||
|
||||||||||
> **Warning** | ||||||||||
> Quantification isn't performed if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2 derived genomic alignments. However, you can use this route if you have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2. | ||||||||||
|
||||||||||
## Quick Start | ||||||||||
## Usage | ||||||||||
|
||||||||||
1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=22.10.1`) | ||||||||||
> **Note** | ||||||||||
grst marked this conversation as resolved.
|
||||||||||
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/#TODO) on how to set up Nextflow. | ||||||||||
grst marked this conversation as resolved.
|
||||||||||
|
||||||||||
2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) (you can follow [this tutorial](https://singularity-tutorial.github.io/01-installation/)), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(you can use [`Conda`](https://conda.io/miniconda.html) both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_. Note: This pipeline does not currently support running with Conda on macOS if the `--remove_ribo_rna` parameter is used because the latest version of the SortMeRNA package is not available for this platform. | ||||||||||
First, you need to prepare a samplesheet with your input data that looks as follows: | ||||||||||
grst marked this conversation as resolved.
|
||||||||||
|
||||||||||
3. Download the pipeline and test it on a minimal dataset with a single command: | ||||||||||
**samplesheet.csv**: | ||||||||||
|
||||||||||
```bash | ||||||||||
nextflow run nf-core/rnaseq -profile test,YOURPROFILE --outdir <OUTDIR> | ||||||||||
``` | ||||||||||
```csv | ||||||||||
sample,fastq_1,fastq_2,strandedness | ||||||||||
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto | ||||||||||
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto | ||||||||||
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto | ||||||||||
``` | ||||||||||
|
||||||||||
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string. | ||||||||||
Each row represents a fastq file (single-end) or a pair of fastq files (paired-end). Rows with the same sample identifier are considered technical replicates and merged automatically. The strandedness refers to the library preparation and will be automatically inferred if set to `auto`. | ||||||||||
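To illustrate how rows sharing a sample identifier end up treated as one sample, here is a minimal Python sketch that parses the example samplesheet and groups its rows by the `sample` column. The helper name `group_runs_by_sample` is hypothetical, and treating an empty `fastq_2` field as a single-end run is an assumption; this is not the pipeline's own merging code, which concatenates the FastQ files themselves.

```python
import csv
import io
from collections import defaultdict

# Example rows copied from the samplesheet shown in this README.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto
"""


def group_runs_by_sample(text):
    """Hypothetical helper: map each sample ID to its list of (fastq_1, fastq_2) runs."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: an empty fastq_2 field means the run is single-end.
        fastq_2 = row["fastq_2"] or None
        groups[row["sample"]].append((row["fastq_1"], fastq_2))
    return dict(groups)


groups = group_runs_by_sample(SAMPLESHEET)
for sample, runs in groups.items():
    # Each sample with more than one run would be merged before analysis.
    print(sample, len(runs), "run(s)")
```

In this example all three rows fall under `CONTROL_REP1`, so they would be handled as a single sample sequenced across three lanes.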
|
||||||||||
> - The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`. | ||||||||||
> - Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. | ||||||||||
> - If you are using `singularity`, please use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs. | ||||||||||
> - If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs. | ||||||||||
Now, you can run the pipeline using: | ||||||||||
|
||||||||||
4. Start running your own analysis! | ||||||||||
```bash | ||||||||||
nextflow run nf-core/rnaseq \ | ||||||||||
--input samplesheet.csv \ | ||||||||||
--outdir <OUTDIR> \ | ||||||||||
--genome GRCh37 \ | ||||||||||
-profile <docker/singularity/.../institute> | ||||||||||
``` | ||||||||||
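If the command is launched from a script rather than typed by hand, it could be assembled as an argument list. The following is a sketch only: `build_rnaseq_command` is a hypothetical helper, and the output directory name `results` is an assumed value standing in for `<OUTDIR>`. The flags themselves are the ones shown in the command above.

```python
import shlex


def build_rnaseq_command(samplesheet, outdir, genome, profile):
    """Hypothetical helper: assemble the argument list for the run command above."""
    return [
        "nextflow", "run", "nf-core/rnaseq",
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,
        # Several config profiles can be chained as a comma-separated string.
        "-profile", profile,
    ]


# "results" is an assumed output directory; "docker" is one of the listed profiles.
cmd = build_rnaseq_command("samplesheet.csv", "results", "GRCh37", "docker")
print(shlex.join(cmd))
# To actually launch the pipeline, pass the list to subprocess.run(cmd, check=True).
```

Building a list instead of a single string avoids shell-quoting problems when paths contain spaces.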
|
||||||||||
```bash | ||||||||||
nextflow run nf-core/rnaseq --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> | ||||||||||
``` | ||||||||||
For more details, please refer to the [usage documentation](https://nf-co.re/rnaseq/3.10.1/usage) and the [parameter documentation](https://nf-co.re/rnaseq/3.10.1/parameters). | ||||||||||
grst marked this conversation as resolved.
|
||||||||||
|
||||||||||
- An executable Python script called [`fastq_dir_to_samplesheet.py`](https://github.com/nf-core/rnaseq/blob/master/bin/fastq_dir_to_samplesheet.py) has been provided if you would like to auto-create an input samplesheet based on a directory containing FastQ files **before** you run the pipeline (requires Python 3 installed locally) e.g. | ||||||||||
## Pipeline output | ||||||||||
|
||||||||||
```bash | ||||||||||
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py | ||||||||||
./fastq_dir_to_samplesheet.py <FASTQ_DIR> samplesheet.csv --strandedness reverse | ||||||||||
``` | ||||||||||
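For readers curious what auto-creating a samplesheet involves, the sketch below pairs FastQ file names and writes the four samplesheet columns. It is a simplified illustration and not the logic of `fastq_dir_to_samplesheet.py`: the helper names, the `_R1_`/`_R2_` naming pattern, and taking the sample name from the text before the first underscore are all assumptions.

```python
import csv
import io
import re

FIELDS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def fastq_names_to_rows(names, strandedness="auto"):
    """Hypothetical helper: pair *_R1_* / *_R2_* file names into samplesheet rows."""
    pairs = {}
    for name in sorted(names):
        match = re.search(r"_R([12])_", name)
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        # Mask the read number so that R1 and R2 of one run share a key.
        key = name[:match.start()] + "_R_" + name[match.end():]
        pairs.setdefault(key, {})[match.group(1)] = name
    rows = []
    for key in sorted(pairs):
        reads = pairs[key]
        if "1" not in reads:
            continue  # an R2 file without its R1 mate is ignored
        rows.append({
            "sample": key.split("_")[0],  # assumption: sample is the first token
            "fastq_1": reads["1"],
            "fastq_2": reads.get("2", ""),  # empty for single-end runs
            "strandedness": strandedness,
        })
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with the samplesheet header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# In practice the names would come from listing a FastQ directory.
names = [
    "AEG588A1_S1_L002_R1_001.fastq.gz",
    "AEG588A1_S1_L002_R2_001.fastq.gz",
    "AEG588A1_S1_L003_R1_001.fastq.gz",
    "notes.txt",
]
rows = fastq_names_to_rows(names, strandedness="reverse")
sheet = rows_to_csv(rows)
print(sheet)
```

The provided script remains the supported route; this sketch only shows the shape of the task.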
The output of the pipeline applied to a [full-sized example dataset](https://github.com/nf-core/test-datasets/tree/rnaseq#full-test-dataset-origin) can be found [here](https://nf-co.re/rnaseq/results). | ||||||||||
For more details, please refer to the [output documentation](https://nf-co.re/rnaseq/3.10.1/output). | ||||||||||
grst marked this conversation as resolved.
|
||||||||||
|
||||||||||
## Documentation | ||||||||||
## Online videos | ||||||||||
|
||||||||||
The nf-core/rnaseq pipeline comes with documentation about the pipeline [usage](https://nf-co.re/rnaseq/usage), [parameters](https://nf-co.re/rnaseq/parameters) and [output](https://nf-co.re/rnaseq/output). | ||||||||||
A short talk about the history, current status and functionality on offer in this pipeline was given by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) on [8th February 2022](https://nf-co.re/events/2022/bytesize-32-nf-core-rnaseq) as part of the nf-core/bytesize series. | ||||||||||
|
||||||||||
You can find numerous talks on the [nf-core events page](https://nf-co.re/events) on various topics including writing pipelines/modules in Nextflow DSL2, using nf-core tooling, running nf-core pipelines as well as more generic content like contributing to GitHub. Please check them out! | ||||||||||
|
||||||||||
## Credits | ||||||||||
|
||||||||||
|
I wouldn't call it a tube map; 'Workflow Diagram' or something similar would be better.