This pipeline is designed to process raw data from EM-seq experiments. It is based on the nf-core framework.
- Install
Also ensure that fastqc, multiqc, trim galore and bismark are installed on your system.
- Install
- Install
- Install trim galore
- Install
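A quick way to confirm that the tools are reachable before going further is a small shell check. This is a minimal sketch and not part of the pipeline: `missing_tools` is a hypothetical helper, and the executable names in the example call (`nextflow`, `fastqc`, `multiqc`, `trim_galore`, `bismark`) are the usual command names for these tools, assumed to be on your PATH.

```shell
# Hypothetical helper (not part of the pipeline): print every command
# from the argument list that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (prints nothing when everything is installed):
# missing_tools nextflow fastqc multiqc trim_galore bismark
```

If the call prints any names, install those tools (or activate the environment that provides them) before continuing.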
git clone [`EMseq pipeline`](
Before running the pipeline, ensure the data directory contains the following:
- genome_test directory, which contains the TMEB117_chr16.fasta reference genome
- high yield and low yield fastq files
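The checklist above can be verified with a short script. The sketch below is an illustration only: `check_inputs` is a hypothetical helper, and the layout it assumes (a `genome_test` folder directly inside the data directory, FASTQ files with common extensions anywhere beneath it) is a guess based on the list above rather than something the pipeline enforces.

```shell
# Hypothetical helper: report whether the expected inputs are present.
# $1 is the data directory. The layout checked here is an assumption.
check_inputs() {
    data_dir=$1
    genome="$data_dir/genome_test/TMEB117_chr16.fasta"
    # The reference genome must exist and be non-empty.
    if [ ! -s "$genome" ]; then
        echo "missing reference genome: $genome"
        return 1
    fi
    # Count FASTQ files (plain or gzipped) anywhere under the data directory.
    n_fastq=$(find "$data_dir" -type f \( -name '*.fastq' -o -name '*.fastq.gz' \
        -o -name '*.fq' -o -name '*.fq.gz' \) | wc -l | tr -d ' ')
    if [ "$n_fastq" -eq 0 ]; then
        echo "no FASTQ files found under $data_dir"
        return 1
    fi
    echo "ok: reference genome present, $n_fastq FASTQ file(s) found"
}

# Example call, with the data directory path as a placeholder:
# check_inputs ../data
```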
Once everything is in place, activate the conda environment that contains the fastqc, multiqc, trim galore and bismark dependencies.
conda activate EMseq # activate conda environment
Before running the EMseq pipeline, first run the QC pipeline to check the quality of the fastq files.
**Note: cd into the scripts folder.**
nextflow run
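For orientation, a QC step of this kind boils down to running FastQC on each fastq file and then MultiQC over the resulting reports. The sketch below only prints the commands it would run. `qc_commands` is a hypothetical helper, the directory arguments and the `*.fastq.gz` pattern are placeholders, and the actual pipeline script may invoke these tools differently.

```shell
# Hypothetical helper: print the FastQC and MultiQC commands for one sample group.
# $1 = directory with FASTQ files, $2 = output directory (both placeholders).
qc_commands() {
    in_dir=$1
    out_dir=$2
    # FastQC expects its output directory to exist already.
    echo "mkdir -p $out_dir/fastqc $out_dir/multiqc"
    # fastqc -o <dir> writes one report per input file into <dir>.
    echo "fastqc -o $out_dir/fastqc $in_dir/*.fastq.gz"
    # multiqc <dir> -o <dir> aggregates the FastQC reports into one summary.
    echo "multiqc $out_dir/fastqc -o $out_dir/multiqc"
}

# Example: review the printed commands first, then pipe them to sh to run them.
# qc_commands ../data/high_yield ../results/high_yield
```

Printing the commands instead of executing them keeps the sketch safe to try on any machine, including one where the tools are not yet installed.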
After running the QC pipeline, run the EMseq pipeline to align the reads to the reference genome and generate methylation calls.
nextflow run
**Note: Adjust the trim galore parameters according to the fastqc results, and then run the pipeline.**
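As an illustration of what adjusting the parameters can look like, the sketch below assembles a Trim Galore command from a few shell variables. The flags (`--paired`, `--quality`, `--length`, `--clip_R1`, `--clip_R2`) are standard Trim Galore options, but the values are placeholders rather than recommendations, paired-end input is an assumption, and `trim_cmd` is a hypothetical helper. How this pipeline actually passes parameters to Trim Galore is not shown here.

```shell
# Placeholder values: choose them from your own FastQC reports.
QUALITY=${QUALITY:-20}   # Phred cutoff for 3' quality trimming
MIN_LEN=${MIN_LEN:-20}   # discard reads shorter than this after trimming
CLIP_R1=${CLIP_R1:-0}    # bases to remove from the 5' end of read 1
CLIP_R2=${CLIP_R2:-0}    # bases to remove from the 5' end of read 2

# Hypothetical helper: print a paired-end trim_galore command.
# $1 = read 1 FASTQ, $2 = read 2 FASTQ, $3 = output directory.
trim_cmd() {
    cmd="trim_galore --paired --quality $QUALITY --length $MIN_LEN"
    # Clipping flags are only added when a non-zero clip is requested.
    if [ "$CLIP_R1" -gt 0 ]; then cmd="$cmd --clip_R1 $CLIP_R1"; fi
    if [ "$CLIP_R2" -gt 0 ]; then cmd="$cmd --clip_R2 $CLIP_R2"; fi
    echo "$cmd -o $3 $1 $2"
}

# Example: if FastQC shows biased base composition at the read starts,
# set CLIP_R1 and CLIP_R2 to the number of affected bases and re-run.
# trim_cmd sample_R1.fastq.gz sample_R2.fastq.gz trimmed
```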
The pipeline will generate the following directories in the output directory:
- High yield and low yield directories containing the individual fastqc reports of the fastq files.
- High yield and low yield directories containing the multiqc reports that summarize the fastqc reports.
- High yield and low yield directories containing the trimmed fastq files.
- High yield and low yield directories containing the bismark alignment reports.
- High yield and low yield directories containing the bismark methylation calls.
QC pipeline: This pipeline will generate fastqc reports of the fastq files and a multiqc report of the fastqc reports.
EMseq pipeline: This pipeline will align the reads to the reference genome and generate methylation calls.
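The alignment and methylation-calling stage can be pictured as three Bismark commands run in sequence: index the genome, align the reads, extract the calls. This is a sketch under assumptions, not the pipeline's actual code. `bismark_commands` is a hypothetical helper that only prints the commands, the arguments are placeholders, paired-end reads are assumed, and the `*_pe.bam` pattern relies on Bismark's default paired-end output naming.

```shell
# Hypothetical helper: print the Bismark commands for one paired-end sample.
# $1 = folder holding the reference FASTA, $2/$3 = trimmed read 1/read 2,
# $4 = output directory. All four are placeholders.
bismark_commands() {
    genome_dir=$1
    r1=$2
    r2=$3
    out_dir=$4
    # Step 1: build the bisulfite-converted genome index (needed once per genome).
    echo "bismark_genome_preparation $genome_dir"
    # Step 2: align the read pairs against the prepared genome.
    echo "bismark --genome $genome_dir -1 $r1 -2 $r2 -o $out_dir"
    # Step 3: extract methylation calls from the alignment.
    echo "bismark_methylation_extractor --paired-end --bedGraph --gzip -o $out_dir $out_dir/*_pe.bam"
}

# Example: print the commands for review before running them.
# bismark_commands ../data/genome_test trimmed/s_R1.fq.gz trimmed/s_R2.fq.gz results/bismark
```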