This Nextflow pipeline is designed to process Oxford Nanopore raw signal data (POD5 files) through basecalling and optional demultiplexing steps. It supports both simplex and duplex basecalling modes using Dorado.
The pipeline consists of a single workflow that processes Nanopore POD5 files through several phases:
- A batching phase where POD5 files are grouped into batches to allow parallelized basecalling
- A basecalling phase using Dorado in either simplex or duplex mode
- An optional demultiplexing phase for barcoded samples
- A final conversion phase to generate FASTQ files from BAM output
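The batching phase can be pictured with a short Python sketch. It only illustrates the grouping idea and is not the pipeline's actual Nextflow code: the function name `batch_pod5_files` is hypothetical, the sketch assumes the POD5 files sit in a local directory, and the default of 10 mirrors the documented `--batch_size` default.

```python
from pathlib import Path
from typing import List


def batch_pod5_files(pod5_dir: str, batch_size: int = 10) -> List[List[Path]]:
    """Group the .pod5 files in pod5_dir into lists of at most batch_size files.

    Hypothetical helper for illustration; the real pipeline does this in Nextflow.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort so that batch membership is reproducible between runs.
    files = sorted(Path(pod5_dir).glob("*.pod5"))
    # Slice the sorted list into consecutive chunks; the last chunk may be smaller.
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
```

With 25 POD5 files and the default batch size, this yields three batches of 10, 10, and 5 files, each of which can then be basecalled as an independent job.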
The workflow produces the following key outputs:
- raw/: Directory containing the final FASTQ files
- summary/: Directory containing basecalling summary statistics
Setup:
- Install Nextflow (version 23.04.0 or later)
- Install Docker
- Set up AWS Batch
- Clone this repository
Basic usage:
nextflow run main.nf -profile <profile_name> --pod_5_dir <path_to_pod5_files> --kit <kit_name>
Key Parameters:
- --pod_5_dir: Directory containing POD5 files
- --kit: Sequencing kit used (e.g., "dna_r10.4.1_e8.2_400bps_sup")
- --batch_size: Number of POD5 files per batch (default: 10)
- --duplex: Enable duplex basecalling (default: false). To enable it, change the parameter in the config file.
- --demux: Enable demultiplexing (default: false). To enable it, change the parameter in the config file.
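When the pipeline is launched from a script, the command-line parameters above can be assembled programmatically. The following is a minimal sketch, assuming the pipeline is started from the repository root; the helper name `build_nextflow_command` is hypothetical, and only flags documented in this README are used.

```python
from typing import List, Optional


def build_nextflow_command(profile: str, pod5_dir: str, kit: str,
                           batch_size: Optional[int] = None) -> List[str]:
    """Assemble the argument list for the documented `nextflow run` invocation.

    Hypothetical helper for illustration; it does not execute anything.
    """
    cmd = ["nextflow", "run", "main.nf", "-profile", profile,
           "--pod_5_dir", pod5_dir, "--kit", kit]
    # Only pass --batch_size when overriding the documented default of 10.
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing the arguments as a list avoids shell quoting problems with paths that contain spaces.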
Common issues and their solutions:
- Insufficient Memory: Increase available memory or reduce the batch size (--batch_size)
- Missing POD5 Files: Verify input directory path and check if your AWS credentials are set up properly
- Docker Issues: Ensure Docker is running and has sufficient resources
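Some of these problems can be caught before launching a run. The sketch below is one possible pre-flight check, not part of the pipeline: the function name `preflight_problems` is hypothetical, and it assumes the POD5 files are in a local directory (it does not inspect S3 paths or AWS credentials).

```python
import shutil
from pathlib import Path
from typing import List


def preflight_problems(pod5_dir: str) -> List[str]:
    """Return human-readable problems found before launching the pipeline.

    Hypothetical helper for illustration; it checks local paths only.
    """
    problems = []
    directory = Path(pod5_dir)
    if not directory.is_dir():
        problems.append(f"input directory not found: {pod5_dir}")
    elif not any(directory.glob("*.pod5")):
        problems.append(f"no .pod5 files in: {pod5_dir}")
    # shutil.which only confirms the docker executable is on PATH,
    # not that the Docker daemon is running.
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems
```

An empty list means none of the checked problems were found; memory limits and a stopped Docker daemon still have to be verified separately.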