Generating error and qscore models

Badread comes with two error/qscore models: one that I built with Oxford Nanopore reads (MinION, R9.4 flowcell) and one that I built with PacBio reads (PacBio RS II, CLR). If you'd like to build your own model, keep reading!

Requirements:

Long reads (at least a Gbp would be good)
A high-quality reference FASTA (ideally an Illumina-polished assembly of the same genome the reads came from)
minimap2 (my favourite long read aligner)
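
To check whether a read set reaches the suggested Gbp of sequence, something like the sketch below can total the bases in a FASTQ file. The function name count_bases is made up for this example, and it assumes standard four-line FASTQ records with no wrapped sequence lines.

```shell
# Print the total number of bases in a FASTQ file (plain or gzipped).
# Assumes four-line FASTQ records: the sequence is line 2 of each record.
count_bases() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'NR % 4 == 2 { total += length($0) } END { print total + 0 }'
}
```

Running count_bases reads.fastq.gz should then print a number of about 1000000000 or more for a read set of the suggested size.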

First, align your long reads to your reference. Make sure to use minimap2's -c option so the output includes the CIGAR string. The command below uses the map-ont preset for Nanopore reads; for PacBio CLR reads, use map-pb instead:

minimap2 -c -x map-ont reference.fasta.gz reads.fastq.gz | gzip > alignments.paf.gz
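
Before building models, it can be worth confirming that the alignments really do carry CIGAR strings. With -c, minimap2 writes the CIGAR as a cg:Z: tag after the 12 mandatory PAF columns, so a rough check could look like this (count_missing_cigar is a hypothetical helper name, not part of Badread or minimap2):

```shell
# Count alignments in a gzipped PAF file that have no CIGAR (cg:Z:) tag.
# PAF tags start at column 13, after the 12 mandatory columns.
# A result of 0 means every alignment carries a CIGAR string.
count_missing_cigar() {
    gzip -cd "$1" | awk -F'\t' '
        {
            found = 0
            for (i = 13; i <= NF; i++) if ($i ~ /^cg:Z:/) found = 1
            if (!found) missing++
        }
        END { print missing + 0 }'
}
```

If count_missing_cigar alignments.paf.gz prints anything other than 0, the alignment was probably run without -c.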

Now build the models with Badread (this can take a long time, especially for large read sets):

badread error_model --reference reference.fasta.gz --reads reads.fastq.gz --alignment alignments.paf.gz > new_error_model
badread qscore_model --reference reference.fasta.gz --reads reads.fastq.gz --alignment alignments.paf.gz > new_qscore_model

If it's taking too long or running out of RAM, try limiting the number of alignments used with the --max_alignments option.
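
As a sketch of what that might look like, the function below runs both model-building commands with a cap on the number of alignments. It assumes that both subcommands accept --max_alignments followed by an integer count; the function name build_models_limited and the value in the usage note are only illustrative, so check badread error_model --help and badread qscore_model --help for the exact usage.

```shell
# Build both models using at most $1 alignments.
# Assumption: --max_alignments takes an integer count on both subcommands.
# File names match the commands shown earlier on this page.
build_models_limited() {
    max_alignments="$1"
    badread error_model --reference reference.fasta.gz --reads reads.fastq.gz \
        --alignment alignments.paf.gz --max_alignments "$max_alignments" > new_error_model
    badread qscore_model --reference reference.fasta.gz --reads reads.fastq.gz \
        --alignment alignments.paf.gz --max_alignments "$max_alignments" > new_qscore_model
}
```

For example, build_models_limited 100000 would cap each model at 100000 alignments. A suitable limit depends on your read set and available RAM.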
