This approach builds upon the mitoVGP pipeline (https://github.com/gf777/mitoVGP) to assemble, polish, circularize, qv check, and annotate Ag100Pest mt genomes frm CLR + Illumina data.
mitoVGP
https://github.com/gf777/mitoVGP
Formenti, G., Rhie, A., Balacco, J. et al. Complete vertebrate mitogenomes reveal widespread repeats and gene duplications. Genome Biol 22, 120 (2021). https://doi.org/10.1186/s13059-021-02336-9
conda env create -f mitoVGP_conda_env_pacbio.yml --prefix /project/ag100pest/software/modified_mitoVGP/pacbio_mitoVGP
mitofinder
https://github.com/RemiAllio/MitoFinder
Allio, R., Schomaker-Bastos, A., Romiguier, J., Prosdocimi, F., Nabholz, B., & Delsuc, F. (2020) Mol Ecol Resour. 20, 892-905. (publication link)
PB_list.txt
: List of long readsI_list.txt
: List of shotgun short reads<reference>.fasta
: Mitochnodrial genome of a closely related species. In this case, Sitotroga cerealella mitochondrion, complete genome<reference>.gb
: Mitochnodrial genbank of a closely related species. In this case, Sitotroga cerealella mitochondrion, complete genome
Mercury and Meryl
Export software directory to your path. Add this to .bashrc to
export PATH="/project/ag100pest/software/modified_mitoVGP/:$PATH"
Download reference.fasta and refererence.gb from a closely related species.
get_ref.sh <MT ACCESSION ID>
Prepare expected inputs I_list.txt for Illumina Polishing and PB_list.txt with subreads.bam. Eg ls /project/ag100pest/Illumina_polishing/JAMU*{R1,R2}.fastq.bz2 > I_list.txt
submit_
Reads are assembled if they align against the reference mitogenome, so best to have a complete assembly.
Find closely related mitogenomes at https://goat.genomehubs.org/
Pull fasta or gb with EDirect suite from NCBI
esearch -db nucleotide -query “<Accession #>” | efetch -format fasta >out.fasta
We use the mitoVGP approach to assemble, polish, and do preliminary trimming. Submit with sbatch submit_MTAsmPolishCLR.slurm
which contains the shell script run_mitoVGP.sh
. Currently this is hard-coded with species and parameters.
$mitoVGPdir/scripts/01_mtDNApipe
1. identifies MT-like reads by aligning long reads to reference sequence 2. assembles contigs with canu/1.8$mitoVGPdir/scripts/02_blastMT
identifies the best assembled mitocontig with blast$mitoVGPdir/scripts/03_mitoPolish
polishes mitocontig with two rounds of Arrow using the same reads used by Canu$mitoVGPdir/scripts/04_map10x1
polishes mitocontig with short reads by aligning to the mitocontig$mitoVGPdir/scripts/05_trimmer
generates a trimmed version of the Canu mitocontig after short read polishing (map10x1). The resulting contig has 100 bp overlapping ends on both sides.
- If you need to span a long repetitive region, increase the mummer sensitivity
-z 500
to-z 5000
(or more?) - If the overlapping sequences at both ends of the contig are perfectly identical, then show-coords doesn't print BEGIN and END. The grep fails and kicks out an error message, "trimming failed; trimming with MitoFinder". Then we use the circularization approach of mitofinder, which uses a blast self-alignment to identify overlap instead. 6. Another round of polishing to clean up trimmed ends. This is not implemented yet. Skip if qv doesn't find "false kmers".
MitoFinder. Annotations are found in <job_name>
_MitoFinder_mitfi_Final_Results
summary.sh prints results from mitofinder