Align FASTQ and FASTA reads to virus DNA and output a ranked list of the viruses detected in the dataset.
Installation of tools:
- bwa (https://github.com/lh3/bwa)
- samtools (https://github.com/samtools/samtools)
The scripts below assume that both tools are compiled and executable in the ./bin directory.
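Before running the pipeline, it can help to confirm that assumption. The following is a minimal sketch, not part of this repository: the function name `missing_tools` is hypothetical, and it only checks that `bwa` and `samtools` exist as executable files in the given directory.

```python
import os
from pathlib import Path


def missing_tools(bin_dir="./bin", tools=("bwa", "samtools")):
    """Return the names of tools that are absent or not executable in bin_dir."""
    missing = []
    for name in tools:
        path = Path(bin_dir) / name
        # A tool counts as present only if it is a regular file with execute permission.
        if not (path.is_file() and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing or not executable in ./bin:", ", ".join(missing))
    else:
        print("bwa and samtools found in ./bin")
```

If a tool is reported missing, compile it from the repositories listed above and copy or symlink the binary into ./bin.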
You can use any virus reference file; this example uses the one from https://rvdb.dbi.udel.edu.
mkdir -p virus
cd ./virus
wget https://rvdb.dbi.udel.edu/download/U-RVDBv17.0.fasta.gz
cd ..
./bin/bwa index ./virus/U-RVDBv17.0.fasta.gz
Put the source *.fastq.gz files in the ./data folder.
./bin/bwa mem ./virus/U-RVDBv17.0.fasta.gz ./data/reads-1.fastq.gz [./data/reads-2.fastq.gz] > ./data/reads-aligned.virus.sam
./bin/samtools view -S -b ./data/reads-aligned.virus.sam > ./data/reads-aligned.virus.bam
rm ./data/reads-aligned.virus.sam
./bin/samtools sort -o ./data/reads-aligned.virus.sorted.bam ./data/reads-aligned.virus.bam
rm ./data/reads-aligned.virus.bam
./bin/samtools index -b ./data/reads-aligned.virus.sorted.bam ./data/reads-aligned.virus.sorted.bam.bai
./bin/samtools idxstats ./data/reads-aligned.virus.sorted.bam > ./data/reads-aligned.virus.sorted.idxstats.csv
py ./bin/viralign-sort.py
Your sorted output file is at ./data/reads-aligned.virus.sorted.idxstats.viralign.csv
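To illustrate what ranking the idxstats output can look like, here is a minimal sketch. It is not the code of viralign-sort.py, whose scoring is not described here. It assumes only the standard `samtools idxstats` layout (tab-separated columns: reference name, sequence length, mapped reads, unmapped reads) and ranks references by mapped read count. The function name `rank_idxstats` and the "mapped reads per 1000 bases" column are illustrative choices.

```python
def rank_idxstats(path):
    """Read samtools idxstats output and return references ranked by mapped reads.

    Each returned tuple is (name, length, mapped, mapped_per_kb).
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip malformed lines and the "*" row that summarizes unplaced reads.
            if len(fields) < 4 or fields[0] == "*":
                continue
            name, length, mapped = fields[0], int(fields[1]), int(fields[2])
            if mapped == 0:
                continue  # references with no aligned reads are not reported
            per_kb = mapped * 1000 / length if length else 0.0
            rows.append((name, length, mapped, per_kb))
    # Highest mapped read count first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    ranked = rank_idxstats("./data/reads-aligned.virus.sorted.idxstats.csv")
    for name, length, mapped, per_kb in ranked[:10]:
        print(f"{mapped}\t{per_kb:.2f}\t{length}\t{name}")
```

Raw mapped counts favor long reference sequences, which is why the sketch also reports a length-normalized value. Either column could serve as the sort key, depending on what you want the ranking to emphasize.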
This script is developed by Onno Faber and comes without warranty of any kind. Use at your own risk. Initially developed for https://www.researchtothepeople.org/epithelioid-sarcoma. Thank you to all participants and organizers of these wonderful events. If you'd like to donate to Research to the People, visit https://www.researchtothepeople.org/donate
If you'd like to contribute, please reach out and/or set up a pull request.
I'm currently working on an environment to create a hosted version of this (and potentially other pipelines). That way nobody would have to install any environment themselves, and anyone could run open source projects like these without any technical knowledge. If you'd like to learn more please email me at [email protected].
Visit https://rarematter.org to see other projects in the healthcare and rare disease space I'm working on.