HPViewer is a tool for genotyping and quantifying HPV from metagenomic or human genomic shotgun sequencing data. It is designed to improve performance by masking nonspecific sequences in the reference genomes and directly identifying short HPV DNA reads. It contains two HPV databases built with different masking strategies, repeat-mask and homology-mask, and one homology distance matrix used to choose between the two databases.
If you use the HPViewer software, please cite our manuscript:
Yuhan Hao, Liying Yang, Antonio Galvao Neto, Milan R Amin, Dervla Kelly, Stuart M Brown, Ryan C Branski, Zhiheng Pei; HPViewer: sensitive and specific genotyping of human papillomavirus in metagenomic DNA, Bioinformatics, bty037, https://doi.org/10.1093/bioinformatics/bty037
$ git clone https://github.com/yuhanH/HPViewer.git
Python (2.7+)
Python packages (sys, getopt, subprocess), all part of the Python standard library
Bowtie2: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
SAMtools: http://www.htslib.org/
Bedtools: http://bedtools.readthedocs.io/en/latest/
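Before running HPViewer, it can help to confirm that the external tools listed above are reachable on your PATH. The following is a minimal sketch, not part of HPViewer itself. The executable names (bowtie2, samtools, bedtools) are assumptions based on how these tools are usually installed, and the script uses shutil.which, which requires Python 3.3 or later.

```python
import shutil

# Assumed executable names; adjust if your installation uses different ones.
REQUIRED_TOOLS = ("bowtie2", "samtools", "bedtools")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All external tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default is enough.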
a) input files (-U or -1 -2): FASTQ files (plain or gzipped, .fastq.gz), either unpaired (-U unpaired.fastq) or paired R1/R2 (-1 R1.fastq -2 R2.fastq)
b) output file name (-o)
a) database mask type (-m): hybrid-mask (default), repeat-mask, or homology-mask.
If you set -m, place it before the read input options (for example, -m repeat-mask -1 R1.fastq -2 R2.fastq). Repeat-mask is the more sensitive mode; homology-mask is recommended when some HPV types are present at high abundance, which may lead to false positives for other HPV types.
b) number of threads used in the bowtie2 alignment (-p)
c) minimum coverage threshold for calling an HPV type present (-c); the default is 150 bp (1.5 x the average length of your reads).
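The options above can be assembled into a command line programmatically. The sketch below is illustrative only: build_command and suggested_coverage are hypothetical helper names, not part of HPViewer. It places -m before the read inputs as described above. Placing -p and -c before the reads as well, and -o last, is an assumption made for consistency. The coverage helper simply applies the 1.5 x average read length rule of thumb.

```python
MASKS = ("hybrid-mask", "repeat-mask", "homology-mask")


def suggested_coverage(avg_read_length):
    """Coverage threshold in bp, using the 1.5 x average read length rule."""
    return int(round(1.5 * avg_read_length))


def build_command(output, unpaired=None, paired=None,
                  mask=None, threads=None, coverage=None):
    """Return the HPViewer command as an argument list.

    Exactly one of `unpaired` (a path) or `paired` (an (R1, R2) tuple)
    must be given.
    """
    if (unpaired is None) == (paired is None):
        raise ValueError("give either unpaired or paired reads, not both")
    cmd = ["python", "HPViewer.py"]
    if mask is not None:
        if mask not in MASKS:
            raise ValueError("unknown mask type: " + mask)
        cmd += ["-m", mask]  # must come before the read inputs
    if threads is not None:
        cmd += ["-p", str(threads)]  # position is an assumption
    if coverage is not None:
        cmd += ["-c", str(coverage)]  # position is an assumption
    if unpaired is not None:
        cmd += ["-U", unpaired]
    else:
        r1, r2 = paired
        cmd += ["-1", r1, "-2", r2]
    cmd += ["-o", output]
    return cmd
```

The returned list can be passed to subprocess.call, or joined with spaces to inspect the command before running it.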
a) output_HPV_summary.txt has three columns: the HPV types present, the number of reads per kilobase (RPK) for the matching HPV, and the number of reads for the matching HPV.
b) alignment results after bowtie2: output.sam, output.bam
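The three-column summary described above can be loaded for downstream analysis with a few lines of Python. This is a sketch under stated assumptions: the columns are taken to be whitespace- or tab-separated, HPV type names are assumed to contain no spaces, and any line whose second and third fields are not numeric (such as a header) is skipped. parse_summary is a hypothetical helper, and the file's exact layout should be checked against your own output.

```python
def parse_summary(lines):
    """Parse summary lines into (hpv_type, rpk, read_count) tuples.

    Assumes three whitespace-separated columns per record. Lines that
    do not match (blank lines, headers) are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        hpv_type, rpk, reads = fields
        try:
            records.append((hpv_type, float(rpk), int(float(reads))))
        except ValueError:
            continue  # non-numeric columns, most likely a header line
    return records
```

To use it on a real file, open the summary file and pass the file object directly, for example `parse_summary(open(path))`, where `path` points at the summary written to your output directory.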
python HPViewer.py -U test_unpaired.fastq -o TEST
more TEST/TEST_HPV_profile.txt