Bamstats is a command line tool written in Go for computing mapping statistics from a BAM file.
Use one of the following methods to install Bamstats.
The easiest way is to download a pre-compiled binary from GitHub releases. Here is an example that installs a released version on 64-bit Linux:
export VERSION=0.3.5 OS=linux ARCH=x86_64 BIN=/usr/local/bin
wget -O - https://github.com/guigolab/bamstats/releases/download/v${VERSION}/bamstats-v${VERSION}-${OS}-${ARCH}.tar.gz | tar xz -C ${BIN} bamstats
The following command will install the latest version from the master branch into $GOPATH:
go get github.com/guigolab/bamstats/cmd/bamstats
Bamstats can currently compute the following mapping statistics:
- general
- genome coverage
- RNA-seq
The general mapping statistics include:
- Total number of reads
- Number of unmapped reads
- Number of mapped reads grouped by number of multimaps (NH tag in BAM file)
- Number of mappings
- Ratio of mappings vs mapped reads
If the data is paired-end, a section for read pairs is also reported. In addition to the above metrics, this section contains a map from each insert size to the number of reads supporting it.
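The relationship between the general metrics can be sketched in a few lines of Go. This is only an illustration, not the code Bamstats uses: the GeneralStats type and its methods are hypothetical names, and the sketch assumes that a read with NH=k contributes exactly k mappings, which holds only when the aligner reports every alignment of a multimapped read.

```go
package main

import "fmt"

// GeneralStats is a hypothetical container for this sketch; it is not the
// type Bamstats uses internally.
type GeneralStats struct {
	Total    int         // total number of reads
	Unmapped int         // reads without any mapping
	Mapped   map[int]int // NH value -> number of reads with that many mappings
}

// MappedReads sums the reads over all NH groups.
func (s GeneralStats) MappedReads() int {
	n := 0
	for _, reads := range s.Mapped {
		n += reads
	}
	return n
}

// Mappings counts alignment records, assuming (for this sketch only) that
// every read with NH=k appears k times in the BAM file.
func (s GeneralStats) Mappings() int {
	n := 0
	for nh, reads := range s.Mapped {
		n += nh * reads
	}
	return n
}

// Ratio returns the number of mappings per mapped read, or 0 when no read
// is mapped.
func (s GeneralStats) Ratio() float64 {
	mapped := s.MappedReads()
	if mapped == 0 {
		return 0
	}
	return float64(s.Mappings()) / float64(mapped)
}

func main() {
	// Made-up counts: 80 uniquely mapped reads and 10 reads mapped twice.
	s := GeneralStats{Total: 100, Unmapped: 10, Mapped: map[int]int{1: 80, 2: 10}}
	fmt.Printf("mapped=%d mappings=%d ratio=%.2f\n", s.MappedReads(), s.Mappings(), s.Ratio())
	// prints: mapped=90 mappings=100 ratio=1.11
}
```

A ratio of 1 means every mapped read has a single mapping; higher values indicate more multimapping.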
The genome coverage statistics are computed for RNA-seq data and include counts for the following genomic regions:
- exon
- intron
- exonic_intronic
- intergenic
- others
The above metrics are computed for continuous and split mapped reads. An aggregated total is computed across elements and read types too.
The --uniq (or -u) command line flag additionally reports genome coverage statistics for uniquely mapped reads.
The RNA-seq statistics follow IHEC recommendations for RNA-seq data quality metrics. They include counts for the following regions:
- intergenic (different from coverage stats)
- ribosomal RNA (rRNA)
As well as fractional metrics for the following read types:
- mapped
- intergenic
- rRNA
- duplicates
Some examples of the program output can be found in the data folder of this GitHub repository:
- General Stats
- Genomic coverage stats
- Genomic coverage stats with uniquely mapped reads (note that the coverageUniq stats are reported as an additional JSON object)
- RNA-seq stats
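Because the report is JSON, a downstream Go program can check which sections are present without knowing their fields. The following is a minimal sketch under stated assumptions: the Sections and Names helpers are hypothetical, the sample report is made up and trimmed to empty objects, and only the coverageUniq key is taken from this README (the coverage key is assumed by analogy).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Sections splits a JSON report into its top-level objects while leaving
// the content of each object undecoded.
func Sections(report []byte) (map[string]json.RawMessage, error) {
	var out map[string]json.RawMessage
	if err := json.Unmarshal(report, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// Names returns the section names in sorted order.
func Names(sections map[string]json.RawMessage) []string {
	names := make([]string, 0, len(sections))
	for name := range sections {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Hypothetical report: real output carries many fields in each object.
	report := []byte(`{"coverage": {}, "coverageUniq": {}}`)

	sections, err := Sections(report)
	if err != nil {
		fmt.Println("cannot parse report:", err)
		return
	}
	fmt.Println("sections:", Names(sections))

	// The extra object is expected only when unique-read stats were requested.
	if _, ok := sections["coverageUniq"]; ok {
		fmt.Println("report includes stats for uniquely mapped reads")
	}
}
```

Decoding into json.RawMessage keeps the sketch independent of the exact output fields, which are documented separately.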
Please see here for a complete description of the output fields and how they are calculated.
This software is released under a BSD-style license. Please check the LICENSE file for more details.