Bamstats

Bamstats is a command line tool written in Go for computing mapping statistics from a BAM file.

Installation instructions

Use one of the following methods to install Bamstats.

Install a released version

The easiest way is to download a pre-compiled binary from GitHub releases. Here is an example for installing version 0.3.5 on 64-bit Linux (writing to /usr/local/bin may require root privileges):

export VERSION=0.3.5 OS=linux ARCH=x86_64 BIN=/usr/local/bin
wget -O - https://github.com/guigolab/bamstats/releases/download/v${VERSION}/bamstats-v${VERSION}-${OS}-${ARCH}.tar.gz | tar xz -C ${BIN} bamstats

Install the latest version with go

The following command installs the latest version from the master branch into $GOPATH/bin (or $GOBIN, if set):

go install github.com/guigolab/bamstats/cmd/bamstats@master

Provided statistics

Bamstats can currently compute the following mapping statistics:

general
genome coverage
RNA-seq

General

The general mapping statistics include:

Total number of reads
Number of unmapped reads
Number of mapped reads grouped by number of multimaps (NH tag in the BAM file)
Number of mappings
Ratio of mappings to mapped reads
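As a rough illustration of how these counts relate, the Go sketch below derives them from a hypothetical, already-decoded list of alignment records. The `Record` type and its fields are assumptions; bamstats itself reads real BAM records. The sketch counts each read once through its unmapped or primary record, groups mapped reads by NH value, and treats every alignment of a mapped read as one mapping, assuming the input has no supplementary records.

```go
package main

import "fmt"

// Record is a hypothetical, simplified view of one BAM alignment record.
// A real tool would decode these fields from a BAM file with an HTS library.
type Record struct {
	Unmapped  bool // FLAG bit 0x4
	Secondary bool // FLAG bit 0x100: not the primary alignment of the read
	NH        int  // NH tag: number of reported alignments for the read
}

// GeneralStats mirrors the general statistics listed above.
type GeneralStats struct {
	TotalReads    int
	UnmappedReads int
	MappedByNH    map[int]int // NH value -> number of mapped reads
	Mappings      int         // number of alignment records of mapped reads
}

// ComputeGeneral counts each read once (via its unmapped or primary record)
// and each alignment of a mapped read as one mapping. It assumes the input
// contains no supplementary (FLAG 0x800) records.
func ComputeGeneral(records []Record) GeneralStats {
	g := GeneralStats{MappedByNH: map[int]int{}}
	for _, r := range records {
		if r.Unmapped {
			g.TotalReads++
			g.UnmappedReads++
			continue
		}
		g.Mappings++
		if !r.Secondary {
			g.TotalReads++
			g.MappedByNH[r.NH]++
		}
	}
	return g
}

// MappedReads sums the per-NH counts.
func (g GeneralStats) MappedReads() int {
	n := 0
	for _, c := range g.MappedByNH {
		n += c
	}
	return n
}

// MappingRatio is mappings divided by mapped reads (0 when nothing mapped).
func (g GeneralStats) MappingRatio() float64 {
	if m := g.MappedReads(); m > 0 {
		return float64(g.Mappings) / float64(m)
	}
	return 0
}

func main() {
	g := ComputeGeneral([]Record{
		{Unmapped: true},
		{NH: 1},                  // uniquely mapped read
		{NH: 2},                  // primary alignment of a multimapped read
		{NH: 2, Secondary: true}, // its secondary alignment
	})
	fmt.Println(g.TotalReads, g.UnmappedReads, g.MappedByNH, g.Mappings, g.MappingRatio())
}
```

Running it prints `3 1 map[1:1 2:1] 3 1.5`: three reads, one of them unmapped, two mapped reads (one per NH group), three mappings, and a ratio of 1.5 mappings per mapped read.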

For paired-end data, an additional section for read pairs is reported. Besides the general metrics, this section contains a map from insert size to the number of reads supporting it.
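A minimal sketch of such an insert-size map, assuming (hypothetically) that the insert size is the absolute SAM TLEN value of primary, properly paired records, so that each mate adds one supporting read. The `PairRecord` type and this definition are illustrative and not taken from bamstats.

```go
package main

import "fmt"

// PairRecord is a hypothetical, simplified paired-end alignment record.
type PairRecord struct {
	ProperPair bool // FLAG bit 0x2: both mates mapped as expected
	Secondary  bool // FLAG bit 0x100
	TLEN       int  // SAM field 9: signed observed template length
}

// InsertSizes maps each insert size to the number of reads supporting it.
// Assumption: insert size = |TLEN| of primary, properly paired records,
// so both mates of a pair contribute one read each.
func InsertSizes(records []PairRecord) map[int]int {
	sizes := map[int]int{}
	for _, r := range records {
		if !r.ProperPair || r.Secondary || r.TLEN == 0 {
			continue
		}
		size := r.TLEN
		if size < 0 {
			size = -size // leftmost mate has positive TLEN, rightmost negative
		}
		sizes[size]++
	}
	return sizes
}

func main() {
	fmt.Println(InsertSizes([]PairRecord{
		{ProperPair: true, TLEN: 250},
		{ProperPair: true, TLEN: -250},
		{ProperPair: true, TLEN: 300},
		{TLEN: 400}, // not a proper pair: skipped
	}))
}
```

This prints `map[250:2 300:1]`: both mates of the first pair support size 250.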

Genome coverage

The genome coverage statistics are computed for RNA-seq data and include counts for the following genomic regions:

exon
intron
exonic_intronic
intergenic
others
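To make the categories concrete, the sketch below assigns a read, given as the reference blocks it aligns to, to one of these labels. The classification rules, including the treatment of "others" as a catch-all (for example, a read extending from an exon into unannotated sequence), are hypothetical assumptions rather than the documented bamstats logic. Intervals are half-open and on a single chromosome.

```go
package main

import "fmt"

// Interval is a half-open [Start, End) range on a single chromosome.
type Interval struct{ Start, End int }

func overlaps(a, b Interval) bool { return a.Start < b.End && b.Start < a.End }

// overlapsAny reports whether iv overlaps at least one interval in set.
func overlapsAny(iv Interval, set []Interval) bool {
	for _, s := range set {
		if overlaps(iv, s) {
			return true
		}
	}
	return false
}

// withinAny reports whether iv lies entirely inside one interval of set.
func withinAny(iv Interval, set []Interval) bool {
	for _, s := range set {
		if iv.Start >= s.Start && iv.End <= s.End {
			return true
		}
	}
	return false
}

// Classify assigns a read (its aligned reference blocks) to a region label.
// These rules are a hypothetical illustration, not the bamstats rules.
func Classify(blocks, exons, introns []Interval) string {
	hitsExon, hitsIntron := false, false
	allExon, allIntron := true, true
	for _, b := range blocks {
		hitsExon = hitsExon || overlapsAny(b, exons)
		hitsIntron = hitsIntron || overlapsAny(b, introns)
		allExon = allExon && withinAny(b, exons)
		allIntron = allIntron && withinAny(b, introns)
	}
	switch {
	case !hitsExon && !hitsIntron:
		return "intergenic"
	case allExon:
		return "exon"
	case allIntron:
		return "intron"
	case hitsExon && hitsIntron:
		return "exonic_intronic"
	default:
		return "others" // e.g. partly exonic, partly unannotated
	}
}

func main() {
	exons := []Interval{{100, 200}, {300, 400}}
	introns := []Interval{{200, 300}}
	for _, read := range [][]Interval{
		{{120, 180}},             // inside an exon
		{{150, 200}, {300, 350}}, // split read skipping the intron
		{{220, 280}},             // inside the intron
		{{180, 230}},             // crosses an exon/intron boundary
		{{500, 600}},             // outside any annotation
		{{380, 450}},             // runs past the last exon
	} {
		fmt.Println(Classify(read, exons, introns))
	}
}
```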

The genome coverage counts are computed separately for continuous and split mapped reads. Aggregated totals are also computed across elements and read types.
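Whether a read is continuous or split can be read off its CIGAR string: in the SAM specification, the `N` operation marks a skipped reference region, which RNA-seq aligners use for introns. The sketch below, assuming a 0-based alignment start, converts a CIGAR string into reference blocks, flags split reads, and keeps per-read-type counts with aggregated `total` entries. The `Counts` layout is a hypothetical illustration, not the bamstats JSON format.

```go
package main

import (
	"fmt"
	"strings"
)

// Interval is a half-open [Start, End) reference range.
type Interval struct{ Start, End int }

// RefBlocks turns a 0-based alignment start and a CIGAR string into the
// reference intervals covered by the read. M, =, X and D extend the current
// block; N (skipped region, e.g. an intron) closes it and starts a new one.
func RefBlocks(pos int, cigar string) ([]Interval, error) {
	var blocks []Interval
	start, cur, n := pos, pos, 0
	for _, c := range cigar {
		if c >= '0' && c <= '9' {
			n = n*10 + int(c-'0')
			continue
		}
		switch c {
		case 'M', '=', 'X', 'D':
			cur += n
		case 'N':
			if cur > start {
				blocks = append(blocks, Interval{start, cur})
			}
			cur += n
			start = cur
		case 'I', 'S', 'H', 'P':
			// these operations consume no reference bases
		default:
			return nil, fmt.Errorf("invalid CIGAR operation %q", c)
		}
		n = 0
	}
	if cur > start {
		blocks = append(blocks, Interval{start, cur})
	}
	return blocks, nil
}

// IsSplit reports whether a read is split (spliced) rather than continuous.
func IsSplit(cigar string) bool { return strings.ContainsRune(cigar, 'N') }

// Counts is a hypothetical layout: read type -> region -> count, where the
// "total" entries aggregate across regions and across read types.
type Counts map[string]map[string]int

// Add records one read of the given type in the given region.
func (c Counts) Add(readType, region string) {
	for _, t := range []string{readType, "total"} {
		if c[t] == nil {
			c[t] = map[string]int{}
		}
		c[t][region]++
		c[t]["total"]++
	}
}

func main() {
	c := Counts{}
	for _, cigar := range []string{"100M", "50M1000N50M", "5S95M"} {
		blocks, err := RefBlocks(0, cigar)
		if err != nil {
			panic(err)
		}
		readType := "continuous"
		if IsSplit(cigar) {
			readType = "split"
		}
		fmt.Println(cigar, readType, blocks)
		c.Add(readType, "exon") // region fixed here for illustration only
	}
	fmt.Println(c["total"])
}
```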

The --uniq (or -u) command-line flag additionally reports genome coverage statistics for uniquely mapped reads.
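The effect of the flag can be sketched as a second, filtered tally. Here "uniquely mapped" is assumed to mean an NH tag value of 1, and `Alignment` and `CoverageCounts` are hypothetical names; the sketch mirrors what --uniq reports, not how bamstats implements it.

```go
package main

import "fmt"

// Alignment is a hypothetical record: the region a read was assigned to
// (one of the coverage categories) and its NH tag value.
type Alignment struct {
	Region string
	NH     int
}

// CoverageCounts tallies reads per region. With uniqOnly set, it keeps only
// reads with NH == 1, assumed here to mean "uniquely mapped".
func CoverageCounts(alns []Alignment, uniqOnly bool) map[string]int {
	counts := map[string]int{}
	for _, a := range alns {
		if uniqOnly && a.NH != 1 {
			continue
		}
		counts[a.Region]++
	}
	return counts
}

func main() {
	alns := []Alignment{{"exon", 1}, {"exon", 2}, {"intron", 1}, {"exon", 1}}
	fmt.Println("coverage:", CoverageCounts(alns, false))
	fmt.Println("coverageUniq:", CoverageCounts(alns, true))
}
```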

RNA-seq

The RNA-seq statistics follow IHEC recommendations for RNA-seq data quality metrics. They include counts for the following regions:

intergenic (defined differently from the intergenic category of the genome coverage stats)
ribosomal RNA (rRNA)

They also include fractional metrics for the following read types:

mapped
intergenic
rRNA
duplicates

Output examples:

Some examples of the program output can be found in the data folder of this GitHub repository:

General Stats
Genomic coverage stats
Genomic coverage stats with uniquely mapped reads (Note that the coverageUniq stats are reported as an additional JSON object)
RNA-seq stats

Please see output_fields.md in this repository for a complete description of the output fields and how they are calculated.

License

This software is released under a BSD-style license. Please check the LICENSE file for more details.
