lomereiter edited this page Jun 24, 2012 · 8 revisions

Welcome to the Sambamba wiki!

About

Sambamba is a library plus a set of command-line tools for working with the SAM and BAM file formats, which are used to store next-generation sequencing (NGS) datasets. The ultimate goal is to provide a flexible, easy-to-use library that still delivers solid performance by taking advantage of modern multi-core CPUs. The code base is written in a functional style and is designed to be maintainable, correct, pragmatic, and a foundation for building other functionality.

Currently the library provides full BAM reading support and BAM/SAM output, at speeds comparable to samtools.

A number of functions are production-ready; see the sambamba tool documentation. Reading SAM will be implemented soon; see the 'samragel' branch.

How to use the library

Sambamba compiles into shared libraries and command-line binaries. Here you can find a few tutorials on how to work with the library and tools.

The first thing to do is to read the getting started page. It covers the basics, such as how to install the library and how to compile sample code. The next step is to learn how to access alignment records and work with them.

Once you're acquainted with reading BAM and modifying records, you might want to use this library in your pipeline. For that, you need to know how to print records and how to save them to a file. Both SAM and BAM output are quite easy; see the corresponding pages.

Command-line tools

Some command-line tools are also supplied with the library. You can find them in the CLItools/ directory. If you're a developer, they can serve as real-world examples of how to use the library. If you're a bioinformatician, you can assess their speed and decide whether they are worth using in your pipelines.


Author

The author of the project is Artem Tarasov, a.k.a. lomereiter. If you have any questions or suggestions, you can either create an issue on GitHub or contact me via e-mail: lomereiter at gmail dot com.

Acknowledgements

This is a Google Summer of Code project, and I wish to thank the Open Bioinformatics Foundation and all the Bio* communities for the initial project idea and their continuous support. The project would not have been born without them.