Skip to content

kojix2/ruby-htslib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ruby-htslib

Gem Version test The MIT License DOI Docs Stable

Ruby-htslib is the Ruby bindings to HTSlib, a C library for high-throughput sequencing data formats. It allows you to read and write file formats commonly used in genomics, such as SAM, BAM, VCF, and BCF, in the Ruby language.

🍎 Feel free to fork it!

Requirements

  • Ruby 3.1 or above.
  • HTSlib
    • Ubuntu : apt install libhts-dev
    • macOS : brew install htslib
    • Windows : mingw-w64-htslib is automatically fetched when installing the gem (RubyInstaller only).
    • Build from source code (see the Development section)

Installation

gem install htslib

If you have installed htslib with apt on Ubuntu or homebrew on Mac, pkg-config will automatically detect the location of the shared library. If pkg-config does not work well, set PKG_CONFIG_PATH. Alternatively, you can specify the directory of the shared library by setting the environment variable HTSLIBDIR.

export HTSLIBDIR="/your/path/to/htslib" # Directory where libhts.so is located

ruby-htslib also works on Windows. If you use RubyInstaller, htslib will be prepared automatically.

Usage

HTS::Bam - SAM / BAM / CRAM - Sequence Alignment Map file

Reading fields

require 'htslib'

bam = HTS::Bam.open("test/fixtures/moo.bam")

bam.each do |r|
  pp name: r.qname,
     flag: r.flag,
     chrm: r.chrom,
     strt: r.pos + 1,
     mapq: r.mapq,
     cigr: r.cigar.to_s,
     mchr: r.mate_chrom,
     mpos: r.mpos + 1,
     isiz: r.isize,
     seqs: r.seq,
     qual: r.qual_string,
     MC:   r.aux("MC")
end

bam.close

With a block

HTS::Bam.open("test/fixtures/moo.bam") do |bam|
  bam.each do |r|
    puts r.to_s
  end
end

HTS::Bcf - VCF / BCF - Variant Call Format file

Reading fields

require 'htslib'

bcf = HTS::Bcf.open("test/fixtures/test.bcf")

bcf.each do |r|
  p chrom:  r.chrom,
    pos:    r.pos,
    id:     r.id,
    qual:   r.qual.round(2),
    ref:    r.ref,
    alt:    r.alt,
    filter: r.filter,
    info:   r.info.to_h,
    format: r.format.to_h
end

bcf.close

With a block

HTS::Bcf.open("test/fixtures/test.bcf") do |bcf|
  bcf.each do |r|
    puts r.to_s
  end
end

HTS::Faidx - FASTA / FASTQ - Nucleic acid sequence

fa = HTS::Faidx.open("test/fixtures/moo.fa")
fa.seq("chr1:1-10") # => CGCAACCCGA # 1-based
fa.close

HTS::Tabix - GFF / BED - TAB-delimited genome position file

tb = HTS::Tabix.open("test/fixtures/test.vcf.gz")
tb.query("poo", 2000, 3000) do |line|
  puts line.join("\t")
end
tb.close

Low-level API

Middle architectural layer between high-level Ruby code and low-level C code. HTS::LibHTS provides native C functions using Ruby-FFI.

require 'htslib'

a = HTS::LibHTS.hts_open("a.bam", "r")
b = HTS::LibHTS.hts_get_format(a)
p b[:category]
p b[:format]

The low-level API makes it possible to perform detailed operations, such as calling CRAM-specific functions.

Macro functions

HTSlib is designed to improve performance with many macro functions. However, it is not possible to call C macro functions directly from Ruby-FFI. To overcome this, important macro functions have been re-implemented in Ruby, allowing them to be called in the same way as native functions.

Garbage Collection and Memory Freeing

A small number of commonly used structs, such as Bam1 and Bcf1, are implemented using FFI's ManagedStruct. This allows for automatic memory release when Ruby's garbage collection is triggered. On the other hand, other structs are implemented using FFI::Struct, and they will require manual memory release.

Need more speed?

Try Crystal. HTS.cr is implemented in Crystal language and provides an API compatible with ruby-htslib.

Documentation

Development

Compile from source code

GNU Autotools is required to compile htslib. To get started with development:

git clone --recursive https://github.com/kojix2/ruby-htslib
cd ruby-htslib
bundle install
bundle exec rake htslib:build
bundle exec rake test

Macro functions are reimplemented

HTSlib has many macro functions. These macro functions cannot be called from FFI and must be reimplemented in Ruby.

Use the latest Ruby

Use Ruby 3 or newer to take advantage of new features. This is possible because we have a small number of users.

Keep compatibility with Crystal language

Compatibility with Crystal language is important for Ruby-htslib development.

  • HTS.cr - HTSlib bindings for Crystal

Return value

The most challenging part is the return value. In the Crystal language, methods are expected to return only one type. On the other hand, in the Ruby language, methods that return multiple classes are very common. For example, in the Crystal language, the compiler gets confused if the return value is one of six types: Int32, Int64, Float32, Float64, Nil, or String. In fact Crystal allows you to do that. But the code gets a little messy. In Ruby, this is very common and doesn't cause any problems.

Memory management

Ruby and Crystal are languages that use garbage collection. However, the memory release policy for allocated C structures is slightly different: in Ruby-FFI, you can define a self.release method in FFI::Struct. This method is called when GC. So you don't have to worry about memory in high-level APIs like Bam::Record or Bcf::Record, etc. Crystal requires you to define a finalize method on each class. So you need to define it in Bam::Record or Bcf::Record.

Macro functions

In ruby-htslib, C macro functions are added to LibHTS, but in Crystal, LibHTS is a Lib, so methods cannot be added. methods are added to LibHTS2.

Naming convention

If you are not sure about the naming of a method, follow the Rust-htslib API. This is a very weak rule. if a more appropriate name is found later in Ruby, it will replace it.

Support for bitfields of structures

Since Ruby-FFI does not support structure bit fields, the following extensions are used.

  • ffi-bitfield - Extension of Ruby-FFI to support bitfields.

Automatic validation

In the script directory, there are several tools to help implement ruby-htslib. Scripts using c2ffi can check the coverage of htslib functions in Ruby-htslib. They are useful when new versions of htslib are released.

  • c2ffi is a tool to create JSON format metadata from C header files.

Contributing

Ruby-htslib is a library under development, so even minor improvements like typo fixes are welcome! Please feel free to send us your pull requests.

# Ownership and Commit Rights

Do you need commit rights to the ruby-htslib repository?
Do you want to get admin rights and take over the project?
If so, please feel free to contact us @kojix2.

Why do you implement htslib in a language like Ruby, which is not widely used in bioinformatics?

One of the greatest joys of using a minor language like Ruby in bioinformatics is that nothing stops you from reinventing the wheel. Reinventing the wheel can be fun. But with languages like Python and R, where many bioinformatics masters work, there is no chance for beginners to create htslib bindings. Bioinformatics file formats, libraries, and tools are very complex, and I need to learn how to understand them. So I started to implement the HTSLib binding myself to better understand how the pioneers of bioinformatics felt when establishing the file format and how they created their tools. I hope one day we can work on bioinformatics using Ruby and Crystal languages, not to replace other languages such as Python and R, but to add new power and value to this advancing field.

Links

Funding support

This work was supported partially by Ruby Association Grant 2020.

License

MIT License.