Add missing licenses and readmes for code in lib #403
milot-mirdita committed Feb 11, 2021
1 parent 20543e0 commit 46c26ce
Showing 10 changed files with 1,436 additions and 92 deletions.
26 changes: 26 additions & 0 deletions lib/base64/LICENSE
@@ -0,0 +1,26 @@
https://github.com/superwills/NibbleAndAHalf
base64.h -- Fast base64 encoding and decoding.
version 1.0.0, April 17, 2013 143a

Copyright (C) 2013 William Sherif

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.

William Sherif
[email protected]

YWxsIHlvdXIgYmFzZSBhcmUgYmVsb25nIHRvIHVz
8 changes: 8 additions & 0 deletions lib/base64/README.md
@@ -0,0 +1,8 @@
NibbleAndAHalf
==============

"Nibble And A Half" is an ANSI C library that provides fast base64 encoding and decoding, all in a single header file.

Wed Apr 17 6:13p
- All test-related functions moved to testbase64.h. To use the library, you only need to `#include "base64.h"`:
https://github.com/superwills/NibbleAndAHalf/blob/master/NibbleAndAHalf/base64.h
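
As a rough illustration of what base64 encoding does, here is a minimal standalone sketch. It is not the API of `base64.h`; the names `b64_encode_sketch` and `b64_encode_str` are hypothetical and exist only for this example. It assumes the standard alphabet with `=` padding. Under those assumptions, encoding the text "all your base are belong to us" gives the 40-character string that closes the license notice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of base64.h): encodes `len` bytes from `src`
 * into `dst` as NUL-terminated base64 using the standard alphabet and '='
 * padding. `dst_cap` must be at least 4*((len+2)/3)+1. Returns the number of
 * characters written, not counting the terminating NUL. */
static size_t b64_encode_sketch(const unsigned char *src, size_t len,
                                char *dst, size_t dst_cap)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    assert(dst_cap >= 4 * ((len + 2) / 3) + 1);
    for (i = 0; i + 2 < len; i += 3) {          /* full 3-byte groups */
        unsigned v = (unsigned)src[i] << 16 | (unsigned)src[i + 1] << 8 | src[i + 2];
        dst[o++] = tbl[v >> 18 & 63];
        dst[o++] = tbl[v >> 12 & 63];
        dst[o++] = tbl[v >> 6 & 63];
        dst[o++] = tbl[v & 63];
    }
    if (i < len) {                              /* 1 or 2 trailing bytes */
        int two = i + 1 < len;
        unsigned v = (unsigned)src[i] << 16;
        if (two) v |= (unsigned)src[i + 1] << 8;
        dst[o++] = tbl[v >> 18 & 63];
        dst[o++] = tbl[v >> 12 & 63];
        dst[o++] = two ? tbl[v >> 6 & 63] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}

/* Convenience wrapper for NUL-terminated input (also hypothetical). */
static size_t b64_encode_str(const char *s, char *dst, size_t dst_cap)
{
    return b64_encode_sketch((const unsigned char *)s, strlen(s), dst, dst_cap);
}
```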
674 changes: 674 additions & 0 deletions lib/cacode/LICENSE.LAST

Large diffs are not rendered by default.

26 changes: 26 additions & 0 deletions lib/cacode/LICENSE.NCBI
@@ -0,0 +1,26 @@
https://github.com/superwills/NibbleAndAHalf
base64.h -- Fast base64 encoding and decoding.
version 1.0.0, April 17, 2013 143a

Copyright (C) 2013 William Sherif

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.

William Sherif
[email protected]

YWxsIHlvdXIgYmFzZSBhcmUgYmVsb25nIHRvIHVz
2 changes: 2 additions & 0 deletions lib/cacode/README
@@ -0,0 +1,2 @@
CA_code was extracted from LAST (http://last.cbrc.jp), which is licensed under GPLv3-or-later (see LICENSE.LAST).
CA_code itself is in the public domain and was developed by members of the NCBI (see LICENSE.NCBI).
504 changes: 504 additions & 0 deletions lib/gzstream/LICENSE

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions lib/gzstream/README
@@ -0,0 +1,7 @@

gzstream
C++ iostream classes wrapping the zlib compression library.
===========================================================================

Header-only version of this library, taken from:
https://gist.github.com/piti118/1508048
24 changes: 24 additions & 0 deletions lib/ksw2/LICENSE.txt
@@ -0,0 +1,24 @@
The MIT License

Copyright (c) 2018- Dana-Farber Cancer Institute
2017-2018 Broad Institute, Inc.

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
163 changes: 163 additions & 0 deletions lib/ksw2/README.md
@@ -0,0 +1,163 @@
## Introduction

KSW2 is a library to align a pair of biological sequences based on dynamic
programming (DP). So far it comes with global alignment and alignment extension
(no local alignment yet) under an affine gap cost function: gapCost(*k*) =
*q*+*k*\**e*, or a two-piece affine gap cost: gapCost2(*k*) = min{*q*+*k*\**e*,
*q2*+*k*\**e2*}. For the latter cost function, if *q*+*e*<*q2*+*e2* and *e*>*e2*,
(*q*,*e*) is effectively applied to short gaps only, while (*q2*,*e2*) is applied
to gaps no shorter than ceil((*q2*-*q*)/(*e*-*e2*)-1). This helps to retain long
gaps. The algorithm behind the two-piece cost is close to [Gotoh
(1990)][piece-affine].
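
The two cost functions above can be written out directly. The following is a minimal sketch and not part of KSW2. The function names are hypothetical, and the parameter values in the usage note (*q*=4, *e*=2, *q2*=24, *e2*=1) are illustrative only. Because the exact boundary between "short" and "long" gaps depends on how ties are counted, the sketch finds the crossover by direct comparison instead of a closed form.

```c
#include <assert.h>

/* One-piece affine cost of a gap of length k: q + k*e */
static int gap_cost(int q, int e, int k)
{
    return q + k * e;
}

/* Two-piece affine cost: the cheaper of the two affine pieces */
static int gap_cost2(int q, int e, int q2, int e2, int k)
{
    int a = q + k * e, b = q2 + k * e2;
    return a < b ? a : b;
}

/* Smallest gap length at which the (q2,e2) piece is strictly cheaper than
 * the (q,e) piece. Requires e > e2; otherwise the second piece never
 * overtakes the first and the loop would not terminate. */
static int long_piece_from(int q, int e, int q2, int e2)
{
    int k = 1;
    assert(e > e2);
    while (q2 + k * e2 >= q + k * e) ++k;
    return k;
}
```

With the illustrative values, a gap of length 3 costs 10 under both functions, while a gap of length 30 costs 64 under the one-piece function and 54 under the two-piece function.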

KSW2 supports fixed banding and optionally produces alignment paths (i.e.
CIGARs) with gaps either left- or right-aligned. It provides implementations
using SSE2 and SSE4.1 intrinsics based on [Hajime Suzuki][hs]'s diagonal
[formulation][hs-eq], which enables 16-way SSE parallelization for most
of the inner loop, regardless of the maximum score of the alignment.

KSW2 implements the Suzuki-Kasahara algorithm and is a component of
[minimap2][mm2]. If you use KSW2 in your work, please cite:

> * Suzuki, H. and Kasahara, M. (2018). Introducing difference recurrence relations for faster semi-global alignment of long sequences. *BMC Bioinformatics*, **19**:45.
> * Li, H (2018) Minimap2: pairwise alignment for nucleotide sequences. *Bioinformatics*, **34**:3094-3100.
## Usage

Each `ksw2_*.c` file implements a single function and is independent of the
others. Here are brief descriptions of what each file implements:

* [ksw2_gg.c](ksw2_gg.c): global alignment; Green's standard formulation
* [ksw2_gg2.c](ksw2_gg2.c): global alignment; Suzuki's diagonal formulation
* [ksw2_gg2_sse.c](ksw2_gg2_sse.c): global alignment with SSE intrinsics; Suzuki's formulation
* [ksw2_extz.c](ksw2_extz.c): global and extension alignment; Green's formulation
* [ksw2_extz2_sse.c](ksw2_extz2_sse.c): global and extension alignment with SSE intrinsics; Suzuki's formulation
* [ksw2_extd.c](ksw2_extd.c): global and extension alignment, dual gap cost; Green's formulation
* [ksw2_extd2_sse.c](ksw2_extd2_sse.c): global and extension alignment, dual gap cost, with SSE intrinsics; Suzuki's formulation

Users are encouraged to copy the header file `ksw2.h` and the relevant
`ksw2_*.c` file to their own source code trees. On x86 CPUs with SSE2
intrinsics, `ksw2_extz2_sse.c` is recommended in general. It supports global
alignment, alignment extension with Z-drop, score-only alignment, global-only
alignment and right-aligned CIGARs. The `ksw2_gg*.c` files are mostly for
demonstration and comparison purposes. They are annotated with more comments
and are easier to understand than `ksw2_ext*.c`. The header file
[ksw2.h](ksw2.h) contains brief documentation. The TeX file
[ksw2.tex](tex/ksw2.tex) gives a brief derivation.

To compile the test program `ksw-test`, just type `make`. It takes
advantage of SSE4.1 when available. To compile with SSE2 only, use `make
sse2=1` instead. If you have installed [parasail][para], use `make
parasail=prefix`, where `prefix` points to the parasail install directory (e.g.
`/usr/local`).

The following is a complete example of how to use the library.
```c
#include <stdlib.h> // malloc() and free()
#include <string.h>
#include <stdio.h>
#include "ksw2.h"

void align(const char *tseq, const char *qseq, int sc_mch, int sc_mis, int gapo, int gape)
{
    int i, a = sc_mch, b = sc_mis < 0? sc_mis : -sc_mis; // a>0 and b<0
    int8_t mat[25] = { a,b,b,b,0, b,a,b,b,0, b,b,a,b,0, b,b,b,a,0, 0,0,0,0,0 };
    int tl = strlen(tseq), ql = strlen(qseq);
    uint8_t *ts, *qs, c[256];
    ksw_extz_t ez;

    memset(&ez, 0, sizeof(ksw_extz_t));
    memset(c, 4, 256); // any other character is encoded as 4
    c['A'] = c['a'] = 0; c['C'] = c['c'] = 1;
    c['G'] = c['g'] = 2; c['T'] = c['t'] = 3; // build the encoding table
    ts = (uint8_t*)malloc(tl);
    qs = (uint8_t*)malloc(ql);
    for (i = 0; i < tl; ++i) ts[i] = c[(uint8_t)tseq[i]]; // encode to 0/1/2/3
    for (i = 0; i < ql; ++i) qs[i] = c[(uint8_t)qseq[i]];
    ksw_extz(0, ql, qs, tl, ts, 5, mat, gapo, gape, -1, -1, 0, &ez);
    for (i = 0; i < ez.n_cigar; ++i) // print CIGAR
        printf("%d%c", ez.cigar[i]>>4, "MID"[ez.cigar[i]&0xf]);
    putchar('\n');
    free(ez.cigar); free(ts); free(qs);
}

int main(int argc, char *argv[])
{
    align("ATAGCTAGCTAGCAT", "AGCTAcCGCAT", 1, -2, 2, 1);
    return 0;
}
```
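
The printing loop in the example above decodes each packed CIGAR element as a run length in the upper 28 bits and an operation code in the low 4 bits (0, 1 and 2 for `M`, `I` and `D`). As a sketch that needs no KSW2 headers, the same decoding can be done into a string buffer. `cigar_to_str` is a hypothetical helper, not a KSW2 function, and it handles only the three operations used in that loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: renders `n_cigar` packed CIGAR elements into `buf`
 * as text such as "3M2I5D". Returns the number of characters written (not
 * counting the NUL), or -1 if `buf` is too small or an operation code other
 * than 0 (M), 1 (I) or 2 (D) is found. */
static int cigar_to_str(const uint32_t *cigar, int n_cigar, char *buf, size_t cap)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (i = 0; i < n_cigar; ++i) {
        char tok[16];                          /* up to 9 digits + op + NUL */
        uint32_t op = cigar[i] & 0xf;          /* low 4 bits: operation */
        int w;
        if (op > 2) return -1;
        w = snprintf(tok, sizeof tok, "%lu%c",
                     (unsigned long)(cigar[i] >> 4), "MID"[op]);
        if (w < 0 || used + (size_t)w + 1 > cap) return -1;
        memcpy(buf + used, tok, (size_t)w + 1); /* copy token with its NUL */
        used += (size_t)w;
    }
    return (int)used;
}
```

In the example above, a caller would pass `ez.cigar` and `ez.n_cigar` to this helper.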
## Performance Analysis

The following table shows timing on two pairs of long sequences (both in the
"test" directory).

|Data set|Command line options |Time (s)|CIGAR|Ext|SIMD|Source |
|:-------|:--------------------------------|:-------|:---:|:-:|:--:|:-------|
|50k |-t gg -s |7.3 |N |N |N |ksw2 |
| |-t gg2 -s |19.8 |N |N |N |ksw2 |
| |-t extz -s |9.2 |N |Y |N |ksw2 |
| |-t ps\_nw |9.8 |N |N |N |parasail|
| |-t ps\_nw\_striped\_sse2\_128\_32|2.9 |N |N |SSE2|parasail|
| |-t ps\_nw\_striped\_32 |2.2 |N |N |SSE4|parasail|
| |-t ps\_nw\_diag\_32 |3.0 |N |N |SSE4|parasail|
| |-t ps\_nw\_scan\_32 |3.0 |N |N |SSE4|parasail|
| |-t extz2\_sse -sg |0.96 |N |N |SSE2|ksw2 |
| |-t extz2\_sse -sg |0.84 |N |N |SSE4|ksw2 |
| |-t extz2\_sse -s |3.0 |N |Y |SSE2|ksw2 |
| |-t extz2\_sse -s |2.7 |N |Y |SSE4|ksw2 |
|16.5k |-t gg -s |0.84 |N |N |N |ksw2 |
| |-t gg |1.6 |Y |N |N |ksw2 |
| |-t gg2 |3.3 |Y |N |N |ksw2 |
| |-t extz |2.0 |Y |Y |N |ksw2 |
| |-t extz2\_sse |0.40 |Y |Y |SSE4|ksw2 |
| |-t extz2\_sse -g |0.18 |Y |N |SSE4|ksw2 |

The standard DP formulation is about twice as fast as Suzuki's diagonal
formulation (`-tgg` vs `-tgg2`), but the SSE-based diagonal formulation
is several times faster than the standard DP. If we only want to compute one
global alignment score, we can use 16-way parallelization in the entire inner
loop. For extension alignment, though, we need to keep an array of 32-bit
scores and have to use 4-way parallelization for part of the inner loop. This
significantly reduces performance (`-sg` vs `-s`). KSW2 is faster than
parasail partly because the former uses one score for all matches and another
score for all mismatches. For diagonal formulations, vectorization is more
complex given a generic scoring matrix.
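
The 16-way and 4-way figures follow from the 128-bit width of an SSE register: it holds sixteen 8-bit values or four 32-bit values. The arithmetic is spelled out in the sketch below. `sse_lanes` and `SSE_REGISTER_BITS` are names made up for this illustration and do not appear in KSW2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SSE_REGISTER_BITS = 128 };  /* width of one SSE (XMM) register */

/* Number of elements of `elem_bytes` bytes that fit in one SSE register,
 * i.e. how many DP cells one vector instruction can update at once. */
static int sse_lanes(size_t elem_bytes)
{
    assert(elem_bytes > 0 && SSE_REGISTER_BITS % (8 * elem_bytes) == 0);
    return (int)(SSE_REGISTER_BITS / (8 * elem_bytes));
}
```

`sse_lanes(sizeof(int8_t))` gives 16 and `sse_lanes(sizeof(int32_t))` gives 4, so the 32-bit part of the inner loop does a quarter as much work per instruction.
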

It is possible to further accelerate global alignment with dynamic banding, as
implemented in [edlib][edlib]. However, it is not as effective for extension
alignment. Another idea is [adaptive banding][adap-band], which might be worth
trying at some point.
## Alternative Libraries
|Library |CIGAR|Intra-seq|Affine-gap|Local |Global |Glocal |Extension|
|:---------------|:---:|:-------:|:--------:|:-------:|:-------:|:-------:|:-------:|
|[edlib][edlib] |Yes |Yes |No |Very fast|Very fast|Very fast|N/A |
|[KSW][klib] |Yes |Yes |Yes |Fast |Slow |N/A |Slow |
|KSW2 |Yes |Yes |Yes/dual |N/A |Fast |N/A |Fast |
|[libgaba][gaba] |Yes |Yes |Yes |N/A? |N/A? |N/A? |Fast |
|[libssa][ssa] |No |No? |Yes |Fast |Fast |N/A |N/A |
|[Opal][opal] |No |No |Yes |Fast |Fast |Fast |N/A |
|[Parasail][para]|No |Yes |Yes |Fast |Fast |Fast |N/A |
|[SeqAn][seqan] |Yes |Yes |Yes |Slow |Slow |Slow |N/A |
|[SSW][ssw] |Yes |Yes |Yes |Fast |N/A |N/A |N/A |
|[SWIPE][swipe] |Yes |No |Yes |Fast |N/A? |N/A? |N/A |
|[SWPS3][swps3] |No |Yes |Yes |Fast |N/A? |N/A |N/A |

[hs]: https://github.com/ocxtal
[hs-eq]: https://github.com/ocxtal/diffbench
[edlib]: https://github.com/Martinsos/edlib
[klib]: https://github.com/attractivechaos/klib
[para]: https://github.com/jeffdaily/parasail
[opal]: https://github.com/Martinsos/opal
[ssw]: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library
[ssa]: https://github.com/RonnySoak/libssa
[gaba]: https://github.com/ocxtal/libgaba
[adap-band]: https://github.com/ocxtal/adaptivebandbench
[swipe]: https://github.com/torognes/swipe
[swps3]: http://lab.dessimoz.org/swps3/
[seqan]: http://seqan.de
[piece-affine]: https://www.ncbi.nlm.nih.gov/pubmed/2165832
[mm2]: https://github.com/lh3/minimap2
94 changes: 2 additions & 92 deletions lib/microtar/README.md
@@ -1,98 +1,8 @@
# microtar
A lightweight tar library written in ANSI C


## Basic Usage
The library consists of `microtar.c` and `microtar.h`. These two files can be
dropped into an existing project and compiled along with it.


#### Reading
```c
mtar_t tar;
mtar_header_t h;
char *p;

/* Open archive for reading */
mtar_open(&tar, "test.tar", "r");

/* Print all file names and sizes */
while ( (mtar_read_header(&tar, &h)) != MTAR_ENULLRECORD ) {
  printf("%s (%u bytes)\n", h.name, h.size);
  mtar_next(&tar);
}

/* Load and print contents of file "test.txt" */
mtar_find(&tar, "test.txt", &h);
p = calloc(1, h.size + 1);
mtar_read_data(&tar, p, h.size);
printf("%s", p);
free(p);

/* Close archive */
mtar_close(&tar);
```
#### Writing
```c
mtar_t tar;
const char *str1 = "Hello world";
const char *str2 = "Goodbye world";

/* Open archive for writing */
mtar_open(&tar, "test.tar", "w");

/* Write strings to files `test1.txt` and `test2.txt` */
mtar_write_file_header(&tar, "test1.txt", strlen(str1));
mtar_write_data(&tar, str1, strlen(str1));
mtar_write_file_header(&tar, "test2.txt", strlen(str2));
mtar_write_data(&tar, str2, strlen(str2));

/* Finalize -- this needs to be the last thing done before closing */
mtar_finalize(&tar);

/* Close archive */
mtar_close(&tar);
```


## Error handling
All functions that return an `int` return `MTAR_ESUCCESS` if the operation
is successful. If an error occurs, a negative error value is returned; this
value can be passed to the function `mtar_strerror()` to get its
corresponding error string.


## Wrapping a stream
If you want to read or write from something other than a file, the `mtar_t`
struct can be manually initialized with your own callback functions and a
`stream` pointer.

All callback functions are passed a pointer to the `mtar_t` struct as their
first argument. They should return `MTAR_ESUCCESS` if the operation succeeds
without an error, or an integer below zero if an error occurs.

Once the `stream` field and all required callbacks have been set and all
unused fields have been zeroed, the `mtar_t` struct can be safely used with
the microtar functions. `mtar_open` *should not* be called if the `mtar_t`
struct was initialized manually.
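
As a sketch of this callback contract, the following wraps an in-memory buffer. To compile without `microtar.h` it uses stand-ins: `demo_tar_t` mimics only the callback and `stream` fields described here (the real `mtar_t` may have more fields), `DEMO_OK` and `DEMO_FAIL` stand for `MTAR_ESUCCESS` and a negative error value, and `mem_stream_t` and the `mem_*` functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum { DEMO_OK = 0, DEMO_FAIL = -1 };  /* stand-ins for the MTAR_E* codes */

typedef struct demo_tar demo_tar_t;    /* stand-in for mtar_t */
struct demo_tar {
    int (*read)(demo_tar_t *tar, void *data, unsigned size);
    int (*seek)(demo_tar_t *tar, unsigned pos);
    int (*close)(demo_tar_t *tar);
    void *stream;
};

/* The custom stream: a read-only memory buffer with a position indicator */
typedef struct { const unsigned char *buf; unsigned len, pos; } mem_stream_t;

static int mem_read(demo_tar_t *tar, void *data, unsigned size)
{
    mem_stream_t *m = (mem_stream_t *)tar->stream;
    if (size > m->len - m->pos) return DEMO_FAIL;  /* not enough data left */
    memcpy(data, m->buf + m->pos, size);
    m->pos += size;
    return DEMO_OK;
}

static int mem_seek(demo_tar_t *tar, unsigned pos)
{
    mem_stream_t *m = (mem_stream_t *)tar->stream;
    if (pos > m->len) return DEMO_FAIL;            /* past the end */
    m->pos = pos;
    return DEMO_OK;
}

static int mem_close(demo_tar_t *tar)
{
    tar->stream = NULL;                            /* buffer is not owned */
    return DEMO_OK;
}

/* Manual initialization: zero everything, then set callbacks and stream */
static void demo_init(demo_tar_t *tar, mem_stream_t *m)
{
    memset(tar, 0, sizeof *tar);
    tar->read = mem_read;
    tar->seek = mem_seek;
    tar->close = mem_close;
    tar->stream = m;
}
```

Each callback receives the struct as its first argument and finds its own state through `stream`, so one set of callbacks can serve any number of buffers.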

#### Reading
The following callbacks should be set for reading an archive from a stream:

Name | Arguments | Description
--------|------------------------------------------|---------------------------
`read` | `mtar_t *tar, void *data, unsigned size` | Read data from the stream
`seek` | `mtar_t *tar, unsigned pos` | Set the position indicator
`close` | `mtar_t *tar` | Close the stream

#### Writing
The following callbacks should be set for writing an archive to a stream:

Name | Arguments | Description
--------|------------------------------------------------|---------------------
`write` | `mtar_t *tar, const void *data, unsigned size` | Write data to the stream

This library was adapted from the original microtar (https://github.com/rxi/microtar)
to be read-only and support fast seeking.

## License
This library is free software; you can redistribute it and/or modify it under
