-
Notifications
You must be signed in to change notification settings - Fork 0
Ruby Grant 2019 proposal
Name: Udit Gulati
E-mail: [email protected]
GitHub: uditgulati
Blog: uditgulati.github.io/blog
Phone: +91 9896363996
I am an active SciRuby contributor. I worked on NumRuby(re-implementation of NMatrix) during summer of 2019 as part of Google Summer of Code program. I am currently a final year computer science undergraduate at Indian Institute of Information Technology, Una.
Projects
- NumRuby
- Ruby-Sparse
Talks
- Selected as a speaker at RubyConf 2019 to be held during November 18-20 in Nashville, TN, USA to talk about "Speeding up NMatrix by 100x".
Sparse storage support for Ruby environment.
Sparse matrices are a very important part of scientific computing. Many popular modern languages like Python, Julia and R (some even have this as part of their standard libraries) have sparse matrices support while Ruby doesn't have a good library for sparse.
This project is to build an efficient and feature-rich end-user intended Sparse matrix library in Ruby with interfaces to popular dense matrix libraries (like Numo-Narray, NumRuby[NMatrix]) with linear algebra and graph functionalities support.
Sparse arrays, or arrays that are mostly empty or filled with zeros, are common in many scientific applications. To save space we often avoid storing these arrays in traditional dense formats, and instead choose different data structures. The choice of data structure can significantly affect our storage and computational costs when working with these arrays.
Large sparse matrices often appear in scientific or engineering applications when solving partial differential equations. When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are slow and inefficient when applied to large sparse matrices as processing and memory are wasted on the zeroes. Sparse data is by nature more easily compressed and thus requires significantly less storage.
Some very large sparse matrices are infeasible to manipulate using standard dense-matrix algorithms. So, it is required to implement completely different algorithms for sparse matrices leading to implementation of a completely different library for sparse.
Sparse storage can be implemented by following ways:
- Compressed sparse row (CSR)
- Compressed sparse column (CSC)
- Coordinate list (COO)
- Diagonal
These types are explained in Technical details section with implementation details.
- Better performant sparse matrix operations. Improvements in speed and memory usage.
- Graph algorithms support.
- Sparse matrix linear algebra is better in performance and requires lesser memory.
While many popular languages have sparse storage support (either in standard library or as a third-party library), Ruby doesn't have a good support for it. This projects aims to fill this gap by implementing a sparse storage library for Ruby environment which is fast, easy to use, API rich and highly compatible with popular dense storage libraries such as Numo-Narray and NumRuby (Re-implementation of NMatrix).
1. Python
- Python has 2 major libraries for sparse matrix support, namely,
scipy.sparse
andpydata.sparse
. -
scipy.sparse
is for 2-dimensional matrices only. It supports quite a lot of types of sparse matrix. These are BSR, COO, CSC, CSR, DIA, DOK and LIL. -
scipy.sparse
has a sub-modulescipy.sparse.csgraph
for graph algorithms. It consists of algorithms such as Dijkstra shortest path, Floyd warshall, Bellman ford, Bfs & Dfs, Minimum spanning tree, Maximum bipartite matching etc. -
scipy.sparse
has another sub-modulescipy.sparse.linalg
which is for linear algebra capabilities for sparse matrices. It consists of capabilities such as finding inverse, determinant or norm of a matrix, matrix factorization and solving linear problems. -
pydata.sparse
is for arbitrary dimensional sparse matrices. It supports only 2 types of sparse types, namely, COO and DOK. -
pydata.sparse
also supports basic arithmetic operations on sparse matrix and also has useful utility functions,random
,eye
andones
to name a few. -
pydata.sparse
is compatible withnumpy.ndarray
whilescipy.sparse
is compatible withnumpy.matrix
.
2. Julia
- Julia has support for sparse vectors and sparse matrices in the
SparseArrays
std-lib module. - In Julia, sparse matrices are stored in the Compressed Sparse Column (CSC) format.
- There is support for sparse matrix arithmetic operations and indexing.
There are a few gems for Sparse but none of them is complete enough to be used for practical work and they are also currently not being maintained by anyone. Some of them lack the performance (such as being completely in Ruby), some are only for one type of sparse (like CSR), some lack primary functionalities such as conversion to/from dense storage or sparse 2d-matrix operations. There is also no proper documentation for these gems. The reason being that most of them were written as a pet project or an experiment and not as a full-fledged library intended for end-user.
Below are some gems that I believe have some resemblance to sparse storage implementations:
- sparse_matrix: Supports only creation and element-wise operations on a key-value pair type Sparse matrix. Written purely in Ruby.
- csrmatrix: An implementation of CSR sparse matrix written purely in Ruby. Provides arithmetic operations and matrix operations such as taking transpose or finding determinant etc.
- sparsematrix: An implementation of Yale sparse matrix written purely in Ruby. Only supports creation and elements access of sparse matrix.
- libsmatrix: Written in C with APIs provided for C, Ruby and Java/Scala. Supports only key-value pair type Sparse matrix.
https://github.com/SciRuby/ruby-sparse
A proposed project layout is as follows:
ruby-sparse
└── ext
├── blas
│ └── sparse_blas.c
├── elementwise
│ ├── arithmetic.c
│ ├── exponential.c
│ ├── hyperbolic.c
│ ├── numeric.c
│ └── trigonometric.c
├── indexing
│ ├── index.c
│ ├── iterators.c
│ ├── slicing.c
│ └── broadcasting.c
├── extconf.rb
├── interfaces
│ ├── narray.c
│ └── nmatrix.c
├── io
│ └── serialize.c
├── graph
│ ├── narray.c
│ └── nmatrix.c
├── mkmf.rb
├── ruby_sparse.h
└── ruby_sparse.c
RubySparse
RubySparse::CSR
RubySparse::CSC
RubySparse::COO
RubySparse::Dia
RubySparse::Graph
RubySparse::Linalg
RubySparse::SparseMatrix
Classes
1. RubySparse::CSR
Class for Compressed Sparse Row matrix.
2. RubySparse::CSC
Class for Compressed Sparse Column matrix.
3. RubySparse::COO
Class for COOrdinate format sparse matrix.
4. RubySparse::Dia
Class for DIAgonal sparse matrix.
5. RubySparse::SparseMatrix
This class provides a base class for all sparse matrices. It cannot be instantiated. Most of the work is provided by it's subclasses.
Functions
1. eye(m, n, dtype: 'float', format: 'DIA')
Sparse matrix with ones on diagonal. Returns a sparse (m x n) matrix where the diagonal is all ones and everything else is zeros.
2. kron(A, B, format: 'CSR')
Kronecker product of sparse matrices A and B. Returns Kronecker product in a sparse matrix format.
3. tril(A, format: 'CSR')
Return the lower triangular portion of a matrix in sparse format.
4. triu(A, format: 'CSR')
Return the upper triangular portion of a matrix in sparse format
5. hstack(blocks, format: 'CSR', dtype: 'float')
Stack sparse matrices horizontally (column wise).
6. vstack(blocks, format: 'CSR', dtype: 'float')
Stack sparse matrices vertically (row wise).
7. dot(A, B)
Perform the dot product on two sparse matrices A and B.
8. save(file, matrix)
Save a sparse matrix to a file.
9. load(file)
Load a sparse matrix from a file. Returns a sparse matrix containing the loaded data.
Submodules
1. RubySparse::Graph
This module consists of graph algorithms for sparse matrices.
- dijkstra(graph, directed: True, weighted: True): Dijkstra algorithm using Fibonacci Heaps.
- floyd_warshall(grpah, directed: True, weighted: True): Compute the shortest path lengths using the Floyd-Warshall algorithm.
- bellman_ford(graph, directed: True, weighted: True): Compute the shortest path lengths using the Bellman-Ford algorithm.
- breadth_first_order(graph, start_node, directed: True): Return a breadth-first ordering starting with specified node.
- depth_first_order(graph, start_node, directed: True): Return a depth-first ordering starting with specified node.
- breadth_first_tree(graph, start_node, directed: True): Return the tree generated by a breadth-first search.
- depth_first_tree(graph, start_node, directed: True): Return the tree generated by a depth-first search.
- minimum_spanning_tree(graph): Return a minimum spanning tree of an undirected graph.
- maximum_bipartite_matching(graph): Returns an array of row or column permutations that makes the diagonal of a non-singular square sparse matrix zero free.
2. RubySparse::Linalg
This module consists of linear algebra capabilities for sparse matrices.
- inv(A): Compute the inverse of a sparse matrix. Returns the inverse of A.
- expm(A): Compute the matrix exponential using Pade approximation. Returns the matrix exponential of A.
-
norm(A, order, axis):
Norm of a sparse matrix. This function is able to return one of seven different matrix norms, depending on the value of the
order
parameter. Returns norm of matrix. - solve(A, B): Solve the sparse linear system Ax=b, where b may be a vector or a matrix. Returns x, the solution of linear equation.
- eigs(A, k): Find k eigenvalues and eigenvectors of the square matrix A. Returns an arrays of k eigenvalues and an array of k eigenvectors.
- eigsh(A, k): Find k eigenvalues and eigenvectors of the real symmetric square matrix or complex hermitian matrix A. Returns an arrays of k eigenvalues and an array of k eigenvectors.
- svds(A, k): Compute the largest k singular values/vectors for a sparse matrix.
- lu(A): Compute the LU decomposition of a sparse, square matrix.
CSR
The compressed sparse row (CSR) or Yale format represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values, the extents of rows, and column indices. Methods to be implemented:
- get_dense_from_csr: This method will convert CSR storage to dense storage.
- get_csr_from_dense: This method will convert dense storage to CSR storage.
typedef struct CSR_STRUCT
{
sp_dtype dtype;
size_t ndims;
size_t count; //count of non-zero elements;
size_t* shape;
double* elements; //elements array
size_t* ia; //row index
size_t* ja; //col index
}csr_matrix;
CSC
CSC is similar to CSR except that values are read first by column, a row index is stored for each value, and column pointers are stored. Methods to be implemented:
- get_dense_from_csc: This method will convert CSC storage to dense storage.
- get_csc_from_dense: This method will convert dense storage to CSC storage.
typedef struct CSC_STRUCT
{
sp_dtype dtype;
size_t ndims;
size_t count; //count of non-zero elements;
size_t* shape;
double* elements; //elements array
size_t* ia; //row index
size_t* ja; //col index
}csc_matrix;
COO
COO stores a list of (row, column, value) tuples. Ideally, the entries are sorted first by row index and then by column index, to improve random access times. Methods to be implemented:
- get_dense_from_coo: This method will convert COO storage to dense storage.
- get_coo_from_dense: This method will convert dense storage to COO storage.
typedef struct COO_STRUCT
{
sp_dtype dtype;
size_t ndims;
size_t count; //count of non-zero elements;
size_t* shape;
double* elements; //elements array
size_t* ia; //row index
size_t* ja; //col index
}coo_matrix;
Diagonal
Diagonal stores just the entries in the main diagonal as a one-dimensional array, so a diagonal n × n matrix requires only n entries. It is used when only the diagonal entries are non-zero. Methods to be implemented:
- get_dense_from_dia: This method will convert Dia storage to dense storage.
- get_dia_from_dense: This method will convert dense storage to Dia storage.
typedef struct DIA_STRUCT
{
sp_dtype dtype;
size_t ndims;
size_t count; //count of non-zero elements;
size_t* shape;
double* elements; //elements array
size_t* offset;
}dia_matrix;
Initialization code for RubySparse and it's modules and classes is as following:
VALUE RubySparse = Qnil;
VALUE SparseMatrix = Qnil;
VALUE COO = Qnil;
void Init_sparse() {
RubySparse = rb_define_module("RubySparse");
SparseMatrix = rb_define_class_under(RubySparse, "SparseMatrix", rb_cObject);
COO = rb_define_class_under(RubySparse, "COO", rb_cObject);
rb_define_alloc_func(COO, coo_alloc);
rb_define_method(COO, "initialize", coo_init, -1);
rb_define_method(COO, "shape", coo_get_shape, 0);
rb_define_method(COO, "elements", coo_get_elements, 0);
rb_define_method(COO, "coords", coo_get_coords, 0);
rb_define_method(COO, "nzcount", coo_get_count, 0);
rb_define_method(COO, "dim", coo_get_ndims, 0);
rb_define_method(COO, "+", coo_add, 1);
}
The sparse matrix initialization of COO is as following:
VALUE coo_init(int argc, VALUE* argv, VALUE self) {
coo_matrix* mat;
Data_Get_Struct(self, coo_matrix, mat);
if(argc > 0){
mat->ndims = 2;
mat->count = (size_t)RARRAY_LEN(argv[1]);
mat->shape = ALLOC_N(size_t, mat->ndims);
for(size_t index = 0; index < mat->ndims; index++) {
mat->shape[index] = (size_t)FIX2LONG(RARRAY_AREF(argv[0], index));
}
mat->elements = ALLOC_N(double, mat->count);
for(size_t index = 0; index < mat->count; index++) {
mat->elements[index] = (double)NUM2DBL(RARRAY_AREF(argv[1], index));
}
mat->ia = ALLOC_N(size_t, mat->count);
for(size_t index = 0; index < mat->count; index++) {
mat->ia[index] = (size_t)NUM2SIZET(RARRAY_AREF(argv[2], index));
}
mat->ja = ALLOC_N(size_t, mat->count);
for(size_t index = 0; index < mat->count; index++) {
mat->ja[index] = (size_t)NUM2SIZET(RARRAY_AREF(argv[3], index));
}
}
return self;
}
VALUE coo_alloc(VALUE klass) {
coo_matrix* mat = ALLOC(coo_matrix);
return Data_Wrap_Struct(klass, NULL, coo_free, mat);
}
void coo_free(coo_matrix* mat) {
xfree(mat);
}
Similarly, it'll be implemented for other sparse types.
The library will be implemented only for int
and float64
dtypes during the grant period. The other dtypes such as bool
, complex64
and string
will be implemented if time allows or will be done after the grant period.
Sparse matrix indexing works differently as compared to that of dense ones. Each sparse type has a different way of indexing of elements as the position of each element is stored differently in each one. Foe example, for COO, the row and column indices are directly stored with the element, so it can directly be accessed from these but fro CSR which stores the column index but stores how many elements are in each row which doesn't give direct access to the row index of an element.
Once indexing is implemented then it can be used to iterate over each element of the sparse matrix. Other types of iterator can be implemented such as:
-
each_with_index
: Iterate over each element with index of occurrence. -
each_with_indices
: Iterate over each element with row and column index. -
each_row
: Iterate over each row of sparse matrix. -
each_column
: Iterate over each column of sparse matrix. -
each_row_with_index
: Iterate over each row of sparse matrix with corresponding index. -
each_column_with_index
: Iterate over each column of sparse matrix with corresponding index.
Indexing will also be used to implement slicing of sparse matrices. Examples of slicing:
-
mat[1, 0:2]
: Row with index 1 and columns with indices from 0 to 2, not including 2. -
mat[0:5, 3:4]
Rows with indices from 0 to 5, not including 5 and columns with indices from 3 to 4, not including 4.
Elementwise
The following Elementwise operations would be supported.
- Basic Arithmetic operations:
+
,-
,*
,/
,>>
,<<
. - Exponential and logarithmic functions:
exp
,log
,expm1
,log1p
, etc. - Hyperbolic functions:
sinh
,cosh
,tanh
, etc. - Logical operations:
&&
,||
,|
,&
,<
,>
,<=
,>=
,==
,!
. - Numeric functions:
floor
,round
,min
,max
, etc. - Trigonometric functions:
sin
,cos
,tan
, etc.
Elementwise addition for COO is as follows:
VALUE coo_add(VALUE self, VALUE another) {
coo_matrix* left;
coo_matrix* right;
Data_Get_Struct(self, coo_matrix, left);
Data_Get_Struct(another, coo_matrix, right);
coo_matrix* result = ALLOC(coo_matrix);
result->count = 0;
result->ndims = left->ndims;
result->shape = ALLOC_N(size_t, result->ndims);
for(size_t index = 0; index < result->ndims; index++){
result->shape[index] = left->shape[index];
}
result->elements = ALLOC_N(double, 2 * left->count);
result->ia = ALLOC_N(size_t, 2 * left->count);
result->ja = ALLOC_N(size_t, 2 * left->count);
size_t left_index = 0, right_index = 0, result_index = 0;
while(left_index < left->count || right_index < right->count) {
if(left->ia[left_index] == right->ia[right_index] //left and right indices equal
&& left->ja[left_index] == right->ja[right_index]
&& left_index < left->count && right_index < right->count) {
result->elements[result_index] = left->elements[left_index] + right->elements[right_index];
result->ia[result_index] = left->ia[left_index];
result->ja[result_index] = left->ja[left_index];
left_index++, right_index++, result_index++;
}
else if((left->ia[left_index] < right->ia[right_index]) //left index smaller
|| (left->ia[left_index] == right->ia[right_index] && left->ja[left_index] < right->ja[right_index])
&& left_index < left->count) {
result->elements[result_index] = left->elements[left_index];
result->ia[result_index] = left->ia[left_index];
result->ja[result_index] = left->ja[left_index];
left_index++, result_index++;
}
else { //right index smaller
result->elements[result_index] = right->elements[right_index];
result->ia[result_index] = right->ia[right_index];
result->ja[result_index] = right->ja[right_index];
right_index++, result_index++;
}
result->count++;
}
return Data_Wrap_Struct(COO, NULL, coo_free, result);
}
Matrix multiplication
There are multiple sparse BLAS libraries to support:
- matrix-vector multiplication
- Matrix-matrix multiplication
I'll use the NIST Sparse BLAS library written in FORTRAN for these operation. The details for library can be found at https://math.nist.gov/spblas/.
The sparse linear algebra sub-module needs further research on which back-end to use as there isn't a LAPACK counterpart for sparse matrices. I have found a few possible alternatives, such as:
- CBLAS - https://people.sc.fsu.edu/~jburkardt/c_src/csparse/csparse.html
- Eigen - http://eigen.tuxfamily.org/index.php?title=Main_Page
There's also a list of low-level libraries for sparse matrices here at http://www.netlib.org/utk/people/JackDongarra/la-sw.html.
For the grant period, I'll focus more on graph sub-module to get completed and will also research on possible solutions for sparse linear algebra sub-module and will implement basic linear algebra functionalities such as finding matrix inverse, norm and determinant or finding solution to system of linear equations.
A graph which is defined as a pair G = (V, E), where V is a set whose elements are called vertices (singular: vertex), and E is a set of two-sets (set with two distinct elements) of vertices, whose elements are called edges (sometimes links or lines).
Graphs can be stored in possible ways:
- A dense matrix. Uses
O(V^2)
space. Usually used when the graph is near to being a complete graph. In other words,E
approachesO(V^2)
. - A sparse matrix. Uses
O(V)
space. Used whenE
is of order ofO(V)
like a tree.
The second possibility is quite frequent and can be implemented using sparse storage.
Following are of the popular graph algorithms which will be implemented as part of this module:
Note: Below implementations are in C++ making use of STL standard library containers for data storage. When the module will be implemented, the following codes would be ported to C with implementation for the required data structures also.
Dijkstra's shortest path
Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks.It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.
/*
Dijkstra's Algorithm
Time Complexity : O(ElogN)
*/
const int siz = 1e6 + 2;
const int inf = 1e9;
vector < pair<int, int> > graph[siz];
void dijkstra(int source,int n) {
set < pair<int, int> > s; // set to process edges
vector <int> dist(n+1, inf); // store distance from source
s.insert(make_pair(0, source));
dist[source] = 0;
while (!s.empty()) {
pair <int, int> tmp = *(s.begin());
s.erase(s.begin());
int u = tmp.second;
for (vector < pair<int,int> >::iterator i = graph[u].begin(); i != graph[u].end(); i++) {
int v = (*i).first;
int weight = (*i).second;
if (dist[v] > dist[u] + weight) {
if (dist[v] != inf)
s.erase(s.find(make_pair(dist[v], v)));
dist[v] = dist[u] + weight;
s.insert(make_pair(dist[v], v));
}
}
}
// final shortest distance from every other node
for(int i = 1; i <= n; i++) {
printf("Dist From %d To %d : %d\n", source, i, dist[i]);
}
}
Breadth first search
Breadth-first search is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph), and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.
/*
Breadth first ssearch
Time complexity: O(N)
*/
const int num = 1e5 + 2;
vector <int> adj[num];
bool visited[num];
bool bfs(int source) {
queue <int> que;
visited[source] = true;
que.push(source);
while(!que.empty()) {
int temp = que.front();
que.pop();
visited[temp] = true;
int len = conn[temp].size();
for(int i = 0; i < len; i++) {
int v = conn[temp][i];
if(!visited[v]) {
que.push(v);
}
}
}
}
Depth first search
Depth-first search is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking.
/*
Depth first search
Time complexity: O(N)
*/
const int num = 1e5 + 2;
vector <int> adj[num];
bool visited[num];
void dfs(int s) {
visited[s] = true; //Now this node is visited
for(int i = 0; i < adj[s].size(); i++){ //If not leaf node;
if(visited[adj[s][i]] == false) //If next node is not visited
dfs(adj[s][i]);
}
}
Minimum spanning tree
MST using Prim's algorithm. This sample implementation is for finding the total weight of minimum spanning tree.
/*
Prim's algorithm
Time Complexity : O(ElogN)
*/
typedef long long ll;
typedef pair<int, int> pii;
typedef pair<int, pii> piii;
const int siz = 1e6 + 10;
const ll modu = 1e9 + 7;
vector < pair<int, int> > conn[siz], conn1[siz];
//conn is original graph
//conn1 is the MST
int marked[siz];
int prim(int source) {
priority_queue<piii, vector<piii>, greater<piii> > Q;
piii curr;
int cost = 0;
Q.push(make_pair(0, make_pair(source, source)));
while(!Q.empty()) {
curr = Q.top();
Q.pop();
int next = curr.second.first;
int prev = curr.second.second;
int weight = curr.first;
if(marked[next])
continue;
cost += weight;
if(next != prev) {
conn1[next].push_back(make_pair(weight, prev));
conn1[prev].push_back(make_pair(weight, next));
}
marked[next] = true;
int len = conn[next].size();
for(int i = 0; i < len; i++) {
int temp = conn[next][i].second;
int w1 = conn[next][i].first;
if(!marked[temp])
Q.push(make_pair(w1, make_pair(temp, next)));
}
}
return cost;
}
Maximum bipartite matching
A matching in a Bipartite Graph is a set of the edges chosen in such a way that no two edges share an endpoint. A maximum matching is a matching of maximum size (maximum number of edges). In a maximum matching, if any edge is added to it, it is no longer a matching. There can be more than one maximum matchings for a given Bipartite Graph.
/*
Maximum bipartite matching
Time complexity: O(E * V)
*/
typedef vector<int> VI;
typedef vector<VI> VVI;
bool FindMatch(int i, const VVI &w, VI &mr, VI &mc, VI &seen) {
for(int j = 0; j < w[i].size(); j++) {
if(w[i][j] && !seen[j]) {
seen[j] = true;
if(mc[j] < 0 || FindMatch(mc[j], w, mr, mc, seen)) {
mr[i] = j;
mc[j] = i;
return true;
}
}
}
return false;
}
int BipartiteMatching(const VVI &w, VI &mr, VI &mc) {
mr = VI(w.size(), -1);
mc = VI(w[0].size(), -1);
int ct = 0;
for(int i = 0; i < w.size(); i++) {
VI seen(w[0].size());
if(FindMatch(i, w, mr, mc, seen))
ct++;
}
return ct;
}
Same way Floyd warshall and Bellman Ford will be implemented.
NumRuby (Re-implementation of NMatrix)
Each of the sparse classes will have a class method called from_nmatrix()
and an instance method called to_nmatrix()
. from_nmatrix()
is to convert nmatrix to given sparse type and to_nmatrix()
is to convert given sparse type to nmatrix.
Following is the implementation of these methods for RubySparse::COO
:
- NMatrix to COO
VALUE coo_from_nmatrix(VALUE self, VALUE nmat) {
VALUE r_elements = rb_funcall(nmat, rb_intern("elements"), 0);
VALUE r_shape = rb_funcall(nmat, rb_intern("shape"), 0);
if((size_t)RARRAY_LEN(r_shape) != 2) {
rb_raise(rb_eStandardError, "NMatrix must be of 2 dimensions.");
}
coo_matrix* mat;
mat->ndims = 2;
mat->count = 0;
mat->shape = ALLOC_N(size_t, mat->ndims);
for(size_t index = 0; index < mat->ndims; index++) {
mat->shape[index] = (size_t)NUM2SIZET(RARRAY_AREF(r_shape, index));
}
size_t dense_elements_count = (mat->shape[0]) * (mat->shape[1]);
size_t sparse_elements_count = 0;
for(size_t index = 0; index < dense_elements_count; ++index) {
double value = (double)NUM2DBL(RARRAY_AREF(r_elements, index));
if(fabs(value - 0.0) < 1e-6)
continue;
sparse_elements_count++;
}
mat->count = sparse_elements_count;
mat->elements = ALLOC_N(double, mat->count);
mat->ia = ALLOC_N(size_t, mat->count);
mat->ja = ALLOC_N(size_t, mat->count);
size_t element_index = 0, index = 0;
for(size_t i = 0; i < mat->shape[0]; ++i) {
for(size_t j = 0; j < mat->shape[1]; ++j) {
double value = (double)NUM2DBL(RARRAY_AREF(r_elements, element_index++));
if(fabs(value - 0.0) < 1e-6)
continue;
mat->elements[index] = value;
mat->ia[index] = i;
mat->ja[index] = j;
index++;
}
}
return Data_Wrap_Struct(COO, NULL, coo_free, mat);
}
- COO to NMatrix
class COO
def to_nmatrix
ia = self.coords[0]
ja = self.coords[1]
nm_elements = Array.new(self.shape[0]*self.shape[1], 0)
for i in (0...self.nzcount)
nm_index = (self.shape[1] * ia[i]) + ja[i]
nm_elements[nm_index] = self.elements[i]
end
m = NMatrix.new self.shape, nm_elements
return m
end
end
Numo-Narray
The interfaces for Numo-Narray will also be implemented similarly as NMatrix.
There will be support to serialize the Sparse storage object to store it on disk and later can be read from the disk. Object serialization in Ruby is also known as Object marshalling.
Serialization support will be implemented as load
and save
methods in io/serialize.c
file.
Tests
RSpec will be used for tests. Tests can be found here.
Tests structure will be as follows:
$LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
require 'sparse'
require 'minitest/autorun'
class COO::CreationTest < Minitest::Test
def setup
@n = RubySparse::COO.new [3, 3], [1, 2, 3], [0, 1, 2], [0, 1, 2]
end
def test_ndims
assert_equal 2, @n.dim
end
def test_shape
assert_equal [3, 3], @n.shape
end
def test_elements
assert_equal [1, 2, 3], @n.elements
end
def test_coords
assert_equal [[0, 1, 2], [0, 1, 2]], @n.coords
end
def test_count
assert_equal 3, @n.nzcount
end
end
class COO::ElementWiseTest < Minitest::Test
def setup
@left = RubySparse::COO.new [3, 3], [1, 2, 3], [0, 1, 2], [0, 1, 2]
@right = RubySparse::COO.new [3, 3], [3, 2, 3], [0, 1, 2], [2, 2, 2]
end
def test_add
result = RubySparse::COO.new [3, 3], [1, 3, 2, 2, 6], [0, 0, 1, 1, 2], [0, 2, 1, 2, 2]
answer = @left + @right
assert_equal answer.dim, result.dim
assert_equal answer.shape, result.shape
assert_equal answer.nzcount, result.nzcount
assert_equal answer.elements, result.elements
assert_equal answer.coords, result.coords
end
end
Yard
will be used for documentation.
The example files gives a gist of usage of this library. The syntax is Ruby-like.
28th Oct - 6th Nov
- Implement the
RubySparse::COO
class with element-wise operations. - Implement indexing for
RubySparse::COO
. - Implement
Numo::Narray
andNMatrix
interfaces forRubySparse::COO
.
7th Nov - 16th Nov
- Implement the
RubySparse::CSR
class with element-wise operations. - Implement indexing for
RubySparse::CSR
. - Implement
Numo::Narray
andNMatrix
interfaces forRubySparse::CSR
.
17th Nov - 26th Nov
- Implement the
RubySparse::CSC
class with element-wise operations. - Implement indexing for
RubySparse::CSC
. - Implement
Numo::Narray
andNMatrix
interfaces forRubySparse::CSC
.
27th Nov - 6th Dec
- Implement the
RubySparse::DIA
class with element-wise operations. - Implement indexing for
RubySparse::DIA
. - Implement
Numo::Narray
andNMatrix
interfaces forRubySparse::DIA
.
7th Dec - 16th Dec
- Implement pretty print for
RubySparse::COO
,RubySparse::CSR
,RubySparse::CSC
andRubySparse::DIA
. - Implement
resize()
andreshape()
.
17th Dec - 26th Dec
- Add serialization support for sparse matrix classes.
- Implement
load()
andsave()
methods. - Implement tests for sparse matrix element-wise operations.
27th Dec - 5th Jan
- Add inter sparse classes conversion support.
- Implement classes such as
to_coo()
,to_csr()
,to_csc()
andto_dia()
.
6th Jan - 13th Jan
- Add tests for
RubySparse::COO
,RubySparse::CSR
,RubySparse::CSC
andRubySparse::DIA
. - Refactor and optimize code for
RubySparse::COO
,RubySparse::CSR
,RubySparse::CSC
andRubySparse::DIA
.
Mid term deliverables (14th Jan)
- Implementation of sparse storage classes, namely, COO, CSR, CSC and DIA.
- Implement serialization support.
- Implement interfaces to dense matrix libraries and inter sparse type conversion support.
- Implement pretty print for all sparse types.
15th Jan - 24th Jan
- Implement
RubySparse::Graph
BFS and DFS algorithms. - Add
depth_first_order()
andbreadth_first_order()
functions. - Add
depth_first_tree()
andbreadth_first_tree()
functions. - Implement Dijkstra's shortest path algorithm.
25th Jan - 3rd Feb
- Implement Floyd Warshall algorithm.
- Implement Bellman Ford algorithm.
- Implement Minimum spanning tree algorithm.
- Implement Maximum bipartite matching algorithm.
4th Feb - 13th Feb
- Implement
RubySparse::Linalg
module. - Add matrix-vector, matrix-matrix dot product support.
- Add sparse matrix linear equation solution support.
14th Feb - 23rd Feb
- Implement
triu()
andtril()
methods for sparse types classes. - Implement
hstack()
andvstack()
methods.
24th Feb - 4th Mar
- Add tests for
RubySparse::Linalg
andRubySparse::Graph
. - Write documentation for ruby sparse API.
- Refactor and optimize code for
RubySparse::Linalg
andRubySparse::Graph
.
5th Mar - 12th Mar
- Release
ruby-sparse
gem. - Create benchmarks for the library.
- Create graph plots for the benchmarks and comparison will other libraries.
Final deliverables (13th Mar)
-
RubySparse::Linalg
andRubySparse::Graph
modules with proper tests. - Documentation for the code written so far.
- Proper benchmarks for core functionalities.
- Graph plots of benchmarks and comparison with respect to other similar libraries.
- Implementation of Sparse storage library.
- IO (serialization) support for the Sparse storage.
- Implementation of proper interfaces for dense storage (or dense arrays) libraries, namely, Numo-Narray and NumRuby.
- Implement a linear algebra and a graph module for sparse matrices.
- Proper documentation and strong tests.
- Release of
sparse-storage
gem.
- The library can be extended to support arbitrary dimensional sparse arrays.
- More complex linear algebra and graph algorithms can be added if required.
- Add broadcasting support to sparse matrices.
I have asked my GSoC 2019 mentor Prasun Anand to advise me on this project. Prasun works as a software engineer at Quansight and is currently working on the Pytorch library.