wlevine/daru

daru

Data Analysis in RUby

Introduction

daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.

daru is inspired by pandas, a very mature solution in Python.

daru is written in pure Ruby, so it should work with all Ruby implementations. It has been tested with MRI 2.0, 2.1 and 2.2.

Features

  • Data structures:
    • Vector - A basic 1-D vector.
    • DataFrame - A 2-D spreadsheet-like structure for manipulating and storing data sets. This is daru's primary data structure.
  • Compatible with IRuby notebook, statsample and statsample-glm.
  • Singly and hierarchically indexed data structures.
  • Flexible and intuitive API for manipulation and analysis of data.
  • Easy plotting, statistics and arithmetic.
  • Plentiful iterators.
  • Optional speed and space optimization on MRI with NMatrix and GSL.
  • Easy splitting, aggregation and grouping of data.
  • Quick data summaries with pivot tables.
  • Import and export of data sets from and to Excel, CSV, databases and plain text files.

Notebooks

Usage

Case Studies

Blog Posts

Documentation

Docs can be found here.

Roadmap

  • Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
  • Basic Data manipulation and analysis operations:
    • DF concat
  • Assignment of a column to a single number should set the entire column to that number.
  • == comparison between a Daru::Vector and a string or number.
  • Multiple column assignment with []=
  • Multiple value assignment for vectors with []=.
  • A #find_max function that evaluates a block and returns the row for which the value of the block is largest.
  • Function to check if a value of a row/vector is within a specified range.
  • Create a new vector in map_rows if any of the already present rows don't match the one assigned in the block.
  • Sort by index.
  • Statistics on DataFrame over rows and columns.
  • Cumulative sum.
  • Calculate percentage change.
  • Have some sample data sets for users to play around with. Should be able to load these from the code itself.
  • Sorting with missing data present.
  • Change internals of indexes to raise errors when a particular index is missing and the passed key is a Fixnum. Right now we just return the Fixnum for convenience.
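
To make a few of the roadmap items above concrete, here is a minimal sketch of what cumulative sum, percentage change and a #find_max-style lookup would compute. It uses plain Ruby arrays as a stand-in for daru vectors and rows; the method names and signatures are hypothetical illustrations, not daru's API, and the eventual implementation may differ.

```ruby
# Hypothetical helpers: plain Ruby arrays stand in for daru vectors.
# None of these names are part of daru itself.

# Running total: element i is the sum of values[0..i].
def cumulative_sum(values)
  total = 0
  values.map { |v| total += v }
end

# Relative change between consecutive elements. The first element has
# no predecessor, so its entry is nil.
def percent_change(values)
  values.each_with_index.map do |v, i|
    i.zero? ? nil : (v - values[i - 1]).to_f / values[i - 1]
  end
end

# Returns the row for which the block yields the largest value.
def find_max(rows, &block)
  rows.max_by(&block)
end

prices = [10, 12, 9]
cumulative_sum(prices)  # => [10, 22, 31]
percent_change(prices)  # => [nil, 0.2, -0.25]

rows = [{ name: "a", score: 3 }, { name: "b", score: 7 }]
find_max(rows) { |row| row[:score] }  # => {:name=>"b", :score=>7}
```

The sketch ignores missing data; how nil values should be treated in these operations is a separate design question (see the "Sorting with missing data present" item).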

Contributing

Pick a feature from the Roadmap or the issue tracker, or think of your own, and send me a Pull Request!

Acknowledgements

  • Google and the Ruby Science Foundation for the Google Summer of Code 2015 grant for further developing daru and integrating it with other Ruby gems.
  • Thank you last.fm for making user data accessible to the public.

Copyright (c) 2015, Sameer Deshmukh All rights reserved
