Google Summer of Code 2021 Ideas

Pjotr Prins edited this page Feb 25, 2021 · 29 revisions

This page contains ideas for Google Summer of Code (GSoC) 2021.

Ruby 3.0 was released in December 2020 and is an exciting major upgrade of a great programming language! Ruby is used in industry and science. Ruby is a great language for learning and boosting your productivity. Writing code in Ruby is fun. Joining the SciRuby Google Summer of Code will make you a better programmer.

Our projects aim to make Ruby a better environment for science. Even though Python has become the most prominent language for science, we think that Ruby is the superior language and makes students (and mentors!) better programmers. We encourage diversity, and the SciRuby project consists of friendly people from many backgrounds and nationalities. SciRuby is a growth organization: students become mentors and mentors become org admins (including this year's Udit Gulati), and almost every year our GSoC students have gone on to win additional awards and funding for their projects. We have a code of conduct which can be found here.

With the new GSoC format, projects are limited to about 180 hours of work. We have taken care to adjust the tasks, but it is often hard to predict how long something will take, so during GSoC we will adjust the program accordingly. Mentors are reachable through the mailing list of the GNU Guix project. Also, feel free to contact us individually.

Table of Contents

Project Ideas

  1. Pangenomes for Ruby
  2. NMatrix/NumRuby projects
  3. Making daru-view independent
  4. Improvements & Enhancement in Daru
  5. Ruby with machine learning Rust
  6. Technical Analysis with Ruby
  7. Ruby and the common workflow language (CWL)
  8. Binding SciRuby against HPC Rust libraries for artificial intelligence, linear algebra etc.
  9. Apply gaming technology with the pangenome explorer using graphics on GPU
  10. Ruby for scientific publications
  11. Improve Ruby wrapper for SymEngine
  12. Ruby wrapper for Shogun - Machine learning library

Or submit your own idea

Contact

You can join us in the #sciruby channel on chat.freenode.net or via our mailing list.

IMPORTANT NOTICE: SciRuby encourages diversity. Scientific progress in general benefits from diversity and software development for science is no exception. We are really happy that the number of people from Asia, Africa and South America applying for GSoC projects is increasing. Our org admin this year is from India, our previous org admin was from Brazil. We have had students from Japan, India, Sri Lanka, Russia, etc. We have women software developers in our program. We are happy to hear from you all!

Instructions for students

We strongly recommend that you pick one of the ideas listed below. We value contributions in advance of GSoC, even if they're just little ones. Go pick out something in one of our trackers and work on it, talk to folks on the listserv, and get an idea for what features are needed. These projects are not carved in stone; we can still adapt them to your ideas. Note that the new GSoC is much shorter than in previous years, so projects should match the timeline.

You don't need to know a lot about Ruby to work on a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you may need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.

In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) or our Google Group (see sciruby.com to sign up) and we can help you.

Read this before you commit your first patches

SciRuby's main landing page on GitHub mostly holds the stable versions of the SciRuby gems, but developers and contributors should work on the very latest (bleeding edge) repositories so that changes can be committed without conflicts arising.

Try reading Finding The SciRuby Development Repositories on Github if you would like a brief introduction on finding the latest development gems to work on from Github. Also go through the coding guidelines before sending your first patch.

How to submit a patch ("pull request")

Here's a great tutorial: http://www.thinkful.com/learn/github-pull-request-tutorial/

Have a look and feel free to ask if you have any questions.

Instructions for mentors

Guidelines for mentors to submit projects:

  • Specify the name of your project as a heading.
  • Write a paragraph or two with further details.
  • Write a small 'Skills' section detailing the skills that the student must possess to complete the project.
  • Write down your own GitHub handle and contact details in a 'Mentor Details' section so that the student can contact you.
  • If anyone else wants to co-mentor a project, please specify your details along with the mentor's details.

Pangenomes for Ruby

Usually C-extensions are written for speed. Rust is a safe alternative that can reach comparable speeds and has high-level abstractions for multi-core programming.

The student will work on pangenome functionalities, optimize them, document them and provide a path for similar exercises that can be done by others. If you want to know more about pangenomes and why they are at the cutting edge of research in (human) genetics: watch the talk by Erik Garrison.

Skills: Interest in multi-languages, high performance computing, C, Rust etc.

Difficulty: Advanced (indeed)

Mentor: @pjotrp, @george-githinji, @chfi, @ekg


NumRuby projects

NumRuby is a successor of NMatrix. NumRuby is a linear algebra library for Ruby that is highly performance oriented.

Improving NumRuby

  • Add serialization support.
  • Make slicing return a view instead of copying data.
  • Fix broadcasting.
  • Implement random engine.
  • Release NumRuby gem.
  • Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
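
To illustrate what "a view instead of a copy" means for slicing, here is a minimal sketch in plain Ruby. The `VectorView` class is hypothetical and is not part of NumRuby; it only shows the idea that a slice can share the backing storage of its parent, so that writes through the slice are visible in the original data.

```ruby
# Hypothetical illustration only; VectorView is not a NumRuby class.
class VectorView
  attr_reader :length

  # buffer: the shared backing Array; offset and length select a window into it.
  def initialize(buffer, offset = 0, length = buffer.length - offset)
    if offset < 0 || length < 0 || offset + length > buffer.length
      raise ArgumentError, "window out of bounds"
    end
    @buffer = buffer
    @offset = offset
    @length = length
  end

  def [](i)
    check(i)
    @buffer[@offset + i]
  end

  def []=(i, value)
    check(i)
    @buffer[@offset + i] = value
  end

  # Returns a new view over the same buffer; no elements are copied.
  def slice(start, len)
    if start < 0 || len < 0 || start + len > @length
      raise IndexError, "slice out of range"
    end
    VectorView.new(@buffer, @offset + start, len)
  end

  # Materialises the window as a fresh Array (this does copy).
  def to_a
    @buffer[@offset, @length]
  end

  private

  def check(i)
    raise IndexError, "index #{i} out of range" unless i >= 0 && i < @length
  end
end

data = [10, 20, 30, 40, 50]
view = VectorView.new(data).slice(1, 3) # window over 20, 30, 40
view[0] = 99                            # writes through to the shared buffer
```

A real implementation would also track strides and shapes for multi-dimensional arrays; this sketch covers the one-dimensional case only.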

Make NumRuby back-end agnostic

Currently, NumRuby uses OpenBLAS for matrix and vector products. A user should also be able to use other BLAS implementations such as Intel MKL.

  • Decouple NumRuby code from OpenBLAS.
  • Implement generic code for BLAS library interface.
  • Make sure that the library works as expected with different BLAS implementations, and write tests for this.
  • Write benchmarking code for the different BLAS implementations.
  • Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
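
One possible shape for the decoupling is a small backend registry with a common interface. The sketch below is an assumption for illustration: the names `LinalgBackend`, `PureRubyBackend` and `gemm` are invented here and do not come from the NumRuby code base. A pure-Ruby reference backend stands in for a real BLAS binding.

```ruby
# Hypothetical sketch; none of these names come from NumRuby.
module LinalgBackend
  @registry = {}
  @current = nil

  class << self
    # Makes a backend object available under a symbolic name.
    def register(name, backend)
      @registry[name] = backend
    end

    # Selects the backend that subsequent calls are dispatched to.
    def use(name)
      @current = @registry.fetch(name) { raise ArgumentError, "unknown backend: #{name}" }
    end

    def current
      @current or raise "no backend selected"
    end
  end
end

# Reference backend in pure Ruby. A real backend would wrap OpenBLAS,
# Intel MKL or another BLAS library behind the same method.
module PureRubyBackend
  # Multiplies a (m x k) by b (k x n), both given as nested Arrays.
  def self.gemm(a, b)
    raise ArgumentError, "shape mismatch" unless a.first.length == b.length
    cols = b.transpose
    a.map { |row| cols.map { |col| row.zip(col).sum { |x, y| x * y } } }
  end
end

LinalgBackend.register(:pure_ruby, PureRubyBackend)
LinalgBackend.use(:pure_ruby)

product = LinalgBackend.current.gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

With an interface like this, the same test suite and benchmarks can be run against each registered backend by switching the selection with `use`.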

Adding graph algorithms support for Ruby-Sparse

NumRuby is for dense matrix computation. For sparse matrix computation, we have Ruby-Sparse. Ruby-Sparse is a relatively new project with a lot of potential. Sparse matrices are well suited to graph algorithms. We currently don't have any support for graph algorithms, and it would be quite useful to have this in the library itself.

  • Read and understand the most used graph algorithms. Come up with efficient implementations for these algorithms.
  • Implement the most basic graph algorithms for CSR/CSC and DIA sparse types.
  • Implement some of the frequently used advanced graph algorithms for CSR only.
  • Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
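
As an example of why sparse formats suit graph algorithms, here is a minimal sketch of breadth-first search over an adjacency matrix stored in CSR form. It uses plain Ruby arrays rather than the Ruby-Sparse API, and the function name `bfs_csr` is made up for this illustration.

```ruby
# Breadth-first search over a graph stored in CSR form.
# row_ptr[v]...row_ptr[v + 1] is the range of col_idx holding v's neighbours.
# Returns the hop distance from source to every vertex (-1 if unreachable).
def bfs_csr(row_ptr, col_idx, source)
  n = row_ptr.length - 1
  dist = Array.new(n, -1)
  dist[source] = 0
  queue = [source]
  head = 0 # index of the next vertex to process, avoids Array#shift
  while head < queue.length
    v = queue[head]
    head += 1
    (row_ptr[v]...row_ptr[v + 1]).each do |k|
      w = col_idx[k]
      next unless dist[w] == -1
      dist[w] = dist[v] + 1
      queue << w
    end
  end
  dist
end

# Directed graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3; vertex 4 is isolated.
row_ptr = [0, 2, 3, 4, 4, 4]
col_idx = [1, 2, 3, 3]
distances = bfs_csr(row_ptr, col_idx, 0)
```

Here `distances` is `[0, 1, 1, 2, -1]`. The neighbours of a vertex are a contiguous slice of `col_idx`, which is what makes CSR convenient for traversals.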

Implement Block sparse type support

Block Sparse Row (BSR), also known as block compressed row, is a sparse matrix format that has recently seen frequent use in scientific work and has therefore been implemented in most of the major sparse libraries.

  • Provide BSR sparse matrix support.
  • Implement efficient conversion with dense matrix libraries like NumRuby and Numo-NArray.
  • Implement efficient conversion with other sparse implementations (CSR, CSC, DIA).
  • Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
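
For readers new to the format, the following sketch converts a dense matrix into BSR arrays using plain Ruby. It assumes square blocks and the three-array layout (`data`, `indices`, `indptr`) also used by SciPy's BSR matrices; the function name `dense_to_bsr` is hypothetical and not part of Ruby-Sparse.

```ruby
# Converts a dense matrix (nested Arrays) into BSR arrays with bs x bs blocks.
# Returns [data, indices, indptr]:
#   data    - the blocks that contain at least one non-zero entry
#   indices - the block-column index of each stored block
#   indptr  - indptr[i]...indptr[i + 1] is the range of blocks in block-row i
def dense_to_bsr(dense, bs)
  rows = dense.length
  cols = dense.first.length
  unless (rows % bs).zero? && (cols % bs).zero?
    raise ArgumentError, "dimensions must be multiples of the block size"
  end

  data = []
  indices = []
  indptr = [0]
  (0...rows / bs).each do |bi|
    (0...cols / bs).each do |bj|
      block = dense[bi * bs, bs].map { |row| row[bj * bs, bs] }
      next if block.flatten.all?(&:zero?) # all-zero blocks are not stored
      data << block
      indices << bj
    end
    indptr << data.length
  end
  [data, indices, indptr]
end

dense = [
  [1, 2, 0, 0],
  [3, 4, 0, 0],
  [0, 0, 5, 6],
  [0, 0, 7, 8]
]
data, indices, indptr = dense_to_bsr(dense, 2)
```

For this input only the two diagonal blocks are stored, so `indices` is `[0, 1]` and `indptr` is `[0, 1, 2]`.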

Making daru-view independent

Learn the basics of daru-view from sciruby/blog or daru-view/wiki.

Daru (Data Analysis in RUby) is a library for analysis, manipulation and visualization of data. daru-view is a plugin gem for daru that provides easy and interactive plotting in web applications and IRuby notebooks. It can work in frameworks like Rails, Sinatra, Nanoc and hopefully in others too.

Currently daru-view depends on lazy_high_charts and googlevisualr, over which SciRuby has no control. So far we have mainly solved the following problems:

  • daru dataframe or vector compatible plotting gem.
  • a gem that can work smoothly in any Ruby web application framework, IRuby notebook as well as terminal.

So now it is time to become independent, because:

  • we don't have much control over these gems, and we will keep adding new features directly from the official HighCharts and Google Charts sites.

  • we have extended (overloaded and overridden) most of the methods from lazy_high_charts and googlevisualr, either to make them compatible with IRuby notebook and all Ruby frameworks or to add new chart features already present in HighCharts and Google Charts.

  • daru-view should be able to handle future chart types as well, with little or no modification of the codebase.

You can find more details on the wiki page 'Making daru-view independent'. Along with this, we also want to consider the new ideas written on the Idea wiki page.

Related links

About project

  • Skills: Basic knowledge of Ruby, design patterns and design principles, JavaScript and Ruby web application frameworks.
  • Mentors: Shekhar (@Shekharrajak), Sameer (@v0dro)
  • Difficulty: Moderate.

Improvements & Enhancement in Daru

daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby. It has various features, such as:

  • Flexible and intuitive API for manipulation and analysis of data.
  • Easy plotting, statistics and arithmetic.
  • Easy splitting, aggregation and grouping of data.
  • Quick data summaries with pivot tables, and so on.

You can find most of the examples here.

While it has many methods for data wrangling, it is slow for a lot of use cases (check out these benchmarks). This task will involve finding the slow areas of daru and porting them either to Rubex, a language for writing C extensions for Ruby, or to a plain Ruby C extension.

  • The student needs to benchmark various daru methods and check how Ruby C bindings can give a significant performance boost.
  • List out features that are essential for data science and not present in daru currently.
  • How can we improve the performance using parallel programming in Ruby?
  • How can we remove visualization and I/O APIs from daru and use the daru-view and daru-io plugin gems instead?
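
The benchmarking step could start from Ruby's standard `benchmark` library. The sketch below is only a harness pattern: it compares two plain-Ruby implementations of a group-and-sum operation as a stand-in workload, and does not call daru. The method names are invented for the example.

```ruby
require "benchmark"

# Stand-in workload: 50,000 [group_key, value] pairs with 100 distinct keys.
rows = Array.new(50_000) { |i| [i % 100, i] }

# Builds intermediate groups first, then sums each group.
def group_sum_naive(rows)
  rows.group_by(&:first).transform_values { |pairs| pairs.sum { |_, v| v } }
end

# Accumulates totals in a single pass without intermediate arrays.
def group_sum_single_pass(rows)
  totals = Hash.new(0)
  rows.each { |key, value| totals[key] += value }
  totals
end

# Prints user, system and real time for each labelled block.
Benchmark.bm(12) do |x|
  x.report("group_by:")    { 20.times { group_sum_naive(rows) } }
  x.report("single pass:") { 20.times { group_sum_single_pass(rows) } }
end
```

When benchmarking candidate C or Rubex ports, it helps to check first that both implementations return the same result, and to repeat each measurement enough times to smooth out noise.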

Why this project is important:

  • SciRuby is planning a powerful and fast machine learning gem that will be completely compatible with the daru and nmatrix gems. So we have to make daru faster and more powerful accordingly. We need to find a solution using nmatrix as well.

  • If we want to improve Ruby for data science usage, we have to keep the daru features and its API up to date with current needs.

  • We already have plugin gems for visualization and I/O operations which are stable and functional. So we may now think about removing these parts from daru and using daru-io and daru-view instead.

Other tasks

  • Better error handling. Refer to #479.
  • Follow-up of GSoC'17: remove obsolete parts from main gem #405

Related links

More about daru

Skills: Experience in data analysis | Experience in Ruby and C | General understanding of how compilers work | Understanding of good benchmarking practices

Difficulty: Advanced

Mentor: @v0dro, Shekhar (@Shekharrajak)


Technical Analysis with Ruby

A technical analysis library for exchange-based securities and commodities markets.

Interested students should take a look at https://github.com/rivella50/talib-ruby . We are going to build on top of it.
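
To give a flavour of the kind of indicator such a library provides, here is a minimal pure-Ruby sketch of a simple moving average. It is an illustration only and does not use or reproduce the talib-ruby API; the function name is made up.

```ruby
# Simple moving average: the mean of each run of `period` consecutive prices.
# Returns one value per complete window, so the result is shorter than the input.
def simple_moving_average(prices, period)
  raise ArgumentError, "period must be positive" unless period.positive?
  prices.each_cons(period).map { |window| window.sum.to_f / period }
end

closes = [10, 11, 12, 13, 14]
sma = simple_moving_average(closes, 3)
```

For these closing prices with a period of 3, `sma` is `[11.0, 12.0, 13.0]`.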

Skills required: Maths, Statistics, Finance, C and Ruby

Difficulty: Medium

Mentor: @prasunanand, @uditgulati


Ruby and the common workflow language (CWL)

CWL is a specification for building pipelines of tools. The configuration is in YAML. We would like to create a Ruby DSL that can generate these YAML definitions so we have an elegant way of deploying workflows on compute clusters. CWL is used, for example, in COVID-19 PubSeq.
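
A very small sketch of what such a DSL might look like is shown below. The builder class and its `input` and `output` methods are invented for this illustration and are not an existing gem; only the emitted keys (`cwlVersion`, `class`, `baseCommand`, `inputs`, `outputs`, `inputBinding`) follow the CWL CommandLineTool description format.

```ruby
require "yaml"

# Hypothetical mini-DSL; the class and method names are invented for illustration.
class CommandLineToolBuilder
  def initialize(base_command)
    @doc = {
      "cwlVersion" => "v1.2",
      "class" => "CommandLineTool",
      "baseCommand" => base_command,
      "inputs" => {},
      "outputs" => {}
    }
  end

  # Declares a positional command-line input.
  def input(name, type:, position:)
    @doc["inputs"][name.to_s] = {
      "type" => type,
      "inputBinding" => { "position" => position }
    }
  end

  # Declares an output of the given CWL type.
  def output(name, type:)
    @doc["outputs"][name.to_s] = { "type" => type }
  end

  def to_yaml
    @doc.to_yaml
  end
end

# Evaluates the block in the context of a fresh builder and returns it.
def command_line_tool(base_command, &block)
  builder = CommandLineToolBuilder.new(base_command)
  builder.instance_eval(&block)
  builder
end

tool = command_line_tool("echo") do
  input :message, type: "string", position: 1
  output :out, type: "stdout"
end

puts tool.to_yaml
```

Using `instance_eval` keeps the tool description free of receiver noise, at the cost of hiding the caller's `self` inside the block; a real DSL might offer both styles.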

The student will create a number of pre-agreed functionalities, optimize them, document them and provide a path for similar exercises that can be done by others.

Skills: Interest in DSLs, workflows, parallel computing.

Difficulty: Average

Mentor: @pjotrp, @george-githinji, @mr-c


Binding SciRuby against HPC Rust libraries for artificial intelligence, linear algebra etc.

Usually C-extensions are written for speed. Rust is a safe alternative that can reach comparable speeds and has high-level abstractions for multi-core programming. In SciRuby we love all languages that start with the letter R.

The student will bind a number of pre-agreed functionalities, optimize them, document them and provide a path for similar exercises that can be done by others. Software deployment of mixed languages often proves difficult.

Skills: Interest in multi-languages, high performance computing, C, Rust etc.

Difficulty: Advanced (indeed)

Mentor: @pjotrp, @george-githinji, @chfi


Apply gaming technology with the pangenome explorer using graphics on GPU

~1min video showing a yeast pangenome: https://youtu.be/TOJZeeCqatk

gfaestus is a Vulkan-accelerated GFA visualization tool for pangenomes. Help add Ruby bindings and Rust functionality for the pangenome explorer. Check out this online video

Vulkan is a low-overhead, cross-platform 3D graphics and computing API. Vulkan targets high-performance realtime 3D graphics applications such as video games and interactive media across all platforms. Compared to OpenGL, Direct3D 11 and Metal, Vulkan is intended to offer higher performance and more balanced CPU/GPU usage.

Code is at https://github.com/chfi/gfaestus

Skills: Interest in multi-languages, GPU graphics, FPS, C, Rust etc.

Difficulty: Advanced (indeed)

Mentor: @chfi, @ekg, @pjotrp and others on matrix/element pangenome groups


Ruby for scientific publications

The backend for the Journal of Open Source Software (JOSS) is written in Ruby and makes full use of the GitHub API. The full workflow is based on the GitHub issue tracker. In this project we want to refactor the source code and make it flexible so it can target multiple backends and be built on a fully free software stack. Ruby is ideal for web programming, and this work is embedded in development happening for JOSS. With this publication-oriented software we also target other journals, such as BiohackrXiv. For the existing code base, see the Whedon and whedon-api repositories at https://github.com/openjournals/.

Skills: Interest in Ruby, web programming, Github API and scientific publishing

Difficulty: Moderate

Mentor: @pjotrp, @ktym, @arfon, members of @openjournals


Improve Ruby wrapper for SymEngine

A project started by the SymPy organisation, SymEngine is a standalone fast C++ symbolic manipulation library.

It solves mathematical problems the same way a human does, but way more quickly and precisely. The motivation for SymEngine is to develop the Computer Algebra System once in C++ and then use it from other languages rather than doing the same thing all over again for each language that it is required in.

The project for Ruby bindings has already been set up at symengine.rb. A few things that the project involves are:

  • Extending the C interface of SymEngine library.
  • Wrapping up the C interface for Ruby using Ruby C API, including error handling.
  • Designing the Ruby interface.
  • Integrating IRuby with symengine gem for better printing and writing IRuby notebooks.
  • Integrating the gem with existing gems like gmp, mpfr and mpc.
  • Making the installation of symengine gem easier.

You can find the same idea in SymPy Idea-list here

Important links:

  • GSoC 2016 report
  • GSoC 2015 work

Recommended skills: You should be comfortable with C/C++ and familiar with Ruby. Refer to the wiki to get started.

Mentors: Co-mentor @Shekharrajak and @pjotrp


Ruby wrapper for Shogun - Machine learning library

Shogun is an open-source machine learning library that offers a wide range of efficient and unified machine learning methods. It is written in C++ and provides a Ruby wrapper as well.

We plan to make it compatible with the SciRuby data science gems, such as daru, daru-io, daru-view, nmatrix, rubyplot, distribution, statsample and any others that are useful for data science projects.

Ongoing discussion is happening here: #4814.

The SciRuby and Shogun teams will collaborate to make it happen.

Potential mentor: @prasunanand Co-mentor: @shekharrajak


Or submit your own idea

If you have a completely different idea in mind, first start a discussion thread about it on the mailing list. The SciRuby team will look into it, and the idea may be improved during the discussion so that it can be selected for the GSoC period.

The best project for you is one you are interested in and are knowledgeable about. That way, you will be the most successful and productive in your project and have the most fun doing it, while we will be the most confident in your commitment and your ability to complete it.

Please use the idea template below to describe your idea:

Title

Idea

(project idea, how it will help Ruby community and future of the project)

Current status of the idea

(Describe the work that has been done and timeline)

Involved Software and technology

Difficulty

(Advanced, Intermediate, or Beginner and any specific comments on the difficulty)

Skills and Knowledge required

(Any prerequisite knowledge or approach needed)
