Support Numo Gem for performing SVD #198

Merged (1 commit, Jun 9, 2022)

@mkasberg (Contributor) commented Jun 2, 2022

**Background:**
The slow step of LSI is computing the SVD (singular value decomposition)
of a matrix. Even with a relatively small collection of documents (say,
about 20 blog posts), the native Ruby implementation is too slow to be
usable (taking hours to complete).
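For readers less familiar with LSI, here is a rough sketch of what that expensive step produces: the singular values of a term-document matrix and a rank-k approximation that keeps only the strongest "concepts". This is a hypothetical illustration using Ruby's `matrix` library and is not classifier-reborn's implementation. It derives the SVD from the eigen-decomposition of AᵀA, which is simpler but less numerically robust than a dedicated SVD routine. The helper names `svd_via_eigen` and `rank_k_approx` are made up for this example.

```ruby
require 'matrix'

# Hypothetical helper: singular values and right singular vectors of A,
# obtained from the eigen-decomposition of the symmetric matrix A^T * A.
# (If A = U * S * V^T, then A^T * A = V * S^2 * V^T.)
def svd_via_eigen(a)
  eig = (a.transpose * a).eigensystem
  values = eig.eigenvalues
  vectors = eig.eigenvector_matrix # eigenvectors are the columns
  # Sort eigenpairs so the largest singular value comes first.
  order = (0...values.size).sort_by { |i| -values[i] }
  # Clamp tiny negative round-off before taking the square root.
  singular_values = order.map { |i| Math.sqrt([values[i], 0.0].max) }
  v = Matrix.columns(order.map { |i| vectors.column(i).to_a })
  [singular_values, v]
end

# Hypothetical helper: rank-k approximation A_k = A * V_k * V_k^T,
# i.e. project the documents onto the k strongest singular directions.
def rank_k_approx(a, k)
  _, v = svd_via_eigen(a)
  v_k = Matrix.columns((0...k).map { |j| v.column(j).to_a })
  a * v_k * v_k.transpose
end

# Rows are terms, columns are documents (toy counts).
a = Matrix[[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
s, _v = svd_via_eigen(a)
p s                   # singular values, largest first
p rank_k_approx(a, 1) # keeps only the strongest component
```

A real implementation works on much larger, sparse matrices, which is where the pure-Ruby arithmetic becomes the bottleneck described above.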

To work around this problem, classifier-reborn allows you to optionally
use the `gsl` gem to make use of the [GNU Scientific
Library](https://www.gnu.org/software/gsl/) when performing matrix
calculations. Computations with this gem are orders of magnitude
faster than the pure-Ruby matrix implementation, and they're fast enough
that using LSI with Jekyll finishes in a reasonable amount of time
(seconds).

Unfortunately, [rb-gsl](https://github.com/SciRuby/rb-gsl) is
unmaintained: there's a commit on main that makes it compatible with
Ruby 3, but nobody has released the gem, so the only way to use rb-gsl
with Ruby 3 right now is to specify the git hash in your Gemfile. See
SciRuby/rb-gsl#67. This will be increasingly
problematic because Ruby 2.7 is now in [security
maintenance](https://www.ruby-lang.org/en/news/2022/04/12/ruby-2-7-6-released/)
and will become end of life in less than a year.

Notably, `rb-gsl` depends on the
[narray](https://github.com/masa16/narray#new-version-is-under-development---rubynumonarray)
gem. `narray` is deprecated, and its README suggests using
`Numo::NArray` instead.

**Changes:**
In this PR, my goal is to provide an alternative matrix implementation
that can perform singular value decomposition quickly and works with
Ruby 3. Doing so will make classifier-reborn compatible with Ruby 3
without depending on the unmaintained/unreleased gsl gem. There aren't
many gems that provide fast matrix support for Ruby, but
[Numo](https://github.com/ruby-numo) seems to be more actively
maintained than rb-gsl, and Numo has a working Ruby 3 implementation
that can perform a singular value decomposition, which is exactly what
we need. This requires
[numo-narray](https://github.com/ruby-numo/numo-narray) and
[numo-linalg](https://github.com/ruby-numo/numo-linalg).

My goal is to allow users to (optionally) use classifier-reborn with
Numo/LAPACK the same way they'd use it with GSL. That is, the user
should install the `numo-narray` and `numo-linalg` gems (with their
required C libraries), and classifier-reborn will detect and use these
if they are found.
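As a rough sketch of the Numo route, assuming `numo-narray`, `numo-linalg`, and a LAPACK backend are installed, an SVD is a single call to `Numo::Linalg.svd`, which (as we understand the numo-linalg docs) returns the singular values together with U and Vᵀ. The helper name `numo_singular_values` and the fallback message are illustrative only; classifier-reborn's own wrapper code may look quite different.

```ruby
# Try to load Numo; this sketch degrades gracefully if the gems are missing.
begin
  require 'numo/narray'
  require 'numo/linalg' # picks a LAPACK backend automatically
  NUMO_AVAILABLE = true
rescue LoadError
  NUMO_AVAILABLE = false
end

# Hypothetical helper: singular values of a matrix given as nested arrays.
def numo_singular_values(rows)
  a = Numo::DFloat[*rows]
  s, _u, _vt = Numo::Linalg.svd(a) # LAPACK returns s in decreasing order
  s.to_a
end

if NUMO_AVAILABLE
  p numo_singular_values([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
else
  puts 'numo-narray / numo-linalg not installed; skipping SVD example'
end
```

The heavy lifting happens in compiled LAPACK code, which is why this path can be as fast as the GSL one.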

@mkasberg (Contributor, Author) commented Jun 2, 2022

@mattr- I appreciate all the PR reviews you've already done for me! I still have this PR marked as a draft, but I think it's ready for an initial round of feedback if you have time to do a review. Here are a couple of things to focus on:

  • We were previously using a global `$GSL` to indicate whether to use the gsl gem. I'm currently taking a similar approach, replacing that `$GSL` boolean with `$SVD`, which can be one of `:ruby`, `:gsl`, or `:numo`. I'm open to suggestions if you'd prefer a different approach.
  • I'm operating under the assumption that we want to continue to avoid a dependency on any new gems (as we have in the past with the gsl gem). So I'm applying the same pattern that existed before, trying to load a library and falling back if it isn't available, which you can see near the top of `lsi.rb`.
  • Note that tests are passing with Numo for Ruby 3.1, 3.0, and 2.7! I haven't added any additional tests because I don't expect to change any behavior in an observable way. I haven't done this yet, but before merging a final version of this PR I plan on building the gem locally and testing it with a real Jekyll site to look for any differences in output.
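The load-and-fall-back approach from the first two bullets can be sketched roughly as below. This is an illustration, not the code in `lsi.rb`: the helper name `detect_svd_backend` is hypothetical, and preferring Numo over GSL when both are installed is an assumption made for the example.

```ruby
# Sketch of optional-dependency detection: try each native backend in turn
# and fall back to the pure-Ruby matrix code if none can be loaded.
# Assumption: Numo is preferred over GSL when both are installed.
def detect_svd_backend
  require 'numo/narray'
  require 'numo/linalg'
  :numo
rescue LoadError
  begin
    require 'gsl'
    :gsl
  rescue LoadError
    :ruby
  end
end

$SVD = detect_svd_backend # one of :ruby, :gsl, or :numo

# Callers can then branch on the chosen backend.
case $SVD
when :numo then puts 'Using Numo::Linalg for SVD'
when :gsl  then puts 'Using GSL for SVD'
else            puts 'Using the pure-Ruby matrix implementation (slow)'
end
```

Because every `require` is wrapped in a `rescue LoadError`, none of these gems becomes a hard dependency of classifier-reborn.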

@mattr- (Member) left a comment

Took a quick 👀

Looks good so far.

Comment on lines 4 to 5
# require_relative '../test_helper'
# require 'debug'

This looks like it might be debugging code left over. Would you mind removing it?

@mkasberg marked this pull request as ready for review on June 4, 2022 at 14:47

@mkasberg (Contributor, Author) commented Jun 4, 2022

@mattr- This is ready for review! No rush 🙂

I addressed your previous comment, added a little polish, and updated the docs since your last review.

Also, I tested this on a personal Jekyll site (i.e. with `gem 'classifier-reborn', path: '~/code/classifier-reborn'` in my Gemfile) and verified that using Numo produces the same recommended sites in the Jekyll output as GSL.

@mattr- (Member) commented Jun 9, 2022

@jekyllbot: merge +minor

@jekyllbot merged commit e6153b4 into jekyll:master on Jun 9, 2022
jekyllbot added a commit that referenced this pull request Jun 9, 2022
@mkasberg deleted the numo-impl branch on June 9, 2022 at 13:36
mkasberg added a commit to mkasberg/classifier-reborn that referenced this pull request Aug 7, 2022
In jekyll#198, I added support for using
[Numo::Linalg](https://github.com/ruby-numo/numo-linalg) as the linear
algebra backend for classifier-reborn. At that time, I updated the docs
with instructions for installing Numo, but the macOS docs were a little
vague because I hadn't tested them myself. Since then, I've been able to
verify the instructions on macOS and clarify a few steps. So this commit
updates the docs for installing Numo on macOS. The gem installation
arguments I'm using come from the [Numo
docs](https://github.com/ruby-numo/numo-linalg/blob/master/doc/select-backend.md).
3 participants