forked from jekyll/classifier-reborn
-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
**Background:** The slow step of LSI is computing the SVD (singular value decomposition) of a matrix. Even with a relatively small collection of documents (say, about 20 blog posts), the native ruby implementation is too slow to be usable (taking hours to complete). To work around this problem, classifier-reborn allows you to optionally use the `gsl` gem to make use of the [Gnu Scientific Library](https://www.gnu.org/software/gsl/) when performing matrix calculations. Computations with this gem perform orders of magnitude faster than the ruby-only matrix implementation, and they're fast enough that using LSI with Jekyll finishes in a reasonable amount of time (seconds). Unfortunately, [rb-gsl](https://github.com/SciRuby/rb-gsl) is unmaintained -- there's a commit on main that makes it compatible with Ruby 3, but nobody has released the gem so the only way to use rb-gsl with Ruby 3 right now is to specify the git hash in your Gemfile. See SciRuby/rb-gsl#67. This will be increasingly problematic because Ruby 2.7 is now in [security maintenance](https://www.ruby-lang.org/en/news/2022/04/12/ruby-2-7-6-released/) and will become end of life in less than a year. Notably, `rb-gsl` depends on the [narray](https://github.com/masa16/narray#new-version-is-under-development---rubynumonarray) gem. `narray` is deprecated, and the readme suggests using `Numo::NArray` instead. **Changes:** In this PR, my goal is to provide an alternative matrix implementation that can perform singular value decomposition quickly and works with Ruby 3. Doing so will make classifier-reborn compatible with Ruby 3 without depending on the unmaintained/unreleased gsl gem. There aren't many gems that provide fast matrix support for ruby, but [Numo](https://github.com/ruby-numo) seems to be more actively maintained than rb-gsl, and Numo has a working Ruby 3 implementation that can perform a singular value decomposition, which is exactly what we need. This requires [numo-narray](https://github.com/ruby-numo/numo-narray) and [numo-linalg](https://github.com/ruby-numo/numo-linalg). My goal is to allow users to (optionally) use classifier-reborn with Numo/Lapack the same way they'd use it with GSL. That is, the user should install the `numo-narray` and `numo-linalg` gems (with their required C libraries), and classifier-reborn will detect and use these if they are found.
- Loading branch information
Showing
8 changed files
with
95 additions
and
30 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters