louismullie edited this page Jul 11, 2012 · 6 revisions

Here is a list of ideas for contributing to the project. If you want to write bindings for Java or C libraries, I would prefer that you package them as a separate gem, to keep the core code as clean as possible. Some of the current core code will probably also be moved to external gems eventually.

Information Extraction

Crawling and Indexing

  • Ferret gem - "Ferret is a port of the Java Lucene project."
  • Spidr gem - "Spidr is a versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely." (http://spidr.rubyforge.org/)
  • Anemone
  • Sphinx

Semantics, RDF and logical inference

Text Statistics

Machine Learning and Artificial Intelligence

Math and Graph

Parsers and Chunkers

Toolkits

Stemmers

Taggers

Inflection
