Home

Treat is a toolkit for natural language processing and computational linguistics. It provides a common API for a number of existing tools in C, Ruby and Java for document retrieval, parsing, annotation, and information extraction.

Warning

This GitHub project is currently out of sync with the gem. I am holding back on pushing a new gem until I can put out a well-tested release. This wiki applies to the latest version of Treat, i.e. the one on GitHub. Things are moving fast; even if it builds, this library is still alpha.

Resources

Read the latest documentation.
See how to install Treat.
Learn how to use Treat.
Help out by contributing to the project.
View a list of papers about tools included in this toolkit.
Open an issue and get a quick answer.

**Current features**

Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus)
Text retrieval with indexation and full-text search (Ferret)
Text chunkers, sentence segmenters, tokenizers, and parsers for several languages (Stanford & Enju)
Word inflectors, including stemmers, conjugators, declensors, and number inflection
Lexical resources (WordNet interface, several POS taggers for English, Stanford taggers for several languages)
Language, date/time, general topic and keyword extraction
Simple text statistics (frequency, TF*IDF)
Serialization of annotated entities to YAML or XML format
Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats
Linguistic resources, including full ISO 639-1 and ISO 639-2 support, and tag alignments for five treebanks
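
To make the "simple text statistics" item more concrete, here is a minimal plain-Ruby sketch of term frequency and TF*IDF. It does not use Treat's API and is not how Treat implements these statistics; the method names `term_frequencies` and `tf_idf`, the regex tokenizer, and the unsmoothed `log(N / df)` weighting are all assumptions made for illustration.

```ruby
# Illustrative sketch only: plain Ruby, not Treat's API.

# Count how often each lowercase word token occurs in a text.
# The tokenizer is a naive regex (an assumption), not a real tokenizer.
def term_frequencies(text)
  text.downcase.scan(/[a-z']+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

# Score every term of one document against a collection of documents.
#   tf  = occurrences of the term / number of tokens in the document
#   idf = log(number of documents / number of documents containing the term)
def tf_idf(doc, corpus)
  counts = term_frequencies(doc)
  total = counts.values.inject(0, :+).to_f
  corpus_counts = corpus.map { |d| term_frequencies(d) }

  counts.each_with_object({}) do |(term, count), scores|
    containing = corpus_counts.count { |c| c.key?(term) }
    # Guard against division by zero when doc is not part of corpus.
    containing = 1 if containing.zero?
    idf = Math.log(corpus.size.to_f / containing)
    scores[term] = (count / total) * idf
  end
end

corpus = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(corpus[0], corpus)
scores.each { |term, score| puts format("%-5s %.4f", term, score) }
```

With this weighting, a term that occurs in every document (here "the") scores 0.0, while terms found in fewer documents score higher.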

**Caveats/Planned features**

The few native Ruby statistics algorithms are slow. Some of the highly recursive code in the core Tree and Entity classes will need to be ported to inline C.
The XML unserializer is currently broken and needs to be fixed.
A faster WordNet API in Java will be interfaced.

**License**

This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.
