Home
louismullie edited this page Aug 4, 2012
72 revisions
Treat is a framework for natural language processing and computational linguistics in Ruby. It provides a common API for a number of gems and external libraries for document retrieval, parsing, annotation, and information extraction.
Current features
- Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
- Text retrieval with indexing and full-text search (Ferret).
- Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
- Word inflectors, including stemmers, conjugators, declensors, and number inflection.
- Lexical resources (WordNet interface, several POS taggers for English).
- Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
- Serialization of annotated entities to YAML, XML or to MongoDB.
- Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
- Linguistic resources, including language detection and tag alignments for several treebanks.
- Machine learning (decision trees, multilayer perceptrons, linear models, support vector machines).
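To give a feel for what the keyword (TF*IDF) extraction listed above computes, here is a minimal plain-Ruby sketch. It does not call Treat or any of the gems it wraps; the helper names `tokenize` and `tf_idf_keywords`, the naive regex tokenizer, and the sample documents are all assumptions made for illustration, and Treat's own extractors may weight and normalize terms differently.

```ruby
# Plain-Ruby sketch of TF*IDF keyword scoring. This is NOT Treat's API:
# the method names below are hypothetical and exist only for this example.

# Lowercase a text and split it into alphabetic word tokens (naive tokenizer).
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Return the top `count` keywords of docs[index], ranked by TF*IDF.
# TF  = occurrences of the term in the document / document length
# IDF = log(number of documents / number of documents containing the term)
def tf_idf_keywords(docs, index, count = 3)
  tokenized = docs.map { |doc| tokenize(doc) }
  target = tokenized[index]

  scores = target.uniq.map do |term|
    tf = target.count(term).to_f / target.size
    df = tokenized.count { |tokens| tokens.include?(term) }
    idf = Math.log(docs.size.to_f / df)
    [term, tf * idf]
  end

  # Highest score first; ties are broken alphabetically for stable output.
  scores.sort_by { |term, score| [-score, term] }.first(count).map(&:first)
end

docs = [
  "the parser builds a parse tree and the parser labels the tree",
  "the tagger labels each word and the stemmer reduces each word",
  "a tokenizer splits the text and a segmenter splits the sentences"
]

p tf_idf_keywords(docs, 0, 2)  # => ["parser", "tree"]
```

Words that appear in every document, such as "the" and "and", get an IDF of zero and so never rank as keywords, which is the main reason TF*IDF is preferred over raw term counts.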
Resources
- Read the latest documentation.
- See how to install Treat.
- Learn how to use Treat.
- Help out by contributing to the project.
- View a list of papers about tools included in this toolkit.
- Open an issue.
License
This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.