librAIry NLP toolkit for English texts

nlpEN-service provides a fast and easy way to analyze texts from large document corpora through both an HTTP RESTful API and an Avro API.

Features

Part-of-Speech Tagger (and filter)
Lemmatizer
N-Grams Identifier
Wikipedia Resource Finder
Text annotation with the discovered elements
Can be run locally using multiple threads, or in parallel on multiple machines
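As a rough illustration of what n-gram identification involves, the Java sketch below enumerates every contiguous word n-gram in a sentence. It is a generic technique, not the nlpEN-service implementation: the service's own identifier is not described here and presumably applies extra criteria to keep only meaningful n-grams. The class and method names are hypothetical, and tokenization is a naive whitespace split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists every contiguous word n-gram of size n.
// Generic illustration only; NOT the nlpEN-service implementation.
class NGramSketch {

    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Naive tokenization: lower-case and split on runs of whitespace.
        String trimmed = text.trim().toLowerCase();
        List<String> tokens = trimmed.isEmpty()
                ? List.of()
                : Arrays.asList(trimmed.split("\\s+"));
        List<String> result = new ArrayList<>();
        // Slide a window of n tokens across the sentence.
        for (int i = 0; i + n <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return result;
    }
}
```

For example, `ngrams("Distributed text mining with librAIry", 2)` returns `[distributed text, text mining, mining with, with librairy]`.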

Quick Start

Run locally

To run the NLP-EN service with the default dataset:

Install Docker and Docker Compose.

Clone this repo and move into its top-level directory:

git clone git@github.com:librairy/nlpEN-service.git
cd nlpEN-service

Start the services with: docker-compose up -d
You can monitor their progress with: docker-compose logs -f

The command above starts two services, DBpedia Spotlight and librAIry NLP-EN, using the settings specified in docker-compose.yml.
The HTTP RESTful API should then be available at http://localhost:7777/en
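Once the containers are up, any HTTP client can reach the API. The Java 11+ sketch below uses the standard java.net.http client to build a POST request against the local base URL. It assumes the services accept a JSON body via POST; the request schema is not documented here, so the body (and any field name such as "text") is a placeholder you must replace with the real schema. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client sketch for a locally running nlpEN-service.
// ASSUMPTION: services accept a JSON POST body; the body schema is NOT
// documented in this README and must be checked before use.
class NlpEnClientSketch {

    // Base URL from the docker-compose defaults described above.
    static final String BASE_URL = "http://localhost:7777/en";

    // Builds (but does not send) a POST request for a service path,
    // e.g. "/annotations", "/tokens" or "/groups".
    static HttpRequest buildRequest(String servicePath, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + servicePath))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Sends the request and returns the raw response body as a string.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call such as `send(buildRequest("/annotations", body))` would then return whatever the service replies; keeping request construction separate from sending makes it easy to inspect the request before the service is running.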

Run in distributed mode

Create a Docker Swarm and configure as many services as you need.

Configuration

To change the configuration, edit the docker-compose.yml file.

Setting	Description
`REST_PATH`	Endpoint path on which the service listens.
`HTTP port`	Internal HTTP port (7777).
`AVRO port`	Internal Avro port (65111).

Services

All services can apply lemmatization, part-of-speech tagging and n-gram identification:

/annotations: annotates each word in a given text.
/tokens: transforms a given text so that it only contains the valid tokens specified in the request.
/groups: builds a bag-of-words from a given text.
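To make the /groups output concrete, the Java sketch below shows the generic bag-of-words technique: it maps each token of a text to its frequency. It is only an illustration under simple assumptions (lower-casing and splitting on non-alphanumeric characters); the real service may also lemmatize, filter by part of speech and merge n-grams before counting. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical bag-of-words sketch: token -> number of occurrences.
// Generic illustration only; NOT the nlpEN-service /groups implementation.
class BagOfWordsSketch {

    static Map<String, Integer> bagOfWords(String text) {
        // TreeMap keeps tokens sorted alphabetically for readable output.
        Map<String, Integer> counts = new TreeMap<>();
        // Naive tokenization: lower-case, split on anything that is not
        // a Unicode letter or digit.
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            // Splitting can yield an empty leading token; skip it.
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, `bagOfWords("The cat saw the other cat.")` yields `{cat=2, other=1, saw=1, the=2}`. Word order is discarded, which is exactly what distinguishes a bag-of-words from the annotated text returned by /annotations.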

Reference

You can use the following to cite the service:

@inproceedings{Badenes-Olmedo:2017:DTM:3103010.3121040,
 author = {Badenes-Olmedo, Carlos and Redondo-Garcia, Jos{\'e} Luis and Corcho, Oscar},
 title = {Distributing Text Mining Tasks with librAIry},
 booktitle = {Proceedings of the 2017 ACM Symposium on Document Engineering},
 series = {DocEng '17},
 year = {2017},
 isbn = {978-1-4503-4689-4},
 pages = {63--66},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3103010.3121040},
 doi = {10.1145/3103010.3121040},
 acmid = {3121040},
 publisher = {ACM},
 keywords = {data integration, large-scale text analysis, nlp, scholarly data, text mining},
}

Contact

This repository is maintained by Carlos Badenes-Olmedo. Please send me an e-mail or open a GitHub issue if you have questions.
