NLP API for English texts available by HTTP-REST and AVRO

librairy/nlpEN-service


librAIry NLP toolkit for English texts

nlpEN-service provides a fast and easy way to analyze texts from large document corpora through both an HTTP REST API and an AVRO API.

Features

  • Part-of-Speech Tagger (and filter)
  • Lemmatizer
  • N-Gram Identifier
  • Wikipedia Resource Finder
  • Text Annotator (tags your text with the discovered elements)
  • Can be run locally using multiple threads, or in parallel on multiple machines
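To make the part-of-speech filter, the lemmatizer and the bag-of-words idea more concrete, the following Python sketch applies them locally to a few hand-written annotations. It does not call the service: the (token, lemma, POS) triples and the Universal POS tag names are assumptions chosen for illustration, not the service's actual output format.

```python
from collections import Counter

# Hand-written (token, lemma, pos) triples standing in for tagger/lemmatizer
# output. ASSUMPTION: illustrative only, NOT the service's response format.
ANNOTATIONS = [
    ("The", "the", "DET"),
    ("cats", "cat", "NOUN"),
    ("were", "be", "VERB"),
    ("chasing", "chase", "VERB"),
    ("two", "two", "NUM"),
    ("mice", "mouse", "NOUN"),
    ("and", "and", "CCONJ"),
    ("a", "a", "DET"),
    ("cat", "cat", "NOUN"),
]


def filter_tokens(annotations, keep_pos, use_lemma=True):
    """Keep only tokens whose POS tag is in keep_pos, optionally as lemmas."""
    return [
        lemma if use_lemma else token
        for token, lemma, pos in annotations
        if pos in keep_pos
    ]


def bag_of_words(tokens):
    """Count how often each token occurs (a bag-of-words)."""
    return Counter(tokens)


if __name__ == "__main__":
    nouns_and_verbs = filter_tokens(ANNOTATIONS, {"NOUN", "VERB"})
    print(nouns_and_verbs)  # ['cat', 'be', 'chase', 'mouse', 'cat']
    print(bag_of_words(nouns_and_verbs))
```

Filtering before counting keeps function words such as "the" and "and" out of the bag-of-words, and lemmatizing merges "cats" and "cat" into a single entry.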

Quick Start

Run locally

To run the NLP-EN service using the default dataset:

  1. Install Docker and Docker Compose.

  2. Clone this repo and move into its top-level directory.

    git clone git@github.com:librairy/nlpEN-service.git
    cd nlpEN-service
    
  3. Start the services: docker-compose up -d

  4. Monitor their progress with: docker-compose logs -f

  • docker-compose up starts two services, DBpedia Spotlight and librAIry NLP-EN, using the settings specified in docker-compose.yml.
  • The HTTP Restful-API should be available at: http://localhost:7777/en
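Because DBpedia Spotlight and NLP-EN can take a while to start, a script that uses the API may want to wait until the endpoint answers. The Python sketch below polls the URL with the standard library only. It treats any HTTP response, including an error status, as "up", since that already proves the server accepts connections; the function name and its timeout defaults are hypothetical choices, not part of the service.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url, timeout=60.0, interval=2.0):
    """Poll url until it answers over HTTP or `timeout` seconds elapse.

    Returns True once the server responds (any status code), False on timeout.
    Hypothetical helper, not part of nlpEN-service itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status (e.g. 404/405): it is running.
            return True
        except OSError:
            # URLError is a subclass of OSError: connection refused, DNS failure,
            # socket timeout and similar mean "not reachable yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Base URL taken from the Quick Start section above.
    print(wait_for_service("http://localhost:7777/en"))
```

HTTPError is caught before OSError because it is itself a subclass of URLError (and therefore of OSError); reversing the order would report a running server as unreachable.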

Run in distributed mode

Create a Docker Swarm and configure the services as needed.

Configuration

To change the configuration, edit the docker-compose.yml file.

Config      Description
REST_PATH   Endpoint path on which the service listens.
HTTP port   Internal HTTP port (7777).
AVRO port   Internal AVRO port (65111).

Services

All services can apply lemmatization, part-of-speech tagging, and n-gram identification:

  • /annotations: annotates each word in a given text.
  • /tokens: reduces a given text to only the valid tokens specified in the request.
  • /groups: builds a bag-of-words from a given text.
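A client for these services might look like the Python sketch below, which uses only the standard library. Several details are assumptions rather than documented behavior: that each service is reached with a JSON POST under the /en base path, and the request fields "text", "lemma" and "filter", which are hypothetical placeholders. Check the service's own API documentation for the real request schema before relying on it.

```python
import json
import urllib.request

# Base URL from the Quick Start section; the /<service> suffix is an assumption.
BASE_URL = "http://localhost:7777/en"
SERVICES = {"annotations", "tokens", "groups"}


def build_request(service, text, pos_filter=None, lemma=True, base_url=BASE_URL):
    """Build a POST request for one of the services listed above.

    HYPOTHETICAL: the JSON field names ("text", "lemma", "filter") are
    placeholders, not the documented nlpEN-service schema.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    body = {"text": text, "lemma": lemma}
    if pos_filter:
        body["filter"] = sorted(pos_filter)  # sorted for a stable payload
    return urllib.request.Request(
        url=f"{base_url}/{service}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def call(request, timeout=30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running service (see Quick Start).
    req = build_request("groups", "The cats were chasing two mice.", {"NOUN", "VERB"})
    print(call(req))
```

Separating request construction from sending keeps the payload easy to inspect and adjust once the real field names are confirmed.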

Reference

You can use the following to cite the service:

@inproceedings{Badenes-Olmedo:2017:DTM:3103010.3121040,
 author = {Badenes-Olmedo, Carlos and Redondo-Garcia, Jos{\'e} Luis and Corcho, Oscar},
 title = {Distributing Text Mining Tasks with librAIry},
 booktitle = {Proceedings of the 2017 ACM Symposium on Document Engineering},
 series = {DocEng '17},
 year = {2017},
 isbn = {978-1-4503-4689-4},
 pages = {63--66},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3103010.3121040},
 doi = {10.1145/3103010.3121040},
 acmid = {3121040},
 publisher = {ACM},
 keywords = {data integration, large-scale text analysis, nlp, scholarly data, text mining},
} 

Contact

This repository is maintained by Carlos Badenes-Olmedo. Please send me an e-mail or open a GitHub issue if you have questions.
