Google Cloud Natural Language 'lite'

A reduced version of the official service which mimics part of the Google Cloud Natural Language RESTful API using CoNLL 2017 models.

Limitations

The service follows the same structural interaction than the Google one but it uses the very tags of the CoNLL 2.0 specification. It lacks several out of scope features, like:

limited to one single language, therefore no auto-detection
ignored sentence break (periods, full stops...), treating all content as one single phrase
ignored "type" fixed to "PLAIN_TEXT"
ignored "encodingType" fixed to "UTF8"
skipped "sentiment" analysis returning "magnitude" and "score" always 0
adapted "partOfSpeech" keys and values to the FEATS CoNNL 2.0 convention
adapted "dependencyEdge" "label" to the DEPREL CoNNL 2.0 convention
skipped "lemma" value always to ""

Request:

{
    "document": {
        "type": "PLAIN_TEXT",
        "language": "es",
        "content": <the text to be analyzed>,
        },
    "encodingType": "UTF8"
}

Response:

{
    "sentences": [
        {
            "text": {
                "content": <the analyzed text>,
                "beginOffset": 0,
            },
            "sentiment": {
                "magnitude": 0,
                "score": 0,
            },
        }
    ],
    "tokens": [
        {
            "text": {
                "content": <every word or part of a word>,
                "beginOffset": <character offset from beginning>,
            },
            "partOfSpeech": {
                "<key>": <value>,
                "<key>": <value>,
            },
            "dependencyEdge": {
                "headTokenIndex": <number reference to head token>,
                "label": <kind of reference>,
            },
            "lemma": string
        }
    ],
    "language": "es"
}

Installation

Due it relies in Syntaxnet Dragnn, which only runs on Python 2.7 in a precise version of Tensorflow, there's a need to complile it locally according to the following instructions meanwhile a container version yet to come is in the works:

Installing virtualenv

> sudo -H pip install virtualenv

Setting up virtual environments

> cd <virtualenvs_home>
> virtualenv -p python2.7 gcnl_lite2.7

Activating and deactivating a virtual environment

> cd <virtualenvs_home>/gcnl_lite2.7
> source bin/activate

Once the virtual environment has been activated you can go whatever directory you want. All the pip installations will be hold in the activated virtual environment. In order to deactivate it just execute deactivate in whichever directory you are.

Installing Bazel 0.5.4 for Tensorflow compilations

Tensorflow only compiles with Bazel 0.5.4 so you must install it in your virtual environment. Given Syntaxnet only works in Python 2.7, please activate gcnl_lite2.7 and follow the download and installing instructions as referenced in the Syntaxnet installation guide to download your OS version (remember MacOS is known as darwin). Then run this way to install it under the virtual environment:

> chmod +x <download>/bazel-0.5.4-installer-darwin-x86_64.sh
> <download>/bazel-0.5.4-installer-darwin-x86_64.sh --prefix=$VIRTUAL_ENV

Installing Syntaxnet

Follow the README.md instructions to complete the manual installations and then do the build and test with the following commands:

> cd <projects_path>
> git clone --recursive https://github.com/tensorflow/models.git
> cd models/research/syntaxnet/tensorflow
> ./configure
> cd ..
> # On Mac, run the following:
> bazel test --linkopt=-headerpad_max_install_names dragnn/... syntaxnet/... util/utf8/...

Now it’s time to install with some variant to include Tensorflow.

> mkdir /tmp/syntaxnet_pkg
> bazel-bin/dragnn/tools/build_pip_package --include-tensorflow --output-dir=/tmp/syntaxnet_pkg
> sudo -H pip --no-cache-dir install /tmp/syntaxnet_pkg/syntaxnet_with_tensorflow-0.2-cp27-cp27m-macosx_10_6_intel.whl

There can be some extra import needs

> pip install backports.weakref

Update checkpoint files

If you are willing to use the English and Spanish languages that are distributed in syntaxnet/examples/dragnn/data (for instance with the nearby *.ipynb notebooks) you MUST convert the checkpoint files to the appropriate protocol buffer version using models/research/syntaxnet/tensorflow/tensorflow/contrib/rnn/python/tools/checkpoint_convert.py

Extra Languages

You can download up to 60 different language models here. All of them must get the mentioned checkpoint file updates. In order to ease installation, lang_models directory includes six of the nine languages Google Cloud Natural Language offers today with the appropriate checkpoint file update already done.

Running the service

You just have to run python gcnl_lite.py en <wherever you left your English model> or changing en per es if you have downloaded Spanish model. Service keeps on listening at port 7000. Use curl or PostMan to invoke the only endpoint http://localhost:7000/v1/documents:analyzeSyntax following the very Google REST interface.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
gcnl_lite		gcnl_lite
lang_models/es		lang_models/es
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Google Cloud Natural Language 'lite'

Limitations

Installation

Installing virtualenv

Setting up virtual environments

Activating and deactivating a virtual environment

Installing Bazel 0.5.4 for Tensorflow compilations

Installing Syntaxnet

Update checkpoint files

Extra Languages

Running the service

About

Releases

Packages

Languages

License

ConchaLang/gcnl_lite

Folders and files

Latest commit

History

Repository files navigation

Google Cloud Natural Language 'lite'

Limitations

Installation

Installing virtualenv

Setting up virtual environments

Activating and deactivating a virtual environment

Installing Bazel 0.5.4 for Tensorflow compilations

Installing Syntaxnet

Update checkpoint files

Extra Languages

Running the service

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages