trivor-nlp

trivor-nlp uses NLP (Natural Language Processing) to detect sentences and tokens, as well as the meaning of each token in a given sentence. After all sentences have been processed, several generators produce information that can be easily consumed.

Prerequisites

Java 8+

Usage

1. Add dependency

<dependency>
  <groupId>org.kalnee</groupId>
  <artifactId>trivor-nlp</artifactId>
  <version>0.0.1-alpha.2</version>
</dependency>

2. Create a Processor

trivor-nlp provides two processors:

TranscriptProcessor: general-purpose processor; the content can be provided either as a URI or as a String.
SubtitleProcessor: subtitle-only processor; the content must be provided as a URI.

Accepted URI schemes: file://, jar:// and s3:// (for s3://, make sure the AWS credentials are in place).
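As a rough illustration of how such URIs can be built with the standard library before handing them to a processor, here is a minimal sketch. The class name `UriExamples`, the local path, the bucket and the key are all placeholders chosen for this example; none of them come from trivor-nlp.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical helper, not part of trivor-nlp: builds URIs for two of the accepted schemes.
final class UriExamples {

    // Builds a file: URI from a local path (the path is a placeholder).
    static URI fromLocalPath(String path) {
        return Paths.get(path).toAbsolutePath().toUri();
    }

    // Builds an s3: URI from a bucket and a key (both are placeholders).
    static URI fromS3(String bucket, String key) {
        return URI.create("s3://" + bucket + "/" + key);
    }

    public static void main(String[] args) {
        URI local = fromLocalPath("transcripts/episode-01.txt");
        URI remote = fromS3("my-bucket", "subtitles/episode-01.srt");
        // Print the scheme next to the full URI to confirm what will be passed on.
        System.out.println(local.getScheme() + " -> " + local);
        System.out.println(remote.getScheme() + " -> " + remote);
    }
}
```

Either resulting `URI` can then be passed to a processor builder in the same way as the `uri` variable used throughout this README.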

Create a TranscriptProcessor from URI or String

// from URI
TranscriptProcessor fromUri = new TranscriptProcessor.Builder(uri).build();

// from String
TranscriptProcessor fromString = new TranscriptProcessor.Builder("This is a sentence.").build();

Customize

Filters and mappers

For each line in the provided content, custom filters and mappers can be used to clean up the text before running the NLP models. Both fields are optional.

TranscriptProcessor tp = new TranscriptProcessor.Builder(uri)
        .withFilters(singletonList(line -> !line.contains("Name")))
        .withMappers(singletonList(line -> line.replaceAll(TRANSCRIPT_REGEX, EMPTY)))
        .build();
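To make the idea of filters and mappers concrete, the following self-contained sketch applies a list of predicates and a list of functions to each line using only the JDK. It is an illustration of the concept, not the library's implementation: the class `LineCleanup`, the method `clean`, the sample lines and the bracket-stripping regex are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper, not part of trivor-nlp.
final class LineCleanup {

    // Keeps a line only if every filter accepts it, then applies each mapper in order.
    static List<String> clean(List<String> lines,
                              List<Predicate<String>> filters,
                              List<Function<String, String>> mappers) {
        return lines.stream()
                .filter(line -> filters.stream().allMatch(f -> f.test(line)))
                .map(line -> {
                    String current = line;
                    for (Function<String, String> mapper : mappers) {
                        current = mapper.apply(current);
                    }
                    return current;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Name: Forrest Gump",
                "[music] My name's Forrest.",
                "Run, Forrest!");
        // Drop lines containing "Name" and strip bracketed annotations such as "[music] ".
        Predicate<String> noNameLines = line -> !line.contains("Name");
        Function<String, String> stripBrackets = line -> line.replaceAll("\\[.*?\\]\\s*", "");
        List<String> cleaned = clean(lines,
                Collections.singletonList(noNameLines),
                Collections.singletonList(stripBrackets));
        cleaned.forEach(System.out::println);
    }
}
```

Filters run before mappers in this sketch, so a filter sees the raw line; whether the library uses the same order is not stated here.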

Settings

The following values can be overridden by passing a Config when building a Processor:

Vocabulary probability: Double (default: 0.9), e.g. only verbs with a probability >= 90% are accepted
Chunk probability: Double (default: 0.5)
Run Sentiment Analysis: Boolean (default: true)

TranscriptProcessor tp = new TranscriptProcessor.Builder(uri)
        .withConfig(new Config.Builder().vocabularyProb(.98).chunkProb(.98).sentimentAnalysis(false).build())
        .build();
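The effect of a probability threshold can be pictured with a small standalone sketch. It assumes the threshold is a simple "greater than or equal" cutoff, as the 90% example above suggests; how the library applies it internally is not shown in this README. The `TaggedToken` class and `keepConfident` method are hypothetical, and the probabilities are the ones from the sample output in the Result section.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical types, not part of trivor-nlp.
final class ProbabilityCutoff {

    static final class TaggedToken {
        final String token;
        final double prob;

        TaggedToken(String token, double prob) {
            this.token = token;
            this.prob = prob;
        }
    }

    // Returns the text of every token whose probability is at least minProb.
    static List<String> keepConfident(List<TaggedToken> tokens, double minProb) {
        return tokens.stream()
                .filter(t -> t.prob >= minProb)
                .map(t -> t.token)
                .collect(Collectors.toList());
    }

    // Token probabilities copied from the "My name's Forrest." sample output.
    static List<TaggedToken> sample() {
        return Arrays.asList(
                new TaggedToken("My", 0.976362822572366),
                new TaggedToken("name", 0.98267246788283),
                new TaggedToken("'s", 0.933313435543914),
                new TaggedToken("Forrest", 0.908174572293974),
                new TaggedToken(".", 0.982098322024085));
    }

    public static void main(String[] args) {
        System.out.println(keepConfident(sample(), 0.9));  // all five tokens pass
        System.out.println(keepConfident(sample(), 0.98)); // only "name" and "." pass
    }
}
```

Raising the cutoff from the default trades coverage for precision: fewer tokens survive, but the ones that do were tagged with higher confidence.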

Create a SubtitleProcessor from URI

final SubtitleProcessor sp = new SubtitleProcessor.Builder(uri).withDuration(43).build();

SubtitleProcessor already provides all the filters and mappers needed for subtitles.

3. Result

After successfully building a processor, the NLP results can be accessed as follows:

processor.getSentences()

This method returns the list of sentences. Each sentence is composed of the identified tokens (each with its tag, lemma and probability) and chunks:

{
  "sentence": "My name's Forrest.",
  "tokens": [
    {
      "token": "My",
      "tag": "PRP$",
      "lemma": "my",
      "prob": 0.976362822572366
    },
    {
      "token": "name",
      "tag": "NN",
      "lemma": "name",
      "prob": 0.98267246788283
    },
    {
      "token": "'s",
      "tag": "POS",
      "lemma": "'s",
      "prob": 0.933313435543914
    },
    {
      "token": "Forrest",
      "tag": "NNP",
      "lemma": "forrest",
      "prob": 0.908174572293974
    },
    {
      "token": ".",
      "tag": ".",
      "lemma": ".",
      "prob": 0.982098322024085
    }
  ],
  "chunks": [
    {
      "tokens": ["My", "name"],
      "tags": ["PRP$", "NN"]
    },
    {
      "tokens": ["'s", "Forrest"],
      "tags": ["POS", "NNP"]
    }
  ]
}

processor.getResult()

This method returns a Result object with several insights, such as:

Rate of Speech (only for Subtitles)
Frequency Rate
Frequent Sentences
Frequent Chunks
Vocabulary
Phrasal Verbs
Sentiment Analysis
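As a conceptual illustration of an insight like Frequent Sentences, the sketch below counts how often each sentence occurs and keeps the ones that reach a minimum count. This is only one plausible reading of the idea, written against plain strings; the class `SentenceFrequency`, the method `frequent` and the `minCount` parameter are assumptions made for this example and do not describe the library's generators.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of trivor-nlp.
final class SentenceFrequency {

    // Returns (sentence, count) pairs with count >= minCount,
    // ordered by count descending and then alphabetically by sentence.
    static List<Map.Entry<String, Long>> frequent(List<String> sentences, long minCount) {
        Map<String, Long> counts = sentences.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "Run.", "Hello.", "Run.", "Hello.", "Run.", "Bye.");
        // Only sentences seen at least twice are reported.
        for (Map.Entry<String, Long> e : frequent(sentences, 2)) {
            System.out.println(e.getKey() + " x" + e.getValue());
        }
    }
}
```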

The full documentation can be accessed here.

License

MIT (c) Kalnee. See LICENSE for details.
