TextClassification

A pipeline for aspect-based sentiment classification, mining and summarizing opinions from text about specific entities and their aspects. Text input -> seq2seq IOB tagging -> pre-trained embedding vectors -> LSTM

Data Engineering: Text cleaning, contraction expansion, lemmatization End to end modelling and inferencing pipeline: Convert text sequence into embedding vectors Create PoS tags Sequence to sequence IOB2 tagging for OTE(opinion-term-expressions) identification

Example input: "Service was terribly slow and the restaurant was noisy, but the waiter was friendly and the calamari was very delicious." Example output: (OTE target words, entity:aspect, polarity) ("service", SERVICE:GENERAL, negative) ("restaurant", AMBIENCE:GENERAL, negative) ("waiter", SERVICE:GENERAL, positive) ("calamari", FOOD:QUALITY, positive)

All data files found at http://alt.qcri.org/semeval2016/task5/.

Background

Aim at understanding specific sentiment torwards different aspects of an entity.

Pipeline

Step 1: Identify OTE(Opinion Target Expression) Use a CNN with time-distributed dense layer at the output to perform seq-2-seq IOB2 tagging. Input: padded tokenized word sequences + pretrained embedding matrix Output: IOB2 tags identifying the sequence location of the OTEs Step 2: Identify entity:aspect pair and the sentiment polarity within each review sentence USE LSTM stacked on top of a CNN. Input: tokenized word sequences + pretrained embedding matrix, POS tags, IOB2 tags Output: aspect and polarity associating with a particular OTE

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data		data
imgs		imgs
stanford.postagger		stanford.postagger
tests		tests
.gitignore		.gitignore
README.md		README.md
config.yml		config.yml
helper_function.py		helper_function.py
parse_data.py		parse_data.py
requirements.txt		requirements.txt
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TextClassification

Background

Pipeline

Data engineering

Word indexing & word embedding (w2v)

TODO:

About

Releases

Packages

Languages

Carolyn95/TextClassification

Folders and files

Latest commit

History

Repository files navigation

TextClassification

Background

Pipeline

Data engineering

Word indexing & word embedding (w2v)

TODO:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages