GitHub - chbridges/amazon-aspect-extractor: Automatic extraction and aggregated sentiment analysis of Amazon.com review keyphrases, trained on a laptop review dataset

Amazon Aspect Extractor

This project extracts and compiles information about products on Amazon from their reviews. Given a product URL, it applies aspect extraction and sentiment analysis to compute statistics and summaries for that product.

Usage

Requirements

Python 3.6+
Google Chrome
Tkinter (sudo apt-get install python3-tk)
pipenv (pip3 install pipenv)

Setup

Run pipenv install and pipenv run spacy in the root directory to install all necessary packages. The program can now be run using pipenv run main.

Troubleshooting

Amazon sometimes serves an "Are You A Robot" page, which prevents the reviews from being crawled. In this case, a ValueError exception is thrown, but the GUI does not close automatically.

To work around this, launch the program using pipenv run debug.

In debug mode, the pages are opened in the foreground, which gives you the opportunity to enter the image code of the "Are You A Robot" challenge. We generally recommend copying the product URLs from the Google Chrome browser so that the corresponding cookies keep the challenge from being triggered.
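The failure mode described above can be illustrated with a minimal sketch. It is not the project's actual crawler code: the marker strings and the function names `is_robot_check` and `extract_reviews_or_fail` are hypothetical, and the real challenge page may use different wording.

```python
# Hypothetical marker strings; the text of the real challenge page may differ.
ROBOT_MARKERS = ("are you a robot", "enter the characters you see")


def is_robot_check(page_html: str) -> bool:
    """Return True if the HTML looks like a bot-challenge page."""
    text = page_html.lower()
    return any(marker in text for marker in ROBOT_MARKERS)


def extract_reviews_or_fail(page_html: str) -> str:
    """Raise ValueError on a challenge page, otherwise return the HTML unchanged."""
    if is_robot_check(page_html):
        # Mirrors the documented behaviour: crawling stops with a ValueError.
        raise ValueError("Robot check encountered; retry with `pipenv run debug`.")
    return page_html
```

A caller that catches the ValueError could, for example, close the GUI or suggest switching to debug mode instead of leaving the window open.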

Processing pipeline

Training State:

Results of best sentiment model on custom Laptop dataset:

Split \ Metric	Accuracy	Class balanced accuracy	F1 Score	Class Splits
LSTM Training	81.58%	85.25%	0.8407	-
LSTM Validation	68.12%	62.38%	0.7713	28.12%/8.75%/63.13%
LSTM Test	90.37%	64.56%	0.8787	-
Random Forest Training	62.22%	72.24%	0.5755	-
Random Forest Validation	57.63%	54.22%	0.4716	55.56%/9.03%/35.42%
Random Forest Test	55.00%	60.27%	0.4912	-
SVM Training	72.50%	65.63%	0.6480	-
SVM Validation	62.50%	36.59%	0.3619	14.89%/4.17%/81.94%
SVM Test	62.77%	41.17%	0.4066	-
Validation Set	-	-	-	37.97%/3.21%/58.82%
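Accuracy and class balanced accuracy can differ widely when the classes are imbalanced, as the class splits above suggest. The sketch below shows one common way to compute both, assuming that class balanced accuracy means the mean of the per-class recalls; the README does not state the exact definition, and the label names are hypothetical. F1 is left out because the averaging method is not given.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true (assumed definition)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


def class_split(labels, classes):
    """Share of each class in a label list, in the order given by `classes`."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


# Toy example: a model that always predicts the majority class.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 10
print(accuracy(y_true, y_pred))           # plain accuracy rewards the majority guess
print(balanced_accuracy(y_true, y_pred))  # per-class averaging penalises it
```

In the toy example the always-majority model scores well on plain accuracy but only reaches chance level on the balanced metric, which is why both columns are worth reading together.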

Sources:

Training Data:

Pre-crawled Amazon reviews:

https://registry.opendata.aws/amazon-reviews-ml/

Existing Code Fragments:

Sentiment LSTM: https://towardsdatascience.com/sentiment-analysis-using-lstm-step-by-step-50d074f09948
Dynamic input sizes for LSTMs: https://towardsdatascience.com/taming-lstms-variable-sized-mini-batches-and-why-pytorch-is-good-for-your-health-61d35642972e
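The idea behind variable-sized mini-batches can be sketched without any deep learning library: reviews of different lengths are padded to a common length, and the true lengths are kept so the padding can be ignored later. This is a simplified illustration, not the project's code; `pad_batch` and `pad_id` are hypothetical names. In PyTorch, such lengths are typically passed to `torch.nn.utils.rnn.pack_padded_sequence`.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id sequences to the longest one.

    Returns (padded, lengths), both sorted by sequence length in descending order.
    """
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(s) for s in ordered]
    max_len = lengths[0] if lengths else 0
    padded = [list(s) + [pad_id] * (max_len - len(s)) for s in ordered]
    return padded, lengths


padded, lengths = pad_batch([[1, 2], [3, 4, 5], [6]])
print(padded)
print(lengths)
```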

Utilized libraries:

See requirements.txt
Google Chrome, available at https://www.google.com/chrome/

Team members:

Christopher Brückner
Julius Ernesti
Raphael Kirchholtes
Armand Rousselot

