ecom-product-classifier

Task

Given a dataset of products sold on an e-commerce site, the task is to predict each product's category from features extracted from its description. In essence, it is a text classification problem.

Approach/Work Flow

Cleaning the data
Tokenisation:
- Count Vectorisation
- Term Frequency-Inverse Document Frequency (TF-IDF) Vectorisation
Different classification algorithms are then compared to determine which one yields the highest accuracy on the test data.

Cleaning the data

Removing punctuation
Converting text data to lower case
Stripping leading and trailing whitespaces
Removing stopwords
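
The four steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the function name `clean_text` and the tiny `STOPWORDS` set are assumptions made for the example, and a real run would more likely use a full stopword list from an NLP library.

```python
import string

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "with", "in", "is", "to"}


def clean_text(text: str) -> str:
    """Apply the four cleaning steps to a single product description."""
    # 1. Remove punctuation (characters are deleted, so "T-Shirt" becomes "TShirt").
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 2. Convert to lower case.
    text = text.lower()
    # 3. Strip leading and trailing whitespace.
    text = text.strip()
    # 4. Remove stopwords; splitting on whitespace also collapses repeated spaces.
    return " ".join(word for word in text.split() if word not in STOPWORDS)


print(clean_text("  The Cotton T-Shirt, for Men!  "))  # cotton tshirt men
```

Stopwords are removed last so that the comparison against the lower-case stopword list works on already normalised words.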

After these steps, the most frequent product categories are identified. Entries belonging to other categories are removed due to a lack of sufficient data. The most common product categories can be visualised in the form of a word cloud, as shown below:

Perform Tokenisation of Feature Set

Here, the goal is to extract features from the textual product descriptions so that the product category can be predicted.
First, count vectorisation is applied to turn each description into a vector of word counts.
Then, TF-IDF weights are computed for these vectors so that words that distinguish a category receive more weight.
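
To make the two steps concrete, here is a small standard-library sketch of count vectorisation followed by TF-IDF weighting. The helper names `count_vectors` and `tfidf_vectors`, the sample descriptions, and the particular formula idf = ln(N / df) + 1 are assumptions chosen for illustration; library implementations often add smoothing and normalisation on top, so their numbers may differ.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for word, count in Counter(doc.split()).items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors


def tfidf_vectors(count_vecs):
    """Reweight count vectors with idf = ln(N / df) + 1 (one common variant)."""
    n_docs = len(count_vecs)
    n_terms = len(count_vecs[0])
    # Document frequency: in how many documents each term occurs at least once.
    df = [sum(1 for vec in count_vecs if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) + 1.0 for j in range(n_terms)]
    return [[vec[j] * idf[j] for j in range(n_terms)] for vec in count_vecs]


# Hypothetical, already-cleaned product descriptions.
docs = ["cotton shirt men", "leather shoes men", "cotton saree women"]
vocab, counts = count_vectors(docs)
weights = tfidf_vectors(counts)

print(vocab)
print(counts[0])
print([round(w, 3) for w in weights[0]])
```

In the first description, "shirt" occurs in only one document while "cotton" occurs in two, so "shirt" ends up with the larger weight even though both have a raw count of 1.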

Applying different Classification Algorithms

Naive Bayes Classifier
Logistic Regression
Support Vector Machine
Classification with Neural Network
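
As an illustration of the simplest classifier in this list, the following is a minimal multinomial Naive Bayes sketch with Laplace smoothing, written with the standard library only. It is not the notebook's implementation: the function names `train_nb` and `predict_nb`, the toy descriptions, and the category labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    vocab = {word for doc in docs for word in doc.split()}
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    model = {}
    for label, n_docs in class_doc_counts.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                word: math.log((word_counts[label][word] + alpha) / total)
                for word in vocab
            },
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are skipped."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_like"][word]
            for word in doc.split()
            if word in params["log_like"]
        )
    return max(model, key=score)


# Hypothetical training data, already cleaned.
train_docs = ["cotton shirt men", "denim shirt men",
              "leather shoes men", "running shoes women"]
train_labels = ["Clothing", "Clothing", "Footwear", "Footwear"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, "cotton shirt women"))  # Clothing
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.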

Construction of Neural Network:

The NN contains 2 hidden layers.
The ReLU (Rectified Linear Unit) is used as the activation function.
The output layer has as many nodes as there are product categories.
In the output layer, the softmax function is used as the activation function so that each node gives the probability of a product belonging to that category. The category with the highest probability is selected as the predicted category.
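
The architecture described above can be sketched as a plain-Python forward pass. This is only an illustration under stated assumptions: the layer sizes, the random untrained weights, and the helper names (`dense`, `init_layer`, `forward`) are hypothetical, and the saved model in this repository was presumably built with a deep learning framework rather than by hand.

```python
import math
import random


def relu(values):
    """ReLU activation: negative inputs are clipped to zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(values)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases):
    """Fully connected layer; `weights` holds one row per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Hidden layers use ReLU; the final layer uses softmax."""
    *hidden_layers, output_layer = layers
    for weights, biases in hidden_layers:
        x = relu(dense(x, weights, biases))
    weights, biases = output_layer
    return softmax(dense(x, weights, biases))


rng = random.Random(0)
n_features, n_hidden, n_categories = 7, 4, 3  # hypothetical sizes
layers = [
    init_layer(n_features, n_hidden, rng),    # hidden layer 1
    init_layer(n_hidden, n_hidden, rng),      # hidden layer 2
    init_layer(n_hidden, n_categories, rng),  # output layer, one node per category
]

probs = forward([1, 0, 1, 0, 1, 0, 0], layers)
predicted = max(range(n_categories), key=lambda k: probs[k])
print(probs, predicted)
```

Because the weights here are random, the printed probabilities carry no meaning; the sketch only shows how the softmax output is reduced to a single predicted category by taking the index with the highest probability.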

The following accuracy and loss curves were obtained when the model was trained for 15 epochs:

Here is the Confusion Matrix obtained for the NN model:

Results

Classification Approach	Accuracy (%)
Naive Bayes	81.31
Logistic Regression	94.17
Support Vector Machine	97.36
Neural Network	97.58

The Neural Network model obtains the highest accuracy for predicting product categories (97.58%), narrowly ahead of the Support Vector Machine (97.36%).
The relative performance of each classification model can be visualised as follows:
