This project attempts to classify the emotion in a given audio file as one of "Calm", "Happy", "Fearful", or "Disgust". It is documented in accordance with the Machine Learning Life Cycle, which consists of 7 major stages in a machine learning project. Currently, the model's accuracy is around 70%. It recognises "Calm" with the highest accuracy but has difficulty recognising "Fearful".
The RAVDESS dataset was used in this project. It contains a total of 8 emotions, from which I picked the 4 on which the current model has been trained. The model extracted speech features such as MFCC, Chroma, Mel spectrogram, Spectral Contrast, and Zero Crossing Rate. It was found that Spectral Contrast and Zero Crossing Rate made little difference to the model's accuracy. Since this is a classification problem, an MLPClassifier was used, and its hyperparameters were tuned with GridSearchCV. I plan to try other models on the dataset, such as Random Forest, to see whether the accuracy improves.
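The classification stage described above can be sketched as follows. This is a minimal, hedged example, not the project's actual training script: synthetic feature vectors stand in for the features extracted from RAVDESS clips (which in practice would come from an audio library such as librosa), and the hyperparameter grid is illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
EMOTIONS = ["calm", "happy", "fearful", "disgust"]

# Synthetic stand-in for extracted features: in the real project each row
# would be a feature vector (e.g. MFCC, Chroma, and Mel values) computed
# from one RAVDESS clip.
n_samples, n_features = 200, 40
X = rng.normal(size=(n_samples, n_features))
y = rng.choice(EMOTIONS, size=n_samples)
# Give each class a slight mean shift so the synthetic data is learnable.
for i, emo in enumerate(EMOTIONS):
    X[y == emo] += i * 0.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Tune MLPClassifier hyperparameters with GridSearchCV, as in the project.
# The grid below is a placeholder, not the grid actually used.
param_grid = {
    "hidden_layer_sizes": [(100,), (200,)],
    "alpha": [1e-4, 1e-2],
}
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=42),
    param_grid, cv=3)
search.fit(X_train, y_train)

accuracy = search.score(X_test, y_test)
print(f"best params: {search.best_params_}, test accuracy: {accuracy:.2f}")
```

Swapping in a different estimator (e.g. `RandomForestClassifier`) only requires changing the model and the parameter grid, which makes this a convenient harness for the planned model comparison.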