This repository contains the code and experiments for a final-year Computer Science project, Emotion-Aware Speech Generation with Integrated Text Analysis, which uses emotion embeddings from a RoBERTa model. It includes Natural Language Processing (NLP) experiments performed during an NLP course, as well as a modified version of an existing text-to-speech synthesis codebase.
Please visit the GitHub page to view comparative samples, or generate new ones on Hugging Face.
The project aims to generate emotion-aware speech using a modified text-to-speech synthesis system. By integrating emotion embeddings from a RoBERTa model, the generated speech output exhibits the desired emotions as specified by the input text.
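To make the idea of conditioning on an emotion embedding more concrete, here is a minimal sketch in plain Python. It is not taken from this repository's code. It assumes one common conditioning scheme, in which a sentence-level emotion vector is linearly projected to the encoder's hidden size and added to every encoder frame. The names `project`, `add_emotion`, `encoder_out`, `emotion_vec`, and `weights` are hypothetical, and the toy dimensions stand in for the much larger ones a real model would use.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(vec: Vector, weights: Matrix) -> Vector:
    """Linear projection: each row of `weights` produces one output value."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def add_emotion(encoder_out: Matrix, emotion_vec: Vector, weights: Matrix) -> Matrix:
    """Add the projected emotion vector to every encoder frame.

    Assumption: conditioning is done by broadcast addition. The actual
    model in this repository may integrate the embedding differently.
    """
    projected = project(emotion_vec, weights)
    return [[h + e for h, e in zip(frame, projected)] for frame in encoder_out]


# Toy example: 3 encoder frames of hidden size 2, emotion vector of size 3.
encoder_out = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
emotion_vec = [1.0, 0.0, 2.0]  # stands in for a RoBERTa-derived embedding
weights = [[0.5, 0.0, 0.25], [1.0, 1.0, 0.0]]  # hypothetical projection (2 x 3)

conditioned = add_emotion(encoder_out, emotion_vec, weights)
print(conditioned)  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
```

Because the same projected vector is added to every frame, the emotion signal is available to all downstream predictors (duration, pitch, energy) without changing the sequence length.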
- FYP_Notebooks/: Contains various notebooks for different experiments and data processing methods.
- FastSpeech2_Text_Aware_Emotion_TTS/: Contains the modified text-to-speech synthesis codebase for emotion-aware speech generation.
- Transformers_for_NLP/: Contains various NLP experiments conducted during the Data Science: Transformers for Natural Language Processing course.
- Utils/: Contains the code for processing and preparing the data for training and evaluation.
To run the experiments and use the Emotion-Aware Speech Generation system, follow these steps:
- Clone this repository:
  git clone https://github.com/ionut-cmd/FYP.git
- Navigate to the FastSpeech2_Text_Aware_Emotion_TTS/ directory.
- Follow the installation and usage instructions provided in the FastSpeech2_Text_Aware_Emotion_TTS/README.md file.
This project is based on the ming024/FastSpeech2 implementation of text-to-speech synthesis. I would like to thank the original author for their work, which served as the starting point for this project.
This project is licensed under the MIT License.