Recent studies have outlined the accessibility challenges that blind and visually impaired people face when interacting
-with social networks, in particular with monotone text-to-speech (TTS) screen
-readers and audio narration of visual elements such as emojis.
-Emotional speech generation traditionally relies on human input
-of the expected emotion together with the text to synthesise,
-with additional challenges around data simplification (causing
-information loss) and duration inaccuracy, leading to a lack of
-expressive emotional rendering. In real-life communication, the
-duration of phonemes can vary since the same sentence might
-be spoken in a variety of ways depending on the speakers’
-emotional states or accents (referred to as the one-to-many
-problem of text-to-speech generation). As a result, an advanced
-voice synthesis system is required to account for this unpredictability. We propose an end-to-end context-aware Text-to-Speech (TTS) synthesis system that derives the conveyed emotion
-from text input and synthesises audio that focuses on emotions
-and speaker features for natural and expressive speech, integrating advanced natural language processing (NLP) and speech
-synthesis techniques for real-time applications. The proposed
-system has two core components: an emotion classifier and a
-speech synthesiser. The emotion classifier utilises a classification
-model to extract sentiment information from the input text.
-Leveraging a non-autoregressive neural TTS model, the speech
-synthesiser generates Mel-spectrograms by incorporating speaker
-and emotion embeddings derived from the classifier’s output. We
-employ a Generative Adversarial Network (GAN)-based vocoder
-to convert the Mel-spectrograms into audible waveforms. One of
-the key contributions lies in effectively incorporating emotional
-characteristics into TTS synthesis. Our system also shows
-competitive inference-time performance when benchmarked
-against state-of-the-art TTS models, making it suitable for
-real-time accessibility applications.
-
Demo
-
-
Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Below, you can listen to audio samples from different TTS models.
-
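| Description | FastSpeech 2[1] | TEMOTTS[2] | Our Model |
| --- | --- | --- | --- |
| Bikes are fun to ride | (audio sample) | (audio sample) | (audio sample) |
| Dreams can come true | (audio sample) | (audio sample) | (audio sample) |
| Friends make life more fun | (audio sample) | (audio sample) | (audio sample) |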
-
Emotion Aware Samples
-
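| Description | FastSpeech 2[1] | TEMOTTS[2] | Our Model |
| --- | --- | --- | --- |
| Blowing out birthday candles makes me feel special! | (audio sample) | (audio sample) | (audio sample) |
| Her heart felt heavy with sorrow | (audio sample) | (audio sample) | (audio sample) |
| I am feeling sad | (audio sample) | (audio sample) | (audio sample) |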
-
References
-
[1] Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu, “FastSpeech 2: Fast and high-quality end-to-end text to speech,” in International Conference on Learning Representations, 2021.
-
[2] Shreeram Suresh Chandra, Zongyang Du, and Berrak Sisman, “TEMOTTS: Text-aware Emotional Text-to-Speech with no labels,” Speech & Machine Learning Lab, The University of Texas at Dallas, TX, USA, 2024.
-
diff --git a/_site/samples/fastspeech/Bikes_are_fun_to_ride__gen.wav b/_site/samples/fastspeech/Bikes_are_fun_to_ride__gen.wav
deleted file mode 100644
index 940f640..0000000
Binary files a/_site/samples/fastspeech/Bikes_are_fun_to_ride__gen.wav and /dev/null differ
diff --git a/_site/samples/fastspeech/Blowing_out_birthday_candles_makes_me_feel_special__gen.wav b/_site/samples/fastspeech/Blowing_out_birthday_candles_makes_me_feel_special__gen.wav
deleted file mode 100644
index 5ec1998..0000000
Binary files a/_site/samples/fastspeech/Blowing_out_birthday_candles_makes_me_feel_special__gen.wav and /dev/null differ
diff --git a/_site/samples/fastspeech/Dreams_can_come_true__gen.wav b/_site/samples/fastspeech/Dreams_can_come_true__gen.wav
deleted file mode 100644
index 260efb7..0000000
Binary files a/_site/samples/fastspeech/Dreams_can_come_true__gen.wav and /dev/null differ
diff --git a/_site/samples/fastspeech/Friends_make_life_more_fun__gen.wav b/_site/samples/fastspeech/Friends_make_life_more_fun__gen.wav
deleted file mode 100644
index d2cb8c3..0000000
Binary files a/_site/samples/fastspeech/Friends_make_life_more_fun__gen.wav and /dev/null differ
diff --git a/_site/samples/fastspeech/Her_heart_felt_heavy_with_sorrow__gen.wav b/_site/samples/fastspeech/Her_heart_felt_heavy_with_sorrow__gen.wav
deleted file mode 100644
index 2a0692b..0000000
Binary files a/_site/samples/fastspeech/Her_heart_felt_heavy_with_sorrow__gen.wav and /dev/null differ
diff --git a/_site/samples/fastspeech/I_am_feeling_sad__gen.wav b/_site/samples/fastspeech/I_am_feeling_sad__gen.wav
deleted file mode 100644
index f11eac3..0000000
Binary files a/_site/samples/fastspeech/I_am_feeling_sad__gen.wav and /dev/null differ
diff --git a/_site/samples/ours/Bikes are fun to ride..wav b/_site/samples/ours/Bikes are fun to ride..wav
deleted file mode 100644
index a9dce57..0000000
Binary files a/_site/samples/ours/Bikes are fun to ride..wav and /dev/null differ
diff --git a/_site/samples/ours/Blowing out birthday candles makes me feel special!.wav b/_site/samples/ours/Blowing out birthday candles makes me feel special!.wav
deleted file mode 100644
index 4082a88..0000000
Binary files a/_site/samples/ours/Blowing out birthday candles makes me feel special!.wav and /dev/null differ
diff --git a/_site/samples/ours/Dreams can come true.wav b/_site/samples/ours/Dreams can come true.wav
deleted file mode 100644
index 7a663c5..0000000
Binary files a/_site/samples/ours/Dreams can come true.wav and /dev/null differ
diff --git a/_site/samples/ours/Friends make life more fun.wav b/_site/samples/ours/Friends make life more fun.wav
deleted file mode 100644
index fdb08a1..0000000
Binary files a/_site/samples/ours/Friends make life more fun.wav and /dev/null differ
diff --git a/_site/samples/ours/Her heart felt heavy with sorrow.wav b/_site/samples/ours/Her heart felt heavy with sorrow.wav
deleted file mode 100644
index 87fb160..0000000
Binary files a/_site/samples/ours/Her heart felt heavy with sorrow.wav and /dev/null differ
diff --git a/_site/samples/ours/I am feeling sad.wav b/_site/samples/ours/I am feeling sad.wav
deleted file mode 100644
index a96f004..0000000
Binary files a/_site/samples/ours/I am feeling sad.wav and /dev/null differ
diff --git a/_site/samples/temotts/Bikes_are_fun_to_ride__gen.wav b/_site/samples/temotts/Bikes_are_fun_to_ride__gen.wav
deleted file mode 100644
index 02793e4..0000000
Binary files a/_site/samples/temotts/Bikes_are_fun_to_ride__gen.wav and /dev/null differ
diff --git a/_site/samples/temotts/Blowing_out_birthday_candles_makes_me_feel_special__gen.wav b/_site/samples/temotts/Blowing_out_birthday_candles_makes_me_feel_special__gen.wav
deleted file mode 100644
index b38d953..0000000
Binary files a/_site/samples/temotts/Blowing_out_birthday_candles_makes_me_feel_special__gen.wav and /dev/null differ
diff --git a/_site/samples/temotts/Dreams_can_come_true__gen.wav b/_site/samples/temotts/Dreams_can_come_true__gen.wav
deleted file mode 100644
index 7105f42..0000000
Binary files a/_site/samples/temotts/Dreams_can_come_true__gen.wav and /dev/null differ
diff --git a/_site/samples/temotts/Friends_make_life_more_fun__gen.wav b/_site/samples/temotts/Friends_make_life_more_fun__gen.wav
deleted file mode 100644
index 2128eca..0000000
Binary files a/_site/samples/temotts/Friends_make_life_more_fun__gen.wav and /dev/null differ
diff --git a/_site/samples/temotts/Her_heart_felt_heavy_with_sorrow__gen.wav b/_site/samples/temotts/Her_heart_felt_heavy_with_sorrow__gen.wav
deleted file mode 100644
index 6d49a61..0000000
Binary files a/_site/samples/temotts/Her_heart_felt_heavy_with_sorrow__gen.wav and /dev/null differ
diff --git a/_site/samples/temotts/I_am_feeling_sad__gen.wav b/_site/samples/temotts/I_am_feeling_sad__gen.wav
deleted file mode 100644
index c9b7aec..0000000
Binary files a/_site/samples/temotts/I_am_feeling_sad__gen.wav and /dev/null differ
diff --git a/index.md b/index.md
index 356fd40..4b786a3 100644
--- a/index.md
+++ b/index.md
@@ -52,7 +52,7 @@ real-time accessibility applications.
## Demo
-Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Below, you can listen to audiohttps://github.com/ionut-cmd/Emotion-Aware-TTS/tree/main/FastSpeech2_Text_Aware_Emotion_TTS samples from different TTS models.
+Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Below, you can listen to audiohttps://raw.githubusercontent.com/ionut-cmd/Emotion-Aware-TTS/main/FastSpeech2_Text_Aware_Emotion_TTS samples from different TTS models.
@@ -65,19 +65,19 @@ Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Be
Bikes are fun to ride
@@ -86,19 +86,19 @@ Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Be
Dreams can come true
@@ -107,19 +107,19 @@ Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Be
Friends make life more fun
@@ -139,19 +139,19 @@ Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Be
Blowing out birthday candles makes me feel special!
@@ -160,19 +160,19 @@ Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Be
Her heart felt heavy with sorrow
@@ -181,19 +181,19 @@ Welcome to the demonstration page of our Emotion-Aware Text-to-Speech Models. Be