🎙️ Hindi Text-to-Speech (TTS) Fine-Tuned Model 🇮🇳

Text-to-speech has changed how we interact with machines, making technology more accessible and inclusive for people across the globe. This project fine-tunes Microsoft's SpeechT5 model for Hindi Text-to-Speech (TTS), with the goal of generating high-quality, natural-sounding speech from Hindi text. The fine-tuned transformer model is intended to articulate words precisely, including complex Hindi pronunciations.

The need for such models arises due to the limited availability of Hindi-specific speech models despite the language's widespread use. Fine-tuning a pre-trained model not only optimizes performance for the Hindi language but also enhances usability in diverse real-world applications like voice assistants, audiobook generation, and interactive educational tools. 🎯

This repository provides everything you need to replicate the project, evaluate the model, and use it for your own use cases: detailed instructions, code examples, usage guidelines, and sample audio outputs showcasing the model's capabilities.

📋 Project Overview

This project fine-tunes Microsoft’s SpeechT5 model to generate natural-sounding speech in Hindi, one of the most spoken languages in the world. The model was trained on a dataset of Hindi text-audio pairs to convert input Hindi text into realistic and expressive speech.

Implementation Steps:

Dataset Preparation: Processing Hindi text and audio files to create training pairs.
Preprocessing: Tokenizing text and extracting audio features (mel spectrograms).
Fine-Tuning: Training SpeechT5 on the prepared dataset for Hindi-specific TTS.
Inference & Testing: Generating speech from Hindi text inputs and evaluating outputs.
Optimization: Implementing inference optimization techniques for faster speech generation.
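
The preprocessing step above can be sketched roughly as follows. This is a minimal sketch, not necessarily what the project's notebook does: it assumes Common Voice-style examples with `sentence` and `audio` fields, and `prepare_example` and `embed_fn` are hypothetical names introduced here. The processor is passed in as an argument, so the function itself does not load any model.

```python
def prepare_example(example, processor, embed_fn):
    """Turn one text-audio pair into SpeechT5 training inputs.

    processor: an already loaded SpeechT5Processor.
    embed_fn:  hypothetical callable mapping a waveform to a speaker embedding.
    """
    audio = example["audio"]

    # Tokenize the text and compute the log-mel spectrogram target in one call.
    features = processor(
        text=example["sentence"],
        audio_target=audio["array"],
        sampling_rate=audio["sampling_rate"],
        return_attention_mask=False,
    )

    # The processor returns a batch of size one; drop the batch dimension.
    features["labels"] = features["labels"][0]

    # Attach the speaker embedding that conditions the decoder on a voice.
    features["speaker_embeddings"] = embed_fn(audio["array"])
    return features
```

With a Hugging Face dataset, a function like this would typically be applied with `dataset.map(...)`, passing the loaded processor and a speaker-embedding function such as the `create_speaker_embedding` helper from the inference example.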

What You’ll Find in This Repository:

📂 Source Code for Fine-Tuning
🎧 Audio Samples Generated by the Model
🛠️ Usage Instructions with Examples
📊 Evaluation Results & Insights

🛠️ Features

Accurate Pronunciation: Fine-tuned to handle complex phonetics and Hindi-specific nuances.
Natural Speech: Produces clear and lifelike speech outputs.
Flexible Usage: Easily integrated into applications for real-time TTS.
Customizable: You can further fine-tune or optimize the model for specific tasks.

📚 Applications 🌟

Voice Assistants 🤖
- Creating Hindi-speaking AI assistants like Alexa, Google Assistant, etc.
Educational Tools 📖
- Developing learning tools for regional students and visually impaired individuals.
Audiobooks & Podcasts 🎧
- Generating Hindi audiobooks and content for entertainment or education.
Content Localization 🌏
- Localizing advertisements, videos, and digital platforms for Hindi-speaking audiences.
Accessibility Tools ♿
- Providing speech solutions for text accessibility.

🔧 Setup & Installation

Follow these steps to use the model:

1. Clone the Repository

git lfs install
git clone https://huggingface.co/Saurabh1207/Hindi_SpeechT5_finetuned

2. Install Dependencies

Make sure you have Python and the necessary libraries installed.
Run the following:

pip install git+https://github.com/huggingface/transformers.git accelerate datasets soundfile speechbrain torch sentencepiece
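
A quick way to confirm that the install worked is to check that each package can be found by Python. The snippet below is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the list simply mirrors the packages named in the command above.

```python
import importlib.util

# Import names of the packages from the pip command above.
REQUIRED = ["transformers", "accelerate", "datasets", "soundfile", "speechbrain", "torch"]


def missing_packages(names=REQUIRED):
    """Return the names that Python cannot locate, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```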

3. Inference Code: Generate Hindi Speech

Here’s an example of how you can use the model to generate speech from Hindi text:

import os

import soundfile as sf
import torch
from datasets import Audio as DatasetAudio, load_dataset
from IPython.display import Audio
from speechbrain.pretrained import EncoderClassifier
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor, the fine-tuned model, and the HiFi-GAN vocoder
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("Saurabh1207/Hindi_SpeechT5_finetuned")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Load the x-vector speaker encoder used to compute speaker embeddings
spk_model_name = "speechbrain/spkrec-xvect-voxceleb"
speaker_model = EncoderClassifier.from_hparams(source=spk_model_name, run_opts={"device": device}, savedir=os.path.join("/tmp", spk_model_name))

def create_speaker_embedding(waveform):
    # Return a normalized speaker embedding of shape (1, 512) as a CPU tensor
    with torch.no_grad():
        speaker_embeddings = speaker_model.encode_batch(torch.tensor(waveform))
        speaker_embeddings = torch.nn.functional.normalize(speaker_embeddings, dim=2)
    return speaker_embeddings.squeeze().cpu().unsqueeze(0)

# Load a sample from the dataset for the speaker embedding
try:
    dataset = load_dataset("mozilla-foundation/common_voice_17_0", "hi", split="validated", trust_remote_code=True)
    dataset = dataset.cast_column("audio", DatasetAudio(sampling_rate=16000))
    sample = dataset[0]
    speaker_embedding = create_speaker_embedding(sample["audio"]["array"])
except Exception as e:
    print(f"Error loading dataset: {e}")
    # Use a random speaker embedding as fallback (the resulting voice is unpredictable)
    speaker_embedding = torch.randn(1, 512)

# Define input text in Hindi
input_text = "नमस्ते, यह हिंदी टेक्स्ट टू स्पीच मॉडल का परीक्षण है।"

# Preprocess text
inputs = processor(text=input_text, return_tensors="pt")

# Generate speech (pass the embedding by keyword so it is not mistaken for another argument)
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings=speaker_embedding, vocoder=vocoder)

# Save audio file
sf.write("hindi_output.wav", speech.cpu().numpy(), 16000)
print("Hindi speech generated and saved as 'hindi_output.wav'")

# In a notebook, play the audio inline (must be the last expression in the cell)
Audio(speech.cpu().numpy(), rate=16000)
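
For longer passages, it can help to synthesize one sentence at a time, since the model accepts only a limited input length. The helper below is a sketch under that assumption: it splits Hindi text at the danda (।), question mark, and exclamation mark. `split_hindi_sentences` is a hypothetical name, and the splitting rule is a simple heuristic, not part of the model or its processor.

```python
import re


def split_hindi_sentences(text):
    """Split text after each danda, question mark, or exclamation mark."""
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[।?!])\s*", text)
    # Drop empty fragments, such as the one after the final punctuation mark.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    for sentence in split_hindi_sentences("नमस्ते। आप कैसे हैं? मैं ठीक हूँ।"):
        print(sentence)
```

Each returned sentence can then be passed through the processor and `generate_speech` as in the example above, and the resulting waveforms concatenated before saving.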

🎧 Sample Outputs

Here are some audio samples generated by the Hindi TTS model:

Sample 1: Play or Download
Sample 2: Play or Download
Sample 3: Play or Download

🚀 How to Fine-Tune Further?

For advanced users, fine-tuning the model on a custom dataset is simple:

Prepare a dataset with Hindi text and corresponding audio files.
Use the Hugging Face SpeechT5ForTextToSpeech API to fine-tune the model further.
Save and test the optimized model for improved results.
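
As an illustration of the dataset preparation step, the sketch below pairs transcripts with audio files. It assumes a hypothetical folder layout: a `metadata.csv` file with `file_name` and `transcription` columns sitting next to the audio files. The function name `load_text_audio_pairs` and the column names are assumptions made for this example, so adapt them to your own data.

```python
import csv
from pathlib import Path


def load_text_audio_pairs(data_dir, metadata_name="metadata.csv"):
    """Return (audio_path, text) pairs listed in the metadata file.

    Rows whose audio file is missing or whose transcript is empty are skipped.
    """
    data_dir = Path(data_dir)
    pairs = []
    with open(data_dir / metadata_name, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            audio_path = data_dir / row["file_name"]
            text = row["transcription"].strip()
            if audio_path.is_file() and text:
                pairs.append((audio_path, text))
    return pairs
```

Skipping incomplete rows early keeps broken pairs out of training, where they would otherwise surface as harder-to-trace errors during preprocessing.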

Refer to Hugging Face's documentation for details: SpeechT5 Fine-Tuning Guide.

🌐 Documentation Links

Explore the official resources for models and libraries used in this project:

SpeechT5 Overview: Microsoft SpeechT5 on Hugging Face
Transformers Library: Hugging Face Transformers
PyTorch: PyTorch Documentation

📊 Performance Metrics

The fine-tuned Hindi TTS model was evaluated both subjectively and objectively, focusing on:

Pronunciation accuracy
Speech naturalness
Inference speed

On Hindi datasets, the results show improvements over the pre-trained model.

💡 Future Improvements

Expand the dataset for better coverage of accents and dialects.
Integrate quantization for faster real-time inference.
Optimize the model for deployment on low-resource devices.

🤝 Contributing

Contributions are welcome!
If you find any issues or have suggestions, feel free to open an issue or pull request.

🧑‍💻 Contact & Support

For queries or support, please reach out:

Email: [email protected]
LinkedIn: Saurabh's LinkedIn Profile

⭐ If you like this project, don’t forget to give it a star! ⭐
