This repository provides an official trial version of the Zundamon Speech WebUI. It allows users to try Zundamon's TTS audio generation and explore its capabilities.
Official website: https://zunko.jp/
For users who may face difficulties during installation, the following tutorial videos provide step-by-step guidance:
By following these videos, users can avoid common issues related to folder structure and setup.
This project is based on GPT-SoVITS and has been adapted and fine-tuned for Zundamon's voice synthesis. The WebUI for inference is built using Streamlit, providing a user-friendly interface for generating Zundamon's speech audio files.
- User-Friendly Web Interface: Easily upload reference audio and text, input your target text, and generate Zundamon's voice in your desired language.
- Custom Models: Fine-tuned models specifically for Zundamon are included to provide high-quality voice synthesis.
- Reference Files: Sample reference audio and text for Zundamon are provided in the reference folder.
- Download Support: Generated audio files can be downloaded directly from the interface.
- Multilingual Support: Choose from multiple languages for both reference and target text.
Before starting, ensure you have the required dependencies installed:
pip install -r requirements.txt
After installing the dependencies, please install PyTorch manually from the official website.
The following PyTorch version has been tested and verified to work successfully:
- PyTorch: 2.1.2
- CUDA: 12.1
- Python: 3.9
You can install it using the following command:
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
For other installation options, please visit the official PyTorch website.
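Once PyTorch is installed, a quick check like the following (a minimal sketch, not specific to this project) confirms that the interpreter, PyTorch, and CUDA versions match the tested setup:
import platform
import torch

# Compare against the tested combination: Python 3.9, PyTorch 2.1.2, CUDA 12.1
print("Python:", platform.python_version())
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)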
To ensure all required submodules are initialized properly, use the following commands:
git clone --recursive https://github.com/zunzun999/zundamon-speech-webui.git
cd zundamon-speech-webui
If you have already cloned the repository without the --recursive flag, run:
git submodule update --init --recursive
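As a quick sanity check (a minimal sketch, assuming it is run from the directory that contains the clone), you can confirm that the GPT-SoVITS submodule was actually fetched and is not empty:
import os

# A recursive clone should leave a populated GPT-SoVITS directory inside the repository
submodule = os.path.join("zundamon-speech-webui", "GPT-SoVITS")
print("Submodule present:", os.path.isdir(submodule) and bool(os.listdir(submodule)))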
- Download GPT-SoVITS Pretrained Models: Place the pretrained models in the GPT-SoVITS/GPT_SoVITS/pretrained_models folder (the layout check shown after this setup list can confirm everything ended up in the right place).
  - Use the following commands to download and set up the models:
    git lfs install
    git clone https://huggingface.co/lj1995/GPT-SoVITS
- Download G2PW Models: Download and unzip the G2PW models from G2PWModel_1.1.zip, rename the folder to G2PWModel, and place it in GPT-SoVITS/GPT_SoVITS/text.
- Download Zundamon Fine-Tuned Model: Download the fine-tuned models for Zundamon and place them in the zundamon-speech-webui/GPT-SoVITS folder.
  - The fine-tuned models include GPT_weights_v2 and SoVITS_weights_v2.
  - Use the following command to download and set up the models:
    git clone https://huggingface.co/zunzunpj/zundamon_GPT-SoVITS
- Download and Install FFmpeg
  - Download ffmpeg.exe and ffprobe.exe.
  - Place them in the root directory of zundamon-speech-webui/GPT-SoVITS.
- Install Visual Studio Build Tools
  - Visit the Visual Studio Download Page.
  - Download and install "Visual Studio Build Tools".
  - During installation, select "Desktop development with C++".
- Install CMake
  - Visit the CMake Official Site.
  - Download and install the Windows version of CMake.
  - During installation, choose "Add CMake to the system PATH".
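After completing the steps above, a small script like this one can verify the expected layout before launching the WebUI. It is a hedged sketch based only on the paths mentioned in this section (run it from the zundamon-speech-webui root and adjust the entries if your layout differs):
import os

# Paths taken from the setup steps above, relative to the zundamon-speech-webui root
expected = [
    "GPT-SoVITS/GPT_SoVITS/pretrained_models",  # GPT-SoVITS pretrained models
    "GPT-SoVITS/GPT_SoVITS/text/G2PWModel",     # unzipped and renamed G2PW models
    "GPT-SoVITS/GPT_weights_v2",                # Zundamon fine-tuned GPT weights
    "GPT-SoVITS/SoVITS_weights_v2",             # Zundamon fine-tuned SoVITS weights
    "GPT-SoVITS/ffmpeg.exe",                    # FFmpeg binary (Windows)
    "GPT-SoVITS/ffprobe.exe",                   # FFprobe binary (Windows)
]

for path in expected:
    status = "OK" if os.path.exists(path) else "MISSING"
    print(status, path)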
If you encounter an error like:
An error occurred during inference:
Resource averaged_perceptron_tagger_eng not found.
Try running the following commands in your project environment:
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')
This will ensure the necessary NLTK resources are downloaded.
- Navigate to the project directory:
  cd zundamon-speech-webui
- Run the WebUI using Streamlit:
  python zundamon_speech_run.py
- Open the WebUI in your browser (the URL will be displayed in the terminal).
- Upload Reference Files
  - Step 1: Reference Audio File. Upload a sample audio file for Zundamon's voice (.wav).
  - Step 2: Reference Text. Provide or upload a text file that corresponds to the reference audio file.
- Input Target Details
  - Step 3: Target Text. Enter the text you want to synthesize in Zundamon's voice.
  - Step 4: Language Selection. Select the language for both reference and target text.
- Generate Audio
  Click the Generate Speech button. After processing, the synthesized audio will be displayed with options to preview and download.
A sample reference audio file and corresponding text for Zundamon are included in the reference folder. Feel free to use them to test the WebUI.
For additional Zundamon voice resources, visit the official download page:
https://zunko.jp/multimodal_dev/login.php
This software includes the following open-source software:
- GPT-SoVITS (MIT License)
- GPT-SoVITS Pretrained Models (MIT License)
- G2PW Model (Apache 2.0 License)
- UVR5 (Voice Cleaning) (MIT License)
- Faster Whisper Large V3 (MIT License)
These are provided under their respective license terms.
The license for the Zundamon Voice model is as follows: https://zunko.jp/con_ongen_kiyaku.html