Gender-ambiguous speech synthesis is our proposed method for privacy-preserving speech recognition. GenGAN is a generative adversarial network that synthesises mel-spectrograms which preserve the spoken content while concealing gender and identity information. We provide our pre-trained GenGAN synthesiser and our pre-trained gender recognition model.
- Python >= 3.7.4
- PyTorch >= 1.2.0
- hydra-core >= 1.1.1
- Soundfile >= 0.10.3
- librosa >= 0.7.2
We use the train-clean-100 partition of the LibriSpeech dataset.
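If you do not already have the data locally, one option (not required by this repository) is to fetch the partition with torchaudio:

```python
# Optional helper: download and read train-clean-100 with torchaudio.
# torchaudio is not a listed requirement; any other way of obtaining LibriSpeech works too.
import os
import torchaudio

os.makedirs("./data", exist_ok=True)
dataset = torchaudio.datasets.LIBRISPEECH(
    root="./data",            # audio ends up under ./data/LibriSpeech/train-clean-100
    url="train-clean-100",
    download=True,
)

waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
print(sample_rate, speaker_id, transcript[:50])
```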
- Clone the repository:
git clone https://github.com/dimitriStoidis/GenGAN.git
- From a terminal or an Anaconda Prompt, go to the project's root directory and run:
conda create --name gengan python=3.7
conda activate gengan
and install the required packages.
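One way to install them (package names assumed to correspond to the requirements listed above) is with pip:
pip install "torch>=1.2.0" "hydra-core>=1.1.1" "soundfile>=0.10.3" "librosa>=0.7.2"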
- Download the multi-speaker pre-trained MelGAN vocoder model and add it to the path.
For training:
- Create the JSON manifests in the /data_files folder to read the data (a sample entry is sketched below), containing:
  - speaker and gender labels
  - the path to each audio file
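A minimal sketch of writing such a manifest, assuming plain JSON with one record per utterance. The exact field names are an assumption; check the files shipped in /data_files for the schema actually expected by the data loader.

```python
# Illustrative manifest writer; field names and file name are assumptions.
import json
import os

os.makedirs("./data_files", exist_ok=True)

manifest = [
    {
        "audio_path": "./data/LibriSpeech/train-clean-100/19/198/19-198-0001.flac",
        "speaker_id": "19",   # LibriSpeech speaker label
        "gender": "F",        # speaker gender label
    },
]

with open("./data_files/train_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```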
To train the model, run:
python train.py --trial model1 --epochs 25 --batch_size 25
To try out GenGAN on your own audio samples:
- Clone the repository.
- Load the pre-trained GenGAN model for speech synthesis from the checkpoint /models/netG_epoch_25.pt.
- Download the multi-speaker pre-trained MelGAN vocoder model and add it to the path.
- Run:
python demo.py --path_to_audio ./audio/xyz.wav --path_to_models ./models
The output is a .wav file saved in the /audio_ directory.
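For programmatic use, the demo can be approximated as in the sketch below. The `Generator` import, the mel-spectrogram parameters and the checkpoint interface are assumptions made for illustration; demo.py in this repository is the authoritative reference, including the MelGAN vocoder step that writes the .wav output.

```python
# Rough sketch: load the GenGAN generator and anonymise one utterance.
# Module/class names and mel parameters are assumptions; follow demo.py for the exact pipeline.
import numpy as np
import torch
import librosa

from models import Generator  # assumed location of the generator definition

device = "cuda" if torch.cuda.is_available() else "cpu"

# Compute a log-mel-spectrogram from the input audio (parameters assumed).
wav, sr = librosa.load("./audio/xyz.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=1024, hop_length=256, n_mels=80)
mel = torch.from_numpy(np.log(np.clip(mel, 1e-5, None))).unsqueeze(0).float().to(device)

# Restore the pre-trained generator and produce a gender-ambiguous mel-spectrogram.
netG = Generator().to(device)
netG.load_state_dict(torch.load("./models/netG_epoch_25.pt", map_location=device))
netG.eval()
with torch.no_grad():
    mel_anon = netG(mel)

# mel_anon would then be inverted to a waveform with the multi-speaker MelGAN vocoder.
```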
To perform gender recognition on your saved samples:
- Load the pre-trained model from the checkpoint /models/model.ckpt-90_GenderNet.pt.
- Run:
python GenderNet.py --batch_size bs --set test
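The checkpoint can also be restored directly in Python, as in the sketch below; the class name, constructor and input shape are assumptions, so refer to GenderNet.py for the actual interface.

```python
# Minimal sketch of restoring the gender classifier from the released checkpoint.
# Class name, constructor arguments and input shape are assumptions (see GenderNet.py).
import torch

from GenderNet import GenderNet  # assumed import from GenderNet.py in this repository

model = GenderNet()
state = torch.load("./models/model.ckpt-90_GenderNet.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()

mel_batch = torch.randn(4, 80, 128)      # placeholder batch of mel-spectrograms (shape assumed)
with torch.no_grad():
    logits = model(mel_batch)
    predictions = logits.argmax(dim=-1)  # per-utterance gender prediction
```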
We use the pre-trained SpeakerNet model to perform the speaker verification task.
Download the QuartzNet model from NeMo.
The work is based on:
@misc{https://doi.org/10.48550/arxiv.2207.01052,
  doi = {10.48550/ARXIV.2207.01052},
  url = {https://arxiv.org/abs/2207.01052},
  author = {Stoidis, Dimitrios and Cavallaro, Andrea},
  title = {Generating gender-ambiguous voices for privacy-preserving speech recognition},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
Accepted for publication at Interspeech 2022.
For any enquiries contact [email protected].
This work is licensed under the MIT License.