Gender-ambiguous speech synthesis is our proposed method for privacy-preserving speech recognition. GenGAN is a generative adversarial network that synthesises mel-spectrograms which preserve the spoken content while concealing gender and identity information. We provide our pre-trained GenGAN synthesiser and our pre-trained gender recognition model.
- Python >= 3.7.4
- PyTorch >= 1.2.0
- hydra-core >= 1.1.1
- Soundfile >= 0.10.3
- librosa >= 0.7.2
We use the train-clean-100 partition of the LibriSpeech dataset.
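If you do not already have the data locally, one option (not required by this repository) is to fetch the partition with torchaudio:

```python
# Optional helper: download and read train-clean-100 with torchaudio.
# torchaudio is not a listed requirement; any other way of obtaining LibriSpeech works too.
import os
import torchaudio

os.makedirs("./data", exist_ok=True)
dataset = torchaudio.datasets.LIBRISPEECH(
    root="./data",            # audio ends up under ./data/LibriSpeech/train-clean-100
    url="train-clean-100",
    download=True,
)

waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
print(sample_rate, speaker_id, transcript[:50])
```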
- Clone the repository:
git clone https://github.com/dimitriStoidis/GenGAN.git
- From a terminal or an Anaconda Prompt, go to the project's root directory and run:
conda create --name gengan python=3.7
conda activate gengan
and install the required packages.
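One way to install them (package names assumed to correspond to the requirements listed above) is with pip:
pip install "torch>=1.2.0" "hydra-core>=1.1.1" "soundfile>=0.10.3" "librosa>=0.7.2"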
- Download the multi-speaker pre-trained MelGAN vocoder model and add it to the path.
For training:
- Create the JSON manifests in the /data_files folder to read the data (a sample entry is sketched below), containing:
  - speaker and gender labels
  - the path to each audio file
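A minimal sketch of writing such a manifest, assuming plain JSON with one record per utterance. The exact field names are an assumption; check the files shipped in /data_files for the schema actually expected by the data loader.

```python
# Illustrative manifest writer; field names and file name are assumptions.
import json
import os

os.makedirs("./data_files", exist_ok=True)

manifest = [
    {
        "audio_path": "./data/LibriSpeech/train-clean-100/19/198/19-198-0001.flac",
        "speaker_id": "19",   # LibriSpeech speaker label
        "gender": "F",        # speaker gender label
    },
]

with open("./data_files/train_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```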
To train the model, run:
python train.py --trial model1 --epochs 25 --batch_size 25
To try out GenGAN on your own audio samples:
- Clone the repository.
- Load the pre-trained GenGAN model for speech synthesis from the checkpoint /models/netG_epoch_25.pt.
- Download the multi-speaker pre-trained MelGAN vocoder model and add it to the path.
- Run:
python demo.py --path_to_audio ./audio/xyz.wav --path_to_models ./models
The output is a .wav file saved in the /audio_ directory.
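For programmatic use, the demo can be approximated as in the sketch below. The `Generator` import, the mel-spectrogram parameters and the checkpoint interface are assumptions made for illustration; demo.py in this repository is the authoritative reference, including the MelGAN vocoder step that writes the .wav output.

```python
# Rough sketch: load the GenGAN generator and anonymise one utterance.
# Module/class names and mel parameters are assumptions; follow demo.py for the exact pipeline.
import numpy as np
import torch
import librosa

from models import Generator  # assumed location of the generator definition

device = "cuda" if torch.cuda.is_available() else "cpu"

# Compute a log-mel-spectrogram from the input audio (parameters assumed).
wav, sr = librosa.load("./audio/xyz.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=1024, hop_length=256, n_mels=80)
mel = torch.from_numpy(np.log(np.clip(mel, 1e-5, None))).unsqueeze(0).float().to(device)

# Restore the pre-trained generator and produce a gender-ambiguous mel-spectrogram.
netG = Generator().to(device)
netG.load_state_dict(torch.load("./models/netG_epoch_25.pt", map_location=device))
netG.eval()
with torch.no_grad():
    mel_anon = netG(mel)

# mel_anon would then be inverted to a waveform with the multi-speaker MelGAN vocoder.
```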
To perform gender recognition on your saved samples:
- Load the pre-trained model from the checkpoint /models/model.ckpt-90_GenderNet.pt.
- Run:
python GenderNet.py --batch_size bs --set test
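The checkpoint can also be restored directly in Python, as in the sketch below; the class name, constructor and input shape are assumptions, so refer to GenderNet.py for the actual interface.

```python
# Minimal sketch of restoring the gender classifier from the released checkpoint.
# Class name, constructor arguments and input shape are assumptions (see GenderNet.py).
import torch

from GenderNet import GenderNet  # assumed import from GenderNet.py in this repository

model = GenderNet()
state = torch.load("./models/model.ckpt-90_GenderNet.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()

mel_batch = torch.randn(4, 80, 128)      # placeholder batch of mel-spectrograms (shape assumed)
with torch.no_grad():
    logits = model(mel_batch)
    predictions = logits.argmax(dim=-1)  # per-utterance gender prediction
```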
We use the pre-trained SpeakerNet model to perform the speaker verification task.
Download the QuartzNet model from NeMo.
The work is based on:
@misc{https://doi.org/10.48550/arxiv.2207.01052,
  doi = {10.48550/ARXIV.2207.01052},
  url = {https://arxiv.org/abs/2207.01052},
  author = {Stoidis, Dimitrios and Cavallaro, Andrea},
  title = {Generating gender-ambiguous voices for privacy-preserving speech recognition},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
Accepted for publication at Interspeech 2022.
For any enquiries contact [email protected].
This work is licensed under the MIT License.