
Generating gender-ambiguous voices for privacy-preserving speech recognition.

Gender-ambiguous speech synthesis is our proposed method for privacy-preserving speech recognition. GenGAN is a generative adversarial network that synthesises mel-spectrograms which convey the linguistic content of speech while concealing gender and identity information. We provide our pre-trained GenGAN synthesiser and our pre-trained gender recognition model.

GenGAN pipeline

Installation

Requirements

  • Python >= 3.7.4
  • PyTorch >= 1.2.0
  • hydra-core >= 1.1.1
  • Soundfile >= 0.10.3
  • librosa >= 0.7.2

Download data

We use the clean-100 partition of the LibriSpeech dataset.

Instructions

  1. Clone repository
    git clone https://github.com/dimitriStoidis/GenGAN.git

  2. From a terminal or an Anaconda Prompt, go to the project's root directory and run:
    conda create -n gengan python=3.7
    conda activate gengan
    and install the required packages listed above

  3. Download and add to path:
    the multi-speaker pre-trained MelGAN vocoder model.

  4. For training, create the JSON manifests in the /data_files folder to read the data. Each entry contains:

  • speaker and gender labels
  • the path to the audio file
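The exact manifest schema is not specified here; a minimal sketch of building such a file with Python's standard library, assuming one JSON object per line with hypothetical field names (`audio_path`, `speaker_id`, `gender`), could look like:

```python
import json
import os

def build_manifest(root_dir, gender_labels, out_path):
    """Write one JSON object per line for each audio file under root_dir.

    gender_labels maps a speaker id (taken from the LibriSpeech-style file
    name prefix <speaker>-<chapter>-<utterance>) to a gender label, e.g.
    {"19": "F"}. Field names here are illustrative, not the repository's
    actual schema.
    """
    with open(out_path, "w") as out:
        for dirpath, _, filenames in os.walk(root_dir):
            for name in sorted(filenames):
                if not name.endswith((".flac", ".wav")):
                    continue
                speaker_id = name.split("-")[0]
                entry = {
                    "audio_path": os.path.join(dirpath, name),
                    "speaker_id": speaker_id,
                    "gender": gender_labels.get(speaker_id, "unknown"),
                }
                out.write(json.dumps(entry) + "\n")
```

Gender labels per speaker can be read from the `SPEAKERS.TXT` metadata file shipped with LibriSpeech.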

Training example

To train the model run:
python train.py --trial model1 --epochs 25 --batch_size 25

Demo

To try out GenGAN on your own audio samples:

  1. Clone the repository

  2. Load the pre-trained GenGAN model for speech synthesis from the checkpoint:
    /models/netG_epoch_25.pt

  3. Download and add to path:
    the multi-speaker pre-trained MelGAN vocoder model.

  4. Run:
    python demo.py --path_to_audio ./audio/xyz.wav --path_to_models ./models
    The output is a .wav file saved in the /audio_ directory.

Evaluation

Gender Recognition

To perform gender recognition on your saved samples:

  1. Load the pre-trained model from the checkpoint:
    /models/model.ckpt-90_GenderNet.pt

  2. Run:
    python GenderNet.py --batch_size bs --set test

Speaker Verification

We use the pre-trained SpeakerNet model to perform the speaker verification task.
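Speaker verification systems are commonly scored with the equal error rate (EER): the operating point where the false-acceptance and false-rejection rates coincide. As a sketch, given per-trial similarity scores and same/different-speaker labels (how SpeakerNet outputs are aggregated into trials is not detailed here), the EER can be computed as:

```python
def equal_error_rate(scores, labels):
    """Equal error rate from verification trials.

    scores: similarity score per trial (higher = more likely same speaker).
    labels: 1 for a genuine (same-speaker) pair, 0 for an impostor pair.
    Sweeps every observed score as a threshold and returns the error rate
    at the point where FAR and FRR are closest.
    """
    genuine = [s for s, l in zip(scores, labels) if l == 1]
    impostor = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

A higher EER on anonymised speech indicates that the verifier can no longer link utterances to the original speaker, which is the desired privacy outcome.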

Automatic Speech Recognition

Download the pre-trained QuartzNet model from NeMo.
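ASR performance on the anonymised speech is typically reported as word error rate (WER): the number of word substitutions, insertions, and deletions in the transcript divided by the reference length. A minimal standard-library sketch (the repository's own scoring script may differ):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed as word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] = edit distance between ref[:i] and hyp[:j] for the current row i.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, d[j] = d[j], min(
                d[j] + 1,          # deletion
                d[j - 1] + 1,      # insertion
                prev_diag + cost,  # substitution or match
            )
    return d[len(hyp)] / max(len(ref), 1)
```

A small WER gap between original and gender-ambiguous speech indicates that the linguistic content was preserved through anonymisation.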

References

The work is based on:

Cite

  @misc{https://doi.org/10.48550/arxiv.2207.01052,
    doi = {10.48550/ARXIV.2207.01052},
    url = {https://arxiv.org/abs/2207.01052},
    author = {Stoidis, Dimitrios and Cavallaro, Andrea},
    title = {Generating gender-ambiguous voices for privacy-preserving speech recognition},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
  }

Accepted for publication at Interspeech.

Contact

For any enquiries contact [email protected].

Licence

This work is licensed under the MIT License.
