A. Firc, K. Malinka and P. Hanáček, "Diffuse or Confuse: A Diffusion Deepfake Speech Dataset," 2024 International Conference of the Biometrics Special Interest Group (BIOSIG), Darmstadt, Germany, 2024, pp. 1-7, doi: 10.1109/BIOSIG61931.2024.10786752.
Advancements in artificial intelligence and machine learning have significantly improved synthetic speech generation. This paper explores diffusion models, a novel method for creating realistic synthetic speech. We create a diffusion dataset using available tools and pretrained models. Additionally, this study assesses the quality of diffusion-generated deepfakes versus non-diffusion ones and their potential threat to current deepfake detection systems. Findings indicate that detection performance on diffusion-based deepfakes is generally comparable to that on non-diffusion deepfakes, with some variability depending on detector architecture. Re-vocoding with diffusion vocoders has minimal impact, and the overall speech quality is comparable to that of non-diffusion methods.
Download and extract the following .zip files into the same directory:
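Assuming all the dataset archives have already been downloaded into one directory, the extraction step can be scripted. The following is a minimal sketch using only the Python standard library (`zipfile`, `pathlib`). It does not assume any particular archive names; it simply extracts every `*.zip` file it finds in the target directory into that same directory. The function name `extract_all_zips` is hypothetical and not part of the dataset distribution.

```python
"""Extract every .zip archive in a directory into that same directory.

Sketch only: archive names are not hard-coded; every *.zip found in the
target directory is extracted there.
"""
from __future__ import annotations

import sys
import zipfile
from pathlib import Path


def extract_all_zips(directory: Path) -> list[Path]:
    """Extract each .zip in `directory` into `directory`; return the archives processed."""
    directory = Path(directory)
    # Sort so extraction order is deterministic across runs and platforms.
    archives = sorted(directory.glob("*.zip"))
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            # extractall preserves each archive's internal folder layout.
            zf.extractall(directory)
    return archives


if __name__ == "__main__":
    # Usage: python extract_all.py [download_dir]  (defaults to the current directory)
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for archive in extract_all_zips(target):
        print(f"extracted {archive.name}")
```

Extracting into the shared directory, rather than one folder per archive, follows the instruction above to place all extracted content in the same directory.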
@INPROCEEDINGS{10786752,
author={Firc, Anton and Malinka, Kamil and Hanáček, Petr},
booktitle={2024 International Conference of the Biometrics Special Interest Group (BIOSIG)},
title={Diffuse or Confuse: A Diffusion Deepfake Speech Dataset},
year={2024},
pages={1--7},
keywords={Biometrics;Deepfakes;Vocoders;Detectors;Diffusion models;Speech synthesis;deepfakes;deepfake speech;dataset;diffusion;detection},
doi={10.1109/BIOSIG61931.2024.10786752}
}