DEepfake CROss-lingual (DECRO) evaluation dataset


This repository provides the DECRO dataset described in the paper Transferring Audio Deepfake Detection Capability across Languages (accepted at TheWebConf 2023). The DEepfake CROss-lingual evaluation dataset is constructed to evaluate the influence of language differences on deepfake detection.

The latest DECRO dataset is available at https://zenodo.org/record/7603208.

Citation

If you use the DECRO dataset for deepfake detection, please use the following citation:

@inproceedings{ba2023transferring,
  title={Transferring Audio Deepfake Detection Capability across Languages},
  author={Ba, Zhongjie and Wen, Qing and Cheng, Peng and Wang, Yuwei and Lin, Feng and Lu, Li and Liu, Zhenguang},
  booktitle={Proceedings of the ACM Web Conference 2023},
  pages={2033--2044},
  year={2023}
}

Composition

DECRO consists of two subsets, one English and one Chinese. Both contain bona-fide and spoofed speech samples and have almost the same total audio length. Most importantly, the spoofed speech in the two subsets is generated with the same types of synthesis algorithms, which excludes other interference factors and enables an accurate measurement of the impact of language differences on detection accuracy.

Composition of the Bona-fide Part

There are 21218 bona-fide utterances in the Chinese subset and 12484 bona-fide utterances in the English subset.

For the Chinese subset, we collected data from six open-source recording datasets to guarantee the diversity of the Chinese recordings: Aidatatang_200zh, Aishell1, Aishell3, freeST, MagicData, and Aishell2 (free for academic usage).

For the English subset, bona-fide audio is collected from the ASVspoof2019 LA dataset and re-partitioned to fit our setting.

Composition of the Spoofed Part

There are 41880 and 42800 spoofed utterances in the Chinese and English subsets, respectively. Some of the spoofed speech comes from public datasets; the rest is generated using commercial and open-source algorithms, covering both text-to-speech (TTS) and voice conversion (VC) techniques.

We collect samples from two public deepfake speech datasets: the English dataset WaveFake and the Chinese dataset FAD. The two datasets contain speech samples generated by the same synthesis algorithms, including HiFiGAN, Multiband-MelGAN, and PWG.

In addition, Tacotron, FastSpeech2, VITS, and Starganv2-vc are end-to-end synthesis algorithms that inherently support generating both Chinese and English. We collect Tacotron English data from A10 in the ASVspoof2019 LA dataset and generate the others with the corresponding pre-trained models. Note that NVC-Net was originally proposed to perform VC in English; we retrained the model to generate Chinese speech. The Chinese samples for Baidu and Xunfei TTS come from the FMFCC-A dataset, and we synthesize the corresponding English samples via the online APIs. In the following sections, we refer to the above spoofing algorithms by their abbreviations.

Division

DECRO specifics, including the number of bona-fide and spoofed utterances per split.

|           | English Train | English Dev | English Eval | Chinese Train | Chinese Dev | Chinese Eval |
|-----------|---------------|-------------|--------------|---------------|-------------|--------------|
| Bona-fide | 5129          | 3049        | 4306         | 9000          | 6109        | 6109         |
| Spoofed   | 17412         | 10503       | 14884        | 17850         | 12015       | 12015        |
| Total     | 22541         | 13552       | 19190        | 26850         | 18124       | 18124        |
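For reference, the split sizes above can be encoded programmatically, e.g. to sanity-check a local copy of the corpus after download. This is a minimal sketch; the dictionary layout and function name are our own, not part of the dataset distribution:

```python
# Split sizes from the DECRO table above, as (bona_fide, spoofed) pairs.
SPLITS = {
    "english": {"train": (5129, 17412), "dev": (3049, 10503), "eval": (4306, 14884)},
    "chinese": {"train": (9000, 17850), "dev": (6109, 12015), "eval": (6109, 12015)},
}

def totals(lang):
    """Return (bona_fide, spoofed, total) utterance counts for one language subset."""
    bona = sum(b for b, _ in SPLITS[lang].values())
    spoof = sum(s for _, s in SPLITS[lang].values())
    return bona, spoof, bona + spoof
```

For example, `totals("chinese")` yields the 21218 bona-fide utterances stated in the Composition section.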

Data Format

The distributed audio files are single-channel WAV (*.wav) files. The corpus is split into train, dev, and eval subsets.
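Since the files are plain single-channel WAV, they can be inspected with Python's standard library alone. A minimal sketch (the function name and the "expected mono" comment reflect the format description above, not code shipped with the dataset):

```python
import wave

def wav_info(path):
    """Read basic properties of a WAV file using only the stdlib."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),                     # expected: 1 (mono)
            "sample_rate": w.getframerate(),                  # frames per second
            "duration_s": w.getnframes() / w.getframerate(),  # length in seconds
        }
```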

Protocol Format

All protocol files for the deepfake detection models are in ASCII format. Each line of a protocol file is formatted as:

SPEAKER_ID AUDIO_FILE_NAME - SYSTEM_ID KEY

where:

  1. SPEAKER_ID: ****, speaker ID
  2. AUDIO_FILE_NAME: ****, the name of the audio file (no file extension, e.g. "001" for "001.wav")
  3. -: this column is NOT used
  4. SYSTEM_ID: abbreviation of the speech spoofing system; for bona-fide speech, SYSTEM_ID is left blank ('-')
  5. KEY: 'bonafide' for genuine speech, or 'spoof' for spoofed speech
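The five-column lines above can be parsed with a straightforward split. A minimal sketch, assuming whitespace-separated fields as described (the speaker and file names in the usage example are hypothetical, not actual DECRO identifiers):

```python
def parse_protocol_line(line):
    """Parse one protocol line: SPEAKER_ID AUDIO_FILE_NAME - SYSTEM_ID KEY."""
    speaker_id, file_name, _unused, system_id, key = line.split()
    return {
        "speaker": speaker_id,
        "file": file_name + ".wav",  # protocol names carry no file extension
        "system": None if system_id == "-" else system_id,  # '-' for bona-fide
        "label": key,                # 'bonafide' or 'spoof'
    }
```

For example, `parse_protocol_line("SPK01 001 - tacotron spoof")` maps the file name to "001.wav" and keeps the spoofing-system abbreviation, while a bona-fide line such as "SPK02 002 - - bonafide" yields `system = None`.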

Future Plan

2023-02-10 to 2024-01-01: Chinese and English data generated by more spoofing algorithms will be added.

References

[1] Aidatatang_200zh

@online{openslrDatatang,
author = {DataTang},
title = {aidatatang\_200zh, a free Chinese Mandarin speech corpus by Beijing DataTang Technology Co., Ltd ( www.datatang.com )},
year = {2020},
howpublished = {\url{https://openslr.org/62/}},
note = {Online; accessed 08-Oct-2022}
}

[2] Aishell1

@inproceedings{bu2017aishell,
  title={Aishell-1: An open-source mandarin speech corpus and a speech recognition baseline},
  author={Bu, Hui and Du, Jiayu and Na, Xingyu and Wu, Bengu and Zheng, Hao},
  booktitle={2017 20th conference of the oriental chapter of the international coordinating committee on speech databases and speech I/O systems and assessment (O-COCOSDA)},
  pages={1--5},
  year={2017},
  organization={IEEE}
}

[3] Aishell3

@article{AISHELL-3_2020,
  title={AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines},
  author={Shi, Yao and Bu, Hui and Xu, Xin and Zhang, Shaoji and Li, Ming},
  journal={arXiv preprint arXiv:2010.11567},
  year={2020},
  url={https://arxiv.org/abs/2010.11567}
}

[4] freeST

@online{openslrFreeST,
  author = {Surfing Technology Beijing Co., Ltd},
  title = {ST-CMDS-20170001\_1, Free ST Chinese Mandarin Corpus},
  year = {2018},
  howpublished = {\url{http://www.openslr.org/38/}},
  note = {Online; accessed 08-Oct-2022}
}

[5] MagicData

@online{openslrMagicdata,
  author = {Magic Data Technology Co., Ltd},
  title = {MAGICDATA Mandarin Chinese Read Speech Corpus},
  year = {2019},
  howpublished = {\url{http://www.openslr.org/68/}},
  note = {Online; accessed 08-Oct-2022}
}

[6] WaveFake

@inproceedings{frank2021wavefake,
title={WaveFake: A Data Set to Facilitate Audio Deepfake Detection},
author={Joel Frank and Lea Sch{\"o}nherr},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
year={2021},
url={https://openreview.net/forum?id=74TZg9gsO8W}
}

[7] FAD

@article{ma2022fad,
  title={FAD: A Chinese Dataset for Fake Audio Detection},
  author={Ma, Haoxin and Yi, Jiangyan and Wang, Chenglong and Yan, Xinrui and Tao, Jianhua and Wang, Tao and Wang, Shiming and Xu, Le and Fu, Ruibo},
  journal={arXiv preprint arXiv:2207.12308},
  year={2022}
}

[8] GST-Tacotron

@online{GST-Tacotron,
  author = {KinglittleQ},
  title = {GST-Tacotron},
  year = {2018},
  howpublished = {\url{https://github.com/KinglittleQ/GST-Tacotron}},
  note = {Online; accessed 09-Oct-2022}
}

[9] FastSpeech2

@inproceedings{chien2021investigating,
  title={Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech},
  author={Chien, Chung-Ming and Lin, Jheng-Hao and Huang, Chien-yu and Hsu, Po-chun and Lee, Hung-yi},
  booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={8588--8592},
  year={2021},
  doi={10.1109/ICASSP39728.2021.9413880}
}

[10] VITS_chinese

@online{VITSch,
  author = {UEhQZXI},
  title = {vits\_chinese},
  year = {2021},
  howpublished = {\url{https://github.com/UEhQZXI/2021}},
  note = {Online; accessed 09-Oct-2022}
}

[11] Starganv2-vc

@article{li2021starganv2,
  title={Starganv2-vc: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion},
  author={Li, Yinghao Aaron and Zare, Ali and Mesgarani, Nima},
  journal={arXiv preprint arXiv:2107.10394},
  year={2021}
}

[12] NVC-Net

@article{nguyen2021nvc,
  title={NVC-Net: End-to-End Adversarial Voice Conversion},
  author={Nguyen, Bac and Cardinaux, Fabien},
  journal={arXiv preprint arXiv:2106.00992},
  year={2021}
}

[13] FMFCC-A

@article{zhang2021FMFCCA,
  author    = {Zhenyu Zhang and Yewei Gu and Xiaowei Yi and Xianfeng Zhao},
  title     = {{FMFCC-A:} {A} Challenging Mandarin Dataset for Synthetic Speech Detection},
  journal   = {CoRR},
  volume    = {abs/2110.09441},
  year      = {2021},
  url       = {https://arxiv.org/abs/2110.09441},
  eprinttype = {arXiv},
  eprint    = {2110.09441},
  timestamp = {Mon, 25 Oct 2021 20:07:12 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2110-09441.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

[14] ASVspoof2019

@article{todisco2019asvspoof,
  title={{ASVspoof} 2019: Future Horizons in Spoofed and Fake Audio Detection},
  author={Todisco, Massimiliano and Wang, Xin and Vestman, Ville and Sahidullah, Md and Delgado, Hector and Nautsch, Andreas and Yamagishi, Junichi and Evans, Nicholas and Kinnunen, Tomi and Lee, Kong Aik},
  journal={arXiv preprint arXiv:1904.05441},
  year={2019}
}

LICENSE

CC-BY-4.0.

Contact

Qing Wen, Zhejiang University ([email protected])
