This repository provides the DECRO dataset introduced in the paper "Transferring Audio Deepfake Detection Capability across Languages" (accepted at TheWebConf 2023). The DEepfake CROss-lingual evaluation (DECRO) dataset is constructed to evaluate the influence of language differences on deepfake detection.
The latest DECRO dataset is available at https://zenodo.org/record/7603208.
If you use the DECRO dataset for deepfake detection, please use the following citation:
@inproceedings{ba2023transferring,
title={Transferring Audio Deepfake Detection Capability across Languages},
author={Ba, Zhongjie and Wen, Qing and Cheng, Peng and Wang, Yuwei and Lin, Feng and Lu, Li and Liu, Zhenguang},
booktitle={Proceedings of the ACM Web Conference 2023},
pages={2033--2044},
year={2023}
}
DECRO consists of two subsets: an English subset and a Chinese subset. Both contain bona-fide and spoofed speech samples and have almost the same total audio length. Most importantly, the spoofed speech in the two subsets is generated with the same types of synthesis algorithms, which excludes other interfering factors and allows an accurate measurement of the impact of language differences on detection accuracy.
There are 21218 bona-fide utterances in the Chinese subset and 12484 bona-fide utterances in the English subset.
For the Chinese part, we collect bona-fide data from six open-source recording datasets to guarantee the diversity of Chinese recordings: Aidatatang_200zh, Aishell1, Aishell3, freeST, MagicData, and Aishell2 (free for academic usage).
For the English part, bona-fide audio is collected from the ASVspoof2019 LA dataset and re-partitioned to fit our setting.
There are 41880 and 42800 spoofed utterances in the Chinese and English subsets, respectively. Some of the spoofed utterances come from public datasets; the rest are generated with commercial and open-source algorithms, covering both text-to-speech (TTS) and voice conversion (VC) techniques.
We collect samples from two public deepfake speech datasets: the English dataset WaveFake and the Chinese dataset FAD. The two datasets contain speech samples generated by the same synthesis algorithms, including HiFiGAN, Multiband-MelGAN, and PWG.
In addition, Tacotron, FastSpeech2, VITS, and Starganv2-vc are end-to-end synthesis algorithms that inherently support generating both Chinese and English. We collect the Tacotron English data from A10 in the ASVspoof2019 LA dataset and generate the others with the corresponding pre-trained models. Note that NVC-Net was originally proposed for English voice conversion; we retrain the model to generate Chinese speech. The Chinese samples of Baidu and Xunfei TTS come from the FMFCC-A dataset, and we synthesize the corresponding English samples via their online APIs. In the following sections, we refer to the above spoofing algorithms by their abbreviations.
The table below summarizes DECRO, including the number of bona-fide and spoofed utterances in each split.
|           | English Train | English Dev | English Eval | Chinese Train | Chinese Dev | Chinese Eval |
|-----------|---------------|-------------|--------------|---------------|-------------|--------------|
| Bona-fide | 5129          | 3049        | 4306         | 9000          | 6109        | 6109         |
| Spoofed   | 17412         | 10503       | 14884        | 17850         | 12015       | 12015        |
| Total     | 22541         | 13552       | 19190        | 26850         | 18124       | 18124        |
The distributed audio files are single-channel files in *.wav format. The corpus is split into train, dev, and eval subsets.
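For illustration, here is a minimal loading sketch, assuming the `soundfile` package and a hypothetical file path inside the extracted corpus:

```python
import soundfile as sf

# Hypothetical path; substitute any file from the extracted corpus.
audio, sample_rate = sf.read("DECRO/en/train/001.wav")

# soundfile returns a 1-D NumPy array for single-channel audio.
assert audio.ndim == 1, "expected single-channel audio"
print(f"{len(audio) / sample_rate:.2f} s at {sample_rate} Hz")
```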
All protocol files for the deepfake detection models are in ASCII format. Each line of a protocol file is formatted as follows (a short parsing sketch is given after the field descriptions):
SPEAKER_ID AUDIO_FILE_NAME - SYSTEM_ID KEY
where:
- SPEAKER_ID: ****, the speaker ID
- AUDIO_FILE_NAME: ****, the name of the audio file (no file extension, e.g. "001" for "001.wav")
- -: this column is not used
- SYSTEM_ID: abbreviation of the speech spoofing system; for bona-fide speech, SYSTEM_ID is left blank ('-')
- KEY: 'bonafide' for genuine speech, or 'spoof' for spoofed speech
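Below is a minimal parsing sketch for this layout. The protocol file name and the dictionary fields are illustrative assumptions, not part of the official release:

```python
from pathlib import Path

def load_protocol(path):
    """Parse lines of the form: SPEAKER_ID AUDIO_FILE_NAME - SYSTEM_ID KEY."""
    records = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        speaker_id, file_name, _, system_id, key = line.split()
        records.append({
            "speaker": speaker_id,
            "file": file_name + ".wav",  # protocol names omit the file extension
            "system": system_id,         # '-' for bona-fide utterances
            "label": key,                # 'bonafide' or 'spoof'
        })
    return records

# Hypothetical usage; the actual protocol file names may differ:
# records = load_protocol("en_train_protocol.txt")
```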
2023.2.10 -- 2024.1.1: Chinese and English data generated by more spoofing algorithms will be included.
[1] Aidatatang_200zh
@online{openslrDatatang,
author = {DataTang},
title = {aidatatang\_200zh, a free Chinese Mandarin speech corpus by Beijing DataTang Technology Co., Ltd (www.datatang.com)},
year = {2020},
howpublished = {\url{https://openslr.org/62/}},
note = {Online; accessed 08-Oct-2022}
}
[2] Aishell1
@inproceedings{bu2017aishell,
title={Aishell-1: An open-source mandarin speech corpus and a speech recognition baseline},
author={Bu, Hui and Du, Jiayu and Na, Xingyu and Wu, Bengu and Zheng, Hao},
booktitle={2017 20th conference of the oriental chapter of the international coordinating committee on speech databases and speech I/O systems and assessment (O-COCOSDA)},
pages={1--5},
year={2017},
organization={IEEE}
}
[3] Aishell3
@article{AISHELL-3_2020,
title={AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines},
author={Shi, Yao and Bu, Hui and Xu, Xin and Zhang, Shaoji and Li, Ming},
journal={arXiv preprint arXiv:2010.11567},
year={2020}
}
[4] freeST
@online{openslrFreeST,
author = {{Surfing Technology Beijing Co., Ltd}},
title = {ST-CMDS-20170001\_1, Free ST Chinese Mandarin Corpus},
year = {2018},
howpublished = {\url{http://www.openslr.org/38/}},
note = {Online; accessed 08-Oct-2022}
}
[5] MagicData
@online{openslrMagicdata,
author = {{Magic Data Technology Co., Ltd}},
title = {MAGICDATA Mandarin Chinese Read Speech Corpus},
year = {2019},
howpublished = {\url{http://www.openslr.org/68/}},
note = {Online; accessed 08-Oct-2022}
}
[6] WaveFake
@inproceedings{frank2021wavefake,
title={WaveFake: A Data Set to Facilitate Audio Deepfake Detection},
author={Joel Frank and Lea Sch{\"o}nherr},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
year={2021},
url={https://openreview.net/forum?id=74TZg9gsO8W}
}
[7] FAD
@article{ma2022fad,
title={FAD: A Chinese Dataset for Fake Audio Detection},
author={Ma, Haoxin and Yi, Jiangyan and Wang, Chenglong and Yan, Xinrui and Tao, Jianhua and Wang, Tao and Wang, Shiming and Xu, Le and Fu, Ruibo},
journal={arXiv preprint arXiv:2207.12308},
year={2022}
}
[8] GST-Tacotron
@online{GST-Tacotron,
author = {KinglittleQ},
title = {GST-Tacotron},
year = {2018},
howpublished = {\url{https://github.com/KinglittleQ/GST-Tacotron}},
note = {Online; accessed 09-Oct-2022}
}
[9] FastSpeech2
@inproceedings{chien2021investigating,
title={Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech},
author={Chien, Chung-Ming and Lin, Jheng-Hao and Huang, Chien-yu and Hsu, Po-chun and Lee, Hung-yi},
booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={8588--8592},
year={2021},
doi={10.1109/ICASSP39728.2021.9413880}
}
[10] VITS_chinese
@online{VITSch,
author = {UEhQZXI},
title = {vits\_chinese},
year = {2021},
howpublished = {\url{https://github.com/UEhQZXI/vits_chinese}},
note = {Online; accessed 09-Oct-2022}
}
[11] Starganv2-vc
@article{li2021starganv2,
title={Starganv2-vc: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion},
author={Li, Yinghao Aaron and Zare, Ali and Mesgarani, Nima},
journal={arXiv preprint arXiv:2107.10394},
year={2021}
}
[12] NVC-Net
@article{nguyen2021nvc,
title={NVC-Net: End-to-End Adversarial Voice Conversion},
author={Nguyen, Bac and Cardinaux, Fabien},
journal={arXiv preprint arXiv:2106.00992},
year={2021}
}
[13] FMFCC-A
@article{zhang2021FMFCCA,
title={{FMFCC-A}: {A} Challenging Mandarin Dataset for Synthetic Speech Detection},
author={Zhang, Zhenyu and Gu, Yewei and Yi, Xiaowei and Zhao, Xianfeng},
journal={arXiv preprint arXiv:2110.09441},
year={2021}
}
[14] ASVspoof2019
@article{todisco2019asvspoof,
title={{ASVspoof} 2019: Future Horizons in Spoofed and Fake Audio Detection},
author={Todisco, Massimiliano and Wang, Xin and Vestman, Ville and Sahidullah, Md and Delgado, Hector and Nautsch, Andreas and Yamagishi, Junichi and Evans, Nicholas and Kinnunen, Tomi and Lee, Kong Aik},
journal={arXiv preprint arXiv:1904.05441},
year={2019}
}
The DECRO dataset is released under the CC-BY-4.0 license.
Contact: Qing Wen, Zhejiang University ([email protected])