@david-ryan-snyder commented Jul 5, 2018

Mitchell McLaren of SRI informed me that the speakers in the test portion of VoxCeleb2 (which is used in our training list) overlap with the speakers in the recipe's evaluation dataset, Speakers in the Wild.

This PR removes the VoxCeleb2 test set from the training list. There should now be no speaker overlap between the training dataset and the evaluation dataset.

This results in a loss of 166 training speakers. The i-vector results in v1 worsen slightly. The x-vector results in v2 don't change much. After this PR is merged, the models in http://kaldi-asr.org/models/m8 will be updated.
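
For anyone who wants to sanity-check a training list against an evaluation set in a similar way, here is a minimal sketch. It is not the code in this PR. It assumes both lists are in the Kaldi `utt2spk` format (`<utt-id> <spk-id>` per line) and, importantly, that the same person carries the same speaker ID in both lists, which in practice may first require mapping the two corpora to a shared identity label. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def speakers_from_utt2spk(lines: Iterable[str]) -> Set[str]:
    """Collect speaker IDs from utt2spk-style lines ("<utt-id> <spk-id>")."""
    speakers = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            speakers.add(parts[1])
    return speakers


def speaker_overlap(train_lines: Iterable[str], eval_lines: Iterable[str]) -> Set[str]:
    """Return speaker IDs that appear in both lists.

    Assumes a shared speaker ID scheme across the two lists (an assumption,
    not something the corpora guarantee).
    """
    return speakers_from_utt2spk(train_lines) & speakers_from_utt2spk(eval_lines)


def drop_speakers(train_lines: Iterable[str], banned: Set[str]) -> List[str]:
    """Keep only well-formed training lines whose speaker is not in `banned`."""
    kept = []
    for line in train_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1] not in banned:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Toy data for illustration only; these are not real corpus IDs.
    train = ["utt1 spkA", "utt2 spkB", "utt3 spkC"]
    evaluation = ["e1 spkB", "e2 spkD"]
    shared = speaker_overlap(train, evaluation)
    cleaned = drop_speakers(train, shared)
    print(sorted(shared), cleaned)
```

After filtering, running `speaker_overlap` again on the cleaned list should return an empty set.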

@danpovey danpovey merged commit ad93210 into kaldi-asr:master Jul 5, 2018
dpriver pushed a commit to dpriver/kaldi that referenced this pull request Sep 13, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018