[egs] Fix to training list in egs/sitw{v1,v2} recipe #2535
Merged
Mitchell McLaren of SRI informed me that there's an overlap between the speakers in the test portion of VoxCeleb2 (which is used in our training list) and the recipe's evaluation dataset, Speakers in the Wild.
This PR removes the VoxCeleb2 test set from the training list. There should now be no speaker overlap between the training dataset and the evaluation dataset.
This results in a loss of 166 training speakers. The i-vector results in v1 worsen slightly, while the x-vector results in v2 don't change much. After this PR is merged, the models in http://kaldi-asr.org/models/m8 will be updated.
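
For anyone who wants to sanity-check that the training and evaluation lists share no speakers, here is a minimal sketch. It is not part of this PR. It assumes both lists are available as Kaldi-style `utt2spk` lines (`<utt-id> <spk-id>`) and that speaker IDs have already been mapped to a common naming scheme, which may not hold for the raw VoxCeleb and SITW IDs. The function names and the sample IDs are hypothetical.

```python
def speakers(utt2spk_lines):
    """Collect speaker IDs from Kaldi-style utt2spk lines ("<utt-id> <spk-id>")."""
    found = set()
    for line in utt2spk_lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            found.add(parts[1])
    return found


def speaker_overlap(train_lines, eval_lines):
    """Return the set of speaker IDs that appear in both lists."""
    return speakers(train_lines) & speakers(eval_lines)


# Hypothetical sample data, for illustration only.
train = ["id00012-utt1 id00012", "id00015-utt1 id00015"]
evalset = ["sitw-utt7 id00015"]
overlap = speaker_overlap(train, evalset)
print(sorted(overlap))
```

An empty result means the two lists are disjoint under the assumed ID mapping. A non-empty result lists the speakers to remove from training.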