Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset (…#1
Merged
sufen-f merged 2 commits into sufen-f:maeb on Feb 28, 2025
…2175)
* Added wav2vec model wrapper
* Added four w2v variants
* Update wav2vec_models.py
* Removed run.py test script
* Added subTask with small sample of dataset for testing
* Removed test portion of VoiceGender.py task
* add commit hash and bibtex
* make lint
* update models
* fix circular import
* make VoiceGender discoverable in get_tasks
* add a2a as category for clustering
* specify latest commit hash
* revert linting changes
* Based on feedback for model: updated w2v2 revisions and added torchaudio to .toml file
* Added Bibtex for dataset, set data to be test instead of training, shortened task_subtype
* Changed task from Voice Gender Clustering to Gender Clustering.
* Fixed mock audio clustering tests
* Added dataset metadata
* Linted
* Passed revision into the w2v2 loader
* passed lint check
* Linted
* Update VoiceGender.py
---------
Co-authored-by: Ali Sartaz Khan <alisartazkhan@gmail.com>
Co-authored-by: Ali Sartaz Khan <71156712+alisartazkhan@users.noreply.github.com>
Co-authored-by: mn <mn@Ms-MacBook-Pro.local>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Code Quality
* make lint to maintain consistent style.
Documentation
Testing
* make test-with-coverage.
* make test or make test-with-coverage to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
* mteb -m {model_name} -t {task_name} command.
* sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
* intfloat/multilingual-e5-small
* self.stratified_subsampling() under dataset_transform()
* make test.
* make lint.
Adding a model checklist
* mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)
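The datasets checklist mentions self.stratified_subsampling() under dataset_transform(), and the commit list notes a small sample of the dataset added for testing. As a rough illustration of the idea only, the sketch below subsamples a labelled dataset while keeping label proportions roughly intact. It is not mteb's implementation: the function name stratified_subsample, the "label" and "audio_id" field names, the toy data, and the seed are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, label_key, n_samples, seed=42):
    """Return roughly n_samples examples with label proportions preserved.

    Hypothetical helper for illustration; not part of mteb.
    """
    if n_samples >= len(examples):
        return list(examples)

    rng = random.Random(seed)

    # Group examples by their label value.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    picked = []
    for label in sorted(by_label):
        group = by_label[label]
        # Each label gets a share proportional to its frequency (at least one).
        # Rounding means the total can differ slightly from n_samples.
        quota = max(1, round(n_samples * len(group) / len(examples)))
        picked.extend(rng.sample(group, min(quota, len(group))))

    rng.shuffle(picked)
    return picked


# Toy stand-in for an audio dataset with an imbalanced label column.
data = [
    {"audio_id": i, "label": "female" if i % 4 == 0 else "male"}
    for i in range(4000)
]
subset = stratified_subsample(data, "label", 2048)
```

Sampling per label rather than from the whole pool keeps a minority class from shrinking or vanishing in the subset, which matters for a clustering task scored against those labels.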