Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset by sufen-f · Pull Request #2175 · embeddings-benchmark/mteb

sufen-f · 2025-02-27T07:17:06Z · edited by alisartazkhan

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

Command

import mteb

# Load the Wav2Vec2 base checkpoint at a pinned revision.
model_name = "facebook/wav2vec2-base"
model = mteb.get_model(model_name, revision="0b5b8e868dd84f03fd87d01f9c4ff0f080fecfe8")

# Select the clustering task added in this PR and run the evaluation.
tasks = mteb.get_tasks(tasks=["VoiceGenderClustering"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")

sufen-f · 2025-02-27T07:17:51Z

@alisartazkhan @mnasser3

isaac-chung

Thanks for separating this out! I think we're close. A few comments here. Also, please make sure the tests pass.

Based on this comment, it seemed like the newly added task and model can be run. Please also share the command / script used in the PR description, like so:

import mteb
#example code here

alisartazkhan · 2025-02-28T08:18:11Z

Does anyone know why we are failing the lint test? When I run make lint-check, it says "All checks passed!", but the check still seems to fail here.

isaac-chung · 2025-02-28T08:20:40Z

@alisartazkhan try make lint, which should update the files, then commit those changes.

alisartazkhan · 2025-02-28T08:22:43Z

@isaac-chung I tried both make lint and make lint-check, and it seems we pass all checks in both cases.

isaac-chung · 2025-02-28T08:24:34Z

@alisartazkhan what ruff version are you using? This branch seems to be using ruff==0.6.4

alisartazkhan · 2025-02-28T08:29:28Z

I see. I'm using ruff==0.9.8. Sufen's latest commit seems to have done the trick.

isaac-chung · 2025-02-28T08:31:06Z

@alisartazkhan the maeb branch is a bit behind the main branch, which uses ruff==0.9.7. This should be fixed when we update maeb.
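
The mismatch discussed above (a local ruff 0.9.8 against a branch using 0.6.4) can be caught before pushing by comparing the locally installed version with the one the branch expects. The following is only a minimal sketch: the helper names `installed_version` and `describe_pin` are hypothetical and not part of mteb or ruff, and the pinned version is passed in by hand because the thread does not say where the pin is declared.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_pin(local: Optional[str], pinned: str) -> str:
    """Describe how a local version string compares with a pinned one."""
    if local is None:
        return f"not installed; expected {pinned}"
    if local.strip() == pinned.strip():
        return f"installed {local} matches the pin"
    return f"installed {local} differs from pinned {pinned}"


if __name__ == "__main__":
    # 0.6.4 is the version the thread reports for this branch (assumption:
    # adjust it to whatever the branch you are working on actually pins).
    print("ruff:", describe_pin(installed_version("ruff"), "0.6.4"))
```

If the versions differ, installing the pinned version before running make lint should bring local results in line with CI, since different ruff releases can format the same file differently.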

isaac-chung

Nice work team! Just one final small thing. The alternative is just to specify the "train" split.

mteb/tasks/Audio/Clustering/eng/VoiceGender.py

alisartazkhan · 2025-02-28T08:39:03Z

I just made the final adjustment. Let me know if there's anything else. Thanks for the continuous support, @isaac-chung and @Samoed!

isaac-chung · 2025-02-28T08:40:24Z

Looks good, thanks for iterating! I'll enable auto-merge now.

Samoed

Great!

mteb/abstasks/TaskMetadata.py

alisartazkhan and others added 4 commits February 26, 2025 22:48

Added wav2vec model wrapper

ea4651a

Added four w2v variants

557460a

Update wav2vec_models.py

401debb

Removed run.py test script

3e80108

sufen-f requested a review from isaac-chung February 27, 2025 07:17

sufen-f changed the title ~~Models - added Wav2Vec model~~ Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset Feb 27, 2025

mn and others added 2 commits February 26, 2025 23:26

Added subTask with small sample of dataset for testing

6614c51

Removed test portion of VoiceGender.py task

b471057

sufen-f mentioned this pull request Feb 27, 2025

Maeb - added voice clustering task, wav2vec model and VoxCeleb dataset (subset) #2136

Merged

12 tasks

isaac-chung added 8 commits February 27, 2025 14:17

add commit hash and bibtex

cd55c46

make lint

2ecba04

update models

6109f87

fix circular import

880bcbe

make VoiceGender discoverable in get_tasks

af38fe4

add a2a as category for clustering

3ce93be

specify latest commit hash

a378ec0

revert linting changes

5fe8087

isaac-chung reviewed Feb 27, 2025

mteb/models/wav2vec_models.py

mteb/models/overview.py (outdated)

mteb/models/wav2vec_models.py (outdated)

Samoed reviewed Feb 27, 2025

mteb/tasks/Audio/Clustering/eng/VoiceGender.py

mteb/tasks/Audio/Clustering/eng/VoiceGender.py (outdated)

mteb/tasks/Audio/Clustering/eng/VoiceGender.py (outdated)

alisartazkhan and others added 3 commits February 27, 2025 18:49

Based on feedback for model: updated w2v2 revisions and added torchaudio to .toml file

af3de65

Added Bibtex for dataset, set data to be test instead of training, shortened task_subtype

0d861db

Changed task from Voice Gender Clustering to Gender Clustering.

144ec83

Samoed reviewed Feb 28, 2025

mteb/tasks/Audio/Clustering/eng/VoiceGender.py (outdated)

sufen-f and others added 6 commits February 27, 2025 22:38

Fixed mock audio clustering tests

28b6c6b

Added dataset metadata

ecde41d

Linted

743f832

Passed revision into the w2v2 loader

298caa5

Merge branch 'models' of https://github.com/sufen-f/mteb_audio into models

398d024

passed lint check

22fd001

Linted

6b2cb71

isaac-chung approved these changes Feb 28, 2025

mteb/tasks/Audio/Clustering/eng/VoiceGender.py (outdated)

Update VoiceGender.py

3f23a87

Samoed approved these changes Feb 28, 2025

mteb/abstasks/TaskMetadata.py

isaac-chung merged commit 1302477 into embeddings-benchmark:maeb Feb 28, 2025
9 checks passed

sufen-f mentioned this pull request Feb 28, 2025

Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset (… sufen-f/mteb_audio#1

Merged

17 tasks

kkaitlyn111 mentioned this pull request May 9, 2025

MAEB Overview Issue #2072

Open

84 tasks

4 participants
