Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset#2175

Merged
isaac-chung merged 25 commits into embeddings-benchmark:maeb from sufen-f:models
Feb 28, 2025

@sufen-f
Contributor

@sufen-f sufen-f commented Feb 27, 2025

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

Command

import mteb

model_name = "facebook/wav2vec2-base"
model = mteb.get_model(model_name, model_revision="0b5b8e868dd84f03fd87d01f9c4ff0f080fecfe8")
tasks = mteb.get_tasks(tasks=["VoiceGenderClustering"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")

@sufen-f sufen-f requested a review from isaac-chung February 27, 2025 07:17
@sufen-f
Contributor Author

sufen-f commented Feb 27, 2025

@alisartazkhan @mnasser3

@sufen-f sufen-f changed the title Models - added Wav2Vec model Added Wav2Vec model, voice clustering task, VoxCeleb dataset subset Feb 27, 2025
Collaborator

@isaac-chung isaac-chung left a comment

Thanks for separating this out! I think we're close. A few comments here. Also, please make sure the tests pass.

Based on this comment, it seems the newly added task and model can be run. Please also share the command / script used in the PR description, like so:

import mteb
#example code here

@alisartazkhan

Does anyone know why we are failing the lint test? When I run make lint-check, it says All checks passed!, but the check still fails here.

@isaac-chung
Collaborator

@alisartazkhan try make lint, which should update the files, then commit those changes.

@alisartazkhan

@isaac-chung I tried both make lint and make lint-check, and it seems we pass all checks in both cases.

@isaac-chung
Collaborator

@alisartazkhan what ruff version are you using? This branch seems to be using ruff==0.6.4

@alisartazkhan

I see. I'm using ruff==0.9.8. Sufen's latest commit seems to have done the trick.

@isaac-chung
Collaborator

@alisartazkhan the maeb branch is a bit behind the main branch, which uses ruff==0.9.7. This should be fixed when we update maeb.
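
The mismatch discussed above (a local ruff newer than the version the branch pins) can be caught before pushing. A minimal sketch, assuming the pin is ruff==0.6.4 as quoted in this thread; the helper names installed_version and matches_pin are hypothetical and not part of mteb's tooling:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], pinned: str) -> bool:
    """True only when the installed version string equals the pinned one exactly."""
    return installed is not None and installed.strip() == pinned.strip()


if __name__ == "__main__":
    # Assumption: the version quoted in the thread for the maeb branch.
    pinned = "0.6.4"
    local = installed_version("ruff")
    if matches_pin(local, pinned):
        print(f"ruff {local} matches the pin")
    else:
        print(f"ruff mismatch: local={local}, pinned={pinned}")
```

An exact string comparison is used on purpose: formatter output can differ between releases, so a newer local ruff can report that all checks pass while CI, running the pinned version, still fails.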

Collaborator

@isaac-chung isaac-chung left a comment

Nice work team! Just one final small thing. The alternative is just to specify the "train" split.

@alisartazkhan

I just made the final adjustment. Let me know if there's anything else. Thanks for the continuous support @isaac-chung and @Samoed !

@isaac-chung
Collaborator

Looks good, thanks for iterating! I'll enable auto-merge now.

Member

@Samoed Samoed left a comment

Great!

@isaac-chung isaac-chung merged commit 1302477 into embeddings-benchmark:maeb Feb 28, 2025
9 checks passed
@kkaitlyn111 kkaitlyn111 mentioned this pull request May 9, 2025
84 tasks