Merged
Conversation
- define audio encoder interface - implement abstask and evaluator for clustering
Created test_maeb_datasets.py to test AbsTask and Evaluator for clustering
…SCAN and agglomerative algorithms into clustering evaluator, added algorithm selector into VoiceGender
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
…2175) * Added wav2vec model wrapper * Added four w2v variants * Update wav2vec_models.py * Removed run.py test script * Added subTask with small sample of dataset for testing * Removed test portion of VoiceGender.py task * add commit hash and bibtex * make lint * update models * fix circular import * make VoiceGender discoverable in get_tasks * add a2a as category for clustering * specify latest commit hash * revert linting changes * Based on feedback for model: updated w2v2 revisions and added torchaudio to .toml file * Added Bibtex for dataset, set data to be test instead of training, shortened task_subtype * Changed task from Voice Gender Clustering to Gender Clustering. * Fixed mock audio clustering tests * Added dataset metadata * Linted * Passed revision into the w2v2 loader * passed lint check * Linted * Update VoiceGender.py --------- Co-authored-by: Ali Sartaz Khan <alisartazkhan@gmail.com> Co-authored-by: Ali Sartaz Khan <71156712+alisartazkhan@users.noreply.github.com> Co-authored-by: mn <mn@Ms-MacBook-Pro.local> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
…b dataset (subset)" (#2202)
… FSD50k Dataset and Task (#2082) * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * add dataset/tasks skeleton * readd changes lost in rebase * add fsd50k * add task categories for audio * slight updates to fsd50k * make lint * wav2vec2 model * add fsd50k metadata * rename folder * add metric * add torchaudio in req * reigster wav2vec2 models * fixes * add audio in valid task types * mock interface changes * make lint * rm audio clustering * wav2vec2 model revision update * rm comment * rm test.py * add revisions to all wav2vec2 models * rm empty abstask files * rm empty evaluator files * rm empty task files * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/models/wav2vec2_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * rm non-logReg evaluators for audio classification * lint * fn name changed to convert_audio_from_numpy * rm mock tests for audio kNN classification * rm evaluators for audio kNN classification * fix imports * fix audio kNN; make lint * rm AbsTaskAudioClassification.py for later PR * remove commented code; reset changes to ClassificationEvaluator.py * fix mock tasks for multilabel classification * make lint * inherit Wrapper class * add all languages supported by wav2vec2 * make lint * add script info to all languages * make lint * recent changes * merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets * make lint remove uwanted files * remove debug lines * remove esc50 refs * fix mock tasks for multilabel * fix mock tasks for multilabel * Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch 4f23fdf This reverts commit 4f23fdf, reversing changes made to 1302477. * fix model imports * fqd50k cleaning * update fsd50k * change dataset * eval subsets correctly * make lint and remove debug statements * clean print statements * make lint * update fsd2019 dataset * remove init in AbsTaskAudioMultilabelClassification.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * add class parameters in AbsTaskAudioMultilabelClassification * inherit from multilingualtask for FSD2019Kaggle * make lint * update mock_tasks; make lint * remove train_split from fn parameters * define fsd2019k to be multilingual * inherit from MultilingualTask in fsd2019K * fix tests * inherit correct multingial task class * remove MockAudioMultilabelClassificationLogRegTask * rm other instances of MockAudioMultilabelClassificationLogRegTask --------- Co-authored-by: rahulschand <rahulsc@stanford.edu> Co-authored-by: Silky Singh <silky1708@gmail.com> Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * init audio * some encoder related changes * some more abs task defs Co-authored-by: rahulschand <rahulsc@stanford.edu> * evaluators and classification * remove rahul changes to generate first PR * make lint * add dataset/tasks skeleton * readd changes lost in rebase * add fsd50k * add task categories for audio * slight updates to fsd50k * make lint * wav2vec2 model * add fsd50k metadata * rename folder * add metric * add torchaudio in req * reigster wav2vec2 models * fixes * add audio in valid task types * mock interface changes * my 0 shot * make lint * rm audio clustering * wav2vec2 model revision update * rm comment * rm test.py * add revisions to all wav2vec2 models * rm empty abstask files * rm empty evaluator files * rm empty task files * Update tests/test_tasks/test_all_abstasks.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * Update mteb/models/wav2vec2_models.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> * rm non-logReg evaluators for audio classification * lint * fn name changed to convert_audio_from_numpy * rm mock tests for audio kNN classification * rm evaluators for audio kNN classification * fix imports * fix audio kNN; make lint * rm AbsTaskAudioClassification.py for later PR * added zero-shot loading model and dataset checked * remove commented code; reset changes to ClassificationEvaluator.py * fix mock tasks for multilabel classification * make lint * inherit Wrapper class * add all languages supported by wav2vec2 * make lint * add script info to all languages * make lint * before cleaning comments * ESC and clap model. Tested 81 percent zero-shot numbers * fixed label names for ESC50-multilabel and removed comments * recent changes * merge wav2vec2 + add updated logic for auto padding for fqd50k type datasets * make lint remove uwanted files * remove debug lines * remove esc50 refs * changes for debugging * lint changes and maeb main branch merge * fix mock tasks for multilabel * fix mock tasks for multilabel * Revert "Merge branch 'maeb' into maeb" bad direct commit made to upstream maeb branch 4f23fdf This reverts commit 4f23fdf, reversing changes made to 1302477. * fix model imports * fqd50k cleaning * fixed error in Image zero shot classfification * update fsd50k * change dataset * eval subsets correctly * make lint and remove debug statements * clean print statements * make lint * update fsd2019 dataset * remove init in AbsTaskAudioMultilabelClassification.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * add class parameters in AbsTaskAudioMultilabelClassification * inherit from multilingualtask for FSD2019Kaggle * make lint * update mock_tasks; make lint * remove train_split from fn parameters * define fsd2019k to be multilingual * inherit from MultilingualTask in fsd2019K * fix tests * inherit correct multingial task class * remove MockAudioMultilabelClassificationLogRegTask * rm other instances of MockAudioMultilabelClassificationLogRegTask * removed unncessary files * removed unncrssary files * removed uncrssary files part 3 * deleted esc50 from multi label classification * fixed errors * fixed lintng, added precision and recall. Removed extra comments * fixed double loading of model * filled in missing meta-data * fixed linting --------- Co-authored-by: Animesh Jha <jha.animesh01@gmail.com> Co-authored-by: rahulschand <rahulsc@stanford.edu> Co-authored-by: Silky Singh <silky1708@gmail.com> Co-authored-by: Silky Singh <54901747+silky1708@users.noreply.github.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* added unfused model * fixed lint
…on task (#2285) * Added fsd50k dataset on huggingface * added correct hf version of fsd50k dataset * added correct hf version of fsd50k dataset * removed extra imports * removed unecessary load_data fn
* added large, music and speech clap models * fixed public_training_data and removed training_datasets split * added latest revision * lowercase mit license * fixed issue related to training_datasets * fixed lint
# Conflicts: # mteb/abstasks/task_metadata.py # mteb/leaderboard/app.py # mteb/tasks/reranking/eng/__init__.py # pyproject.toml # uv.lock
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* use collator for processing * use collator for processing * fix typing * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix qwen2audio * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * change warining to log * remove `_get_resampler` * fix clap with transformers v5 * add cast instead of ignore --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix globe v2 and add globe v3 * add descriptive stats * fix eval splits for globe v2 age and v2 gender
[MAEB] Add audio task installation instructions to docs Document FFmpeg and transformers>=4.57.6 requirements for users running audio tasks with datasets>=4. The datasets library v4+ uses torchcodec for audio processing which requires FFmpeg to be installed. Fixes #4023 Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
# Conflicts: # mteb/leaderboard/app.py # pyproject.toml # uv.lock
Collaborator
|
Need to regenerate uv.lock |
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolve conflict in app.py by keeping both: - Session tracking functions from main (_generate_fingerprint_session_id, on_page_load) - skip_cache_file parameter from maeb Also add 'Beng' (Bengali script code) to typos ignore list. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The function now only takes benchmark_name and computes languages/task_types/domains internally from the benchmark. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Member
Author
|
@isaac-chung I'm fixing tests in #4098 |
Collaborator
|
Great, thanks! I'll pause here then. |
* fix pyproject * fix torchcodec version
isaac-chung
reviewed
Feb 16, 2026
isaac-chung
reviewed
Feb 16, 2026
isaac-chung
reviewed
Feb 16, 2026
Collaborator
|
Only windows test failing now. @Samoed would you mind taking a look please? |
* regenerate lock * lock torchcodec to 0.9 * lock torchcodec to 0.9.1
Member
Author
|
In lock was torchcodec |
isaac-chung
approved these changes
Feb 16, 2026
Collaborator
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #2072
What: Introduces MAEB (Massive Audio Embedding Benchmark) - a comprehensive benchmark for evaluating audio embedding models
Key additions:
Technical improvements: