Create Audio Event Detection Task by anime-sh · Pull Request #2338 · embeddings-benchmark/mteb

anime-sh · 2025-03-12T08:13:28Z

#2246
#2332
Waiting for tests to finish

Code Quality

Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models score close to perfect) nor random (both models score close to chance).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

Samoed · 2025-03-12T08:18:52Z

mteb/abstasks/Audio/AbsTaskAudioEventDetection.py

+    def __init__(
+        self,
+        num_samples: int,
+        total_duration: float,
+        min_duration: float,
+        avg_duration: float,
+        max_duration: float,
+        sample_rate: int,
+        min_events_per_sample: int,
+        avg_events_per_sample: float,
+        max_events_per_sample: int,
+        unique_event_labels: int,
+        event_label_distribution: dict[str, int],
+        min_event_duration: float,
+        avg_event_duration: float,
+        max_event_duration: float,
+    ):
+        self.num_samples = num_samples
+        self.total_duration = total_duration
+        self.min_duration = min_duration
+        self.avg_duration = avg_duration
+        self.max_duration = max_duration
+        self.sample_rate = sample_rate
+        self.min_events_per_sample = min_events_per_sample
+        self.avg_events_per_sample = avg_events_per_sample
+        self.max_events_per_sample = max_events_per_sample
+        self.unique_event_labels = unique_event_labels
+        self.event_label_distribution = event_label_distribution
+        self.min_event_duration = min_event_duration
+        self.avg_event_duration = avg_event_duration
+        self.max_event_duration = max_event_duration


This is a TypedDict, so you don't need __init__.
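
A minimal sketch of what the comment suggests, assuming the statistics class derives from typing.TypedDict. The class name EventDetectionStats and all values are hypothetical; only the field names and types come from the diff above.

```python
from typing import TypedDict


class EventDetectionStats(TypedDict):
    """Hypothetical name; the fields mirror the parameters in the diff above."""

    num_samples: int
    total_duration: float
    min_duration: float
    avg_duration: float
    max_duration: float
    sample_rate: int
    min_events_per_sample: int
    avg_events_per_sample: float
    max_events_per_sample: int
    unique_event_labels: int
    event_label_distribution: dict[str, int]
    min_event_duration: float
    avg_event_duration: float
    max_event_duration: float


# Keyword construction works without a hand-written __init__;
# the result is a plain dict at runtime. All values are made up.
stats = EventDetectionStats(
    num_samples=2,
    total_duration=12.0,
    min_duration=5.0,
    avg_duration=6.0,
    max_duration=7.0,
    sample_rate=16000,
    min_events_per_sample=1,
    avg_events_per_sample=2.0,
    max_events_per_sample=3,
    unique_event_labels=2,
    event_label_distribution={"dog_bark": 3, "siren": 1},
    min_event_duration=0.5,
    avg_event_duration=1.25,
    max_event_duration=2.0,
)
print(stats["avg_duration"])
```

Declaring the fields as annotations keeps them visible to type checkers, while the 14 assignment lines in the hand-written constructor become unnecessary.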

Samoed · 2025-03-12T08:23:12Z

mteb/abstasks/Audio/AbsTaskAudioEventDetection.py

+    def evaluate(
+        self,
+        model: AudioEncoder,
+        eval_split: str = "test",
+        *,
+        encode_kwargs: dict[str, Any] = {},
+        **kwargs: Any,
+    ) -> dict[HFSubset, ScoresDict]:
+        if not self.data_loaded:
+            self.load_data()
+        scores = {}
+        hf_subsets = self.hf_subsets
+
+        for hf_subset in hf_subsets:
+            logger.info(
+                f"\nTask: {self.metadata.name}, split: {eval_split}, subset: {hf_subset}. Running..."
+            )
+
+            if hf_subset not in self.dataset and hf_subset == "default":
+                ds = self.dataset
+            else:
+                ds = self.dataset[hf_subset]
+            scores[hf_subset] = self._evaluate_subset(
+                model,
+                ds,
+                eval_split,
+                encode_kwargs=encode_kwargs,
+                **kwargs,
+            )
+            self._add_main_score(scores[hf_subset])
+
+        return scores


It seems that this evaluate is the same as the one in AbsTask.
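
If the method really is identical, the subclass can simply inherit it and implement only the per-subset step. The sketch below illustrates that pattern with hypothetical stand-in classes (BaseTask, EventDetectionTask) and a toy accuracy metric; it is not mteb's actual AbsTask code.

```python
from typing import Any, Callable


class BaseTask:
    """Hypothetical stand-in for a shared base class (not mteb's AbsTask)."""

    def __init__(self, dataset: dict[str, Any]):
        self.dataset = dataset

    def evaluate(
        self, model: Callable[[Any], Any], eval_split: str = "test"
    ) -> dict[str, dict[str, float]]:
        # Shared loop over subsets; subclasses only supply _evaluate_subset.
        return {
            subset: self._evaluate_subset(model, data, eval_split)
            for subset, data in self.dataset.items()
        }

    def _evaluate_subset(
        self, model: Callable[[Any], Any], data: Any, eval_split: str
    ) -> dict[str, float]:
        raise NotImplementedError


class EventDetectionTask(BaseTask):
    """No evaluate() override here: the inherited one is reused as-is."""

    def _evaluate_subset(
        self, model: Callable[[Any], Any], data: Any, eval_split: str
    ) -> dict[str, float]:
        inputs = data[eval_split]["x"]
        labels = data[eval_split]["y"]
        correct = sum(model(x) == y for x, y in zip(inputs, labels))
        return {"accuracy": correct / len(labels)}


# Toy usage: the "model" predicts 1 for odd inputs and 0 for even ones.
task = EventDetectionTask({"default": {"test": {"x": [1, 2, 3, 4], "y": [1, 0, 1, 0]}}})
scores = task.evaluate(lambda x: x % 2)
print(scores)
```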

Samoed · 2025-03-12T08:26:51Z

mteb/evaluation/evaluators/Audio/AudioEventDetectionEvaluator.py

+    def fit(self, X_train: list[np.ndarray], y_train: list[list[dict]]):
+        """Train frame-level classifier on audio embeddings"""
+        all_embeddings, all_labels = self._process_training_data(X_train, y_train)
+        self._init_model(input_dim=all_embeddings.shape[1])
+        X_tensor = torch.tensor(all_embeddings, dtype=torch.float32).to(self.device)
+        y_tensor = torch.tensor(all_labels, dtype=torch.float32).to(self.device)
+        optimizer = torch.optim.Adam(self.model.parameters(), lr=1e-3)
+        criterion = nn.BCELoss()
+
+        # Training loop
+        self.model.train()
+        for epoch in range(10):
+            optimizer.zero_grad()
+            outputs = self.model(X_tensor)
+            loss = criterion(outputs, y_tensor)
+            loss.backward()
+            optimizer.step()
+
+    def _init_model(self, input_dim):
+        self.model = nn.Sequential(
+            nn.Linear(input_dim, 256),
+            nn.ReLU(),
+            nn.Dropout(0.2),
+            nn.Linear(256, len(self.classes_)),
+            nn.Sigmoid(),
+        ).to(self.device)


Can we use logistic regression or something similarly simple instead of an NN?

The HEAR benchmark uses an NN to get per-frame predictions, and I thought we wanted our evaluation to be similar to theirs. That said, I would prefer a simpler evaluator too.

cc: @Muennighoff @silky1708 @RahulSChand
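
For reference, a simpler frame-level evaluator could look roughly like the sketch below: one independent logistic regression per event class, trained with plain gradient descent on the binary cross-entropy. The class name FrameLogReg, the hyperparameters, and the toy data are all assumptions for illustration. The code is dependency-free to keep it self-contained; in practice a library implementation such as scikit-learn's LogisticRegression would likely be used.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class FrameLogReg:
    """Hypothetical multi-label classifier: one logistic regression per class."""

    def __init__(self, n_classes: int, lr: float = 0.5, epochs: int = 200):
        self.n_classes = n_classes
        self.lr = lr
        self.epochs = epochs
        self.weights: list[list[float]] = []
        self.biases: list[float] = []

    def _prob(self, x: list[float], c: int) -> float:
        z = self.biases[c] + sum(w * v for w, v in zip(self.weights[c], x))
        return sigmoid(z)

    def fit(self, X: list[list[float]], Y: list[list[int]]) -> "FrameLogReg":
        n, dim = len(X), len(X[0])
        self.weights = [[0.0] * dim for _ in range(self.n_classes)]
        self.biases = [0.0] * self.n_classes
        for _ in range(self.epochs):
            for c in range(self.n_classes):
                grad_w = [0.0] * dim
                grad_b = 0.0
                for x, y in zip(X, Y):
                    # Gradient of binary cross-entropy w.r.t. the logit.
                    err = self._prob(x, c) - y[c]
                    for j in range(dim):
                        grad_w[j] += err * x[j]
                    grad_b += err
                for j in range(dim):
                    self.weights[c][j] -= self.lr * grad_w[j] / n
                self.biases[c] -= self.lr * grad_b / n
        return self

    def predict(self, X: list[list[float]], threshold: float = 0.5) -> list[list[int]]:
        return [
            [int(self._prob(x, c) >= threshold) for c in range(self.n_classes)]
            for x in X
        ]


# Toy frame embeddings: class 0 is active when x[0] > 0, class 1 when x[1] > 0.
X_frames = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
Y_frames = [[1, 1], [1, 0], [0, 1], [0, 0]]
clf = FrameLogReg(n_classes=2).fit(X_frames, Y_frames)
print(clf.predict(X_frames))
```

This gives up the hidden layer and dropout of the NN in the diff, so scores would not be directly comparable with HEAR-style results.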

KennethEnevoldsen · 2025-06-15T18:57:25Z

@anime-sh, I will close this as it seems to have gone stale.

anime-sh added 2 commits March 12, 2025 01:12

Create Audio Event Detection Task

cca97c9

lint

49ff0d1

Samoed reviewed Mar 12, 2025

RahulSChand added the audio (Audio extension) label Mar 12, 2025

RahulSChand assigned anime-sh Mar 12, 2025

This was referenced May 7, 2025

MAEB Overview Issue #2072

Closed

Create Audio Event Detection Task #2670

Closed

KennethEnevoldsen added the stale label Jun 15, 2025

KennethEnevoldsen closed this Jun 15, 2025

anime-sh wants to merge 2 commits into embeddings-benchmark:maeb from anime-sh:maeb

4 participants
