Create Audio Event Detection Task #2338

Closed

anime-sh wants to merge 2 commits into embeddings-benchmark:maeb
Conversation
Samoed reviewed on Mar 12, 2025
Comment on lines +23 to +53
```python
def __init__(
    self,
    num_samples: int,
    total_duration: float,
    min_duration: float,
    avg_duration: float,
    max_duration: float,
    sample_rate: int,
    min_events_per_sample: int,
    avg_events_per_sample: float,
    max_events_per_sample: int,
    unique_event_labels: int,
    event_label_distribution: dict[str, int],
    min_event_duration: float,
    avg_event_duration: float,
    max_event_duration: float,
):
    self.num_samples = num_samples
    self.total_duration = total_duration
    self.min_duration = min_duration
    self.avg_duration = avg_duration
    self.max_duration = max_duration
    self.sample_rate = sample_rate
    self.min_events_per_sample = min_events_per_sample
    self.avg_events_per_sample = avg_events_per_sample
    self.max_events_per_sample = max_events_per_sample
    self.unique_event_labels = unique_event_labels
    self.event_label_distribution = event_label_distribution
    self.min_event_duration = min_event_duration
    self.avg_event_duration = avg_event_duration
    self.max_event_duration = max_event_duration
```
Member

This is a typed dict, you don't need `__init__`.
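For context, a minimal sketch of the reviewer's suggestion: declaring the descriptive statistics as a `TypedDict` lists the fields once and removes the need for an `__init__`. The class name below is illustrative, not necessarily the one used in the PR.

```python
from typing import TypedDict


class AudioEventDetectionDescriptiveStatistics(TypedDict):
    """Descriptive statistics for audio event detection (illustrative name)."""

    num_samples: int
    total_duration: float
    min_duration: float
    avg_duration: float
    max_duration: float
    sample_rate: int
    min_events_per_sample: int
    avg_events_per_sample: float
    max_events_per_sample: int
    unique_event_labels: int
    event_label_distribution: dict[str, int]
    min_event_duration: float
    avg_event_duration: float
    max_event_duration: float


# Instances are plain dicts, so no __init__ or attribute assignment is needed, e.g.:
# stats: AudioEventDetectionDescriptiveStatistics = {"num_samples": 100, ...}
```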
Comment on lines +125 to +156
```python
def evaluate(
    self,
    model: AudioEncoder,
    eval_split: str = "test",
    *,
    encode_kwargs: dict[str, Any] = {},
    **kwargs: Any,
) -> dict[HFSubset, ScoresDict]:
    if not self.data_loaded:
        self.load_data()
    scores = {}
    hf_subsets = self.hf_subsets

    for hf_subset in hf_subsets:
        logger.info(
            f"\nTask: {self.metadata.name}, split: {eval_split}, subset: {hf_subset}. Running..."
        )

        if hf_subset not in self.dataset and hf_subset == "default":
            ds = self.dataset
        else:
            ds = self.dataset[hf_subset]
        scores[hf_subset] = self._evaluate_subset(
            model,
            ds,
            eval_split,
            encode_kwargs=encode_kwargs,
            **kwargs,
        )
        self._add_main_score(scores[hf_subset])

    return scores
```
Member

It seems that `evaluate` is the same as in `AbsTask`.
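For reference, a minimal sketch of what dropping the override could look like, assuming the inherited `AbsTask.evaluate` already handles the subset/split loop. Class names and the import path are illustrative and may differ from the actual codebase.

```python
from mteb.abstasks import AbsTask  # import path may differ between mteb versions


class AbsTaskAudioEventDetection(AbsTask):
    # No evaluate() override: the inherited AbsTask.evaluate() already loads the
    # data, iterates over hf_subsets, calls _evaluate_subset(), and adds the main
    # score. Only the task-specific scoring hook is implemented here.
    def _evaluate_subset(self, model, data_split, *, encode_kwargs=None, **kwargs):
        encode_kwargs = encode_kwargs or {}
        # ... compute frame-level predictions and event-detection metrics ...
        raise NotImplementedError
```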
Comment on lines +148 to +173
```python
def fit(self, X_train: list[np.ndarray], y_train: list[list[dict]]):
    """Train frame-level classifier on audio embeddings"""
    all_embeddings, all_labels = self._process_training_data(X_train, y_train)
    self._init_model(input_dim=all_embeddings.shape[1])
    X_tensor = torch.tensor(all_embeddings, dtype=torch.float32).to(self.device)
    y_tensor = torch.tensor(all_labels, dtype=torch.float32).to(self.device)
    optimizer = torch.optim.Adam(self.model.parameters(), lr=1e-3)
    criterion = nn.BCELoss()

    # Training loop
    self.model.train()
    for epoch in range(10):
        optimizer.zero_grad()
        outputs = self.model(X_tensor)
        loss = criterion(outputs, y_tensor)
        loss.backward()
        optimizer.step()

def _init_model(self, input_dim):
    self.model = nn.Sequential(
        nn.Linear(input_dim, 256),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(256, len(self.classes_)),
        nn.Sigmoid(),
    ).to(self.device)
```
Member

Can we use logistic regression or something simpler instead of an NN?
Contributor
Author

The HEAR benchmark uses an NN to get per-frame predictions, and I thought we wanted the evaluation to be similar to theirs. I would also prefer a simpler evaluator.
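For comparison, a simpler frame-level evaluator along the lines the reviewer suggests could wrap scikit-learn's logistic regression in a one-vs-rest setup. This is only an illustrative sketch (the class and method names are not from the PR), not the HEAR-style NN evaluator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


class LogRegFrameClassifier:
    """Sketch of a frame-level multi-label classifier without a neural network."""

    def __init__(self, max_iter: int = 1000):
        # One binary logistic regression per event class (multi-label setup).
        self.clf = OneVsRestClassifier(LogisticRegression(max_iter=max_iter))

    def fit(self, frame_embeddings: np.ndarray, frame_labels: np.ndarray) -> None:
        # frame_embeddings: (n_frames, embedding_dim)
        # frame_labels: (n_frames, n_classes) binary indicator matrix
        self.clf.fit(frame_embeddings, frame_labels)

    def predict_proba(self, frame_embeddings: np.ndarray) -> np.ndarray:
        # Per-class probabilities for each frame, shape (n_frames, n_classes).
        return self.clf.predict_proba(frame_embeddings)
```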
This was referenced May 7, 2025
Closed
Contributor
@anime-sh will close this as it seems to have gotten stale.
#2246
#2332
Waiting for tests to finish

Code Quality
- Format the code using make lint to maintain consistent style.

Documentation

Testing
- Add tests for new functionality and validate with make test-with-coverage.
- Run make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist
- Reason for dataset addition: ...
- Run the models on the task and add the results to the PR, using the mteb -m {model_name} -t {task_name} command:
  - sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  - intfloat/multilingual-e5-small
- If the dataset is large, consider using self.stratified_subsampling() under dataset_transform().
- Run tests locally with make test.
- Run the formatter with make lint.

Adding a model checklist
- Ensure the model can be loaded with mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision).