
fix FSD-50K Task Metadata, Label handling and add stratified subsampling #2369

Merged
isaac-chung merged 1 commit into embeddings-benchmark:maeb from anime-sh:fsd50k-fix on Mar 18, 2025


@anime-sh

Fixes the concerns raised in #2285 (comment) and adds stratified subsampling for the test set.
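For readers unfamiliar with the technique, here is a minimal Python sketch of stratified subsampling for a multi-label split. It assumes each sample carries a list of string labels and that the budget is 2048 samples. The helper `stratified_subsample_multilabel`, its rarest-label bucketing heuristic, and the toy data are illustrative assumptions. This is not the code merged in this PR, nor MTEB's `stratified_subsampling` method.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample_multilabel(
    labels: list[list[str]], n_samples: int = 2048, seed: int = 42
) -> list[int]:
    """Hypothetical helper: pick indices that roughly preserve label proportions."""
    if len(labels) <= n_samples:
        # The split already fits the budget, so keep every sample.
        return list(range(len(labels)))

    rng = random.Random(seed)
    freq = Counter(lab for sample in labels for lab in sample)

    # Bucket each sample by its rarest label (ties broken alphabetically);
    # samples with no labels share the "" bucket.
    buckets: dict[str, list[int]] = defaultdict(list)
    for idx, sample in enumerate(labels):
        key = min(sample, key=lambda lab: (freq[lab], lab), default="")
        buckets[key].append(idx)

    fraction = n_samples / len(labels)
    chosen: list[int] = []
    leftovers: list[int] = []
    for key in sorted(buckets):
        members = buckets[key][:]
        rng.shuffle(members)
        quota = int(len(members) * fraction)  # floor of the proportional share
        chosen.extend(members[:quota])
        leftovers.extend(members[quota:])

    # Flooring can leave a few slots empty; fill them at random from the rest.
    rng.shuffle(leftovers)
    chosen.extend(leftovers[: n_samples - len(chosen)])
    return sorted(chosen)


# Toy multi-label data: 5,100 samples reduced to a 2,048-sample budget.
labels = [["Dog", "Bark"]] * 3000 + [["Speech"]] * 2000 + [["Siren"]] * 100
idx = stratified_subsample_multilabel(labels, n_samples=2048)
print(len(idx))  # → 2048
```

Bucketing by the rarest label is a simple heuristic that keeps rare classes from disappearing in the subsample. Iterative stratification is a more thorough alternative for multi-label data.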

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

@anime-sh
Author

[image]
Printed the label counter in the undersampling function.
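The following is a minimal sketch of how such a label counter might be produced. It assumes the labels are stored as comma-separated strings, as the `.split(",")` call in the review snippet suggests. `count_labels` and the sample rows are hypothetical.

```python
from collections import Counter


def count_labels(raw_labels: list[str]) -> Counter:
    """Hypothetical helper: count labels across comma-separated label strings."""
    counter: Counter = Counter()
    for raw in raw_labels:
        # Split "Dog,Bark" into ["Dog", "Bark"], dropping empty fragments.
        counter.update(lab.strip() for lab in raw.split(",") if lab.strip())
    return counter


rows = ["Dog,Bark", "Speech", "Dog,Growling", "Speech,Siren"]
for label, n in count_labels(rows).most_common():
    print(label, n)
```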

@anime-sh anime-sh self-assigned this Mar 15, 2025
@anime-sh anime-sh added the maeb Audio extension label Mar 15, 2025
self.label_column_name: x[self.label_column_name].split(","),
}
)
self.dataset = self.stratified_subsampling(
Member


I don't think you should add stratified subsampling.

Author


There are 5k test samples, which is more than the 2048 mentioned in the PR description, so I added it. I can remove the subsampling.

Collaborator


Thanks for following the PR template. It's fine to stay.

@isaac-chung isaac-chung merged commit 230064a into embeddings-benchmark:maeb Mar 18, 2025
9 checks passed

Labels

maeb Audio extension

3 participants