
Audio Retrieval Dataset: CMU-Arctic #2929

Merged
KennethEnevoldsen merged 2 commits into embeddings-benchmark:maeb from AdnanElAssadi56:maeb-dataset-cmuarctic on Jul 23, 2025


Conversation

@AdnanElAssadi56 (Contributor)

Results on laion/clap-htsat-fused

CMUArcticT2ARetrieval.json
CMUArcticA2TRetrieval.json

Comment on lines 11 to 12
"Retrieve the correct transcription for a given speech segment "
"from the phonetically balanced CMU Arctic single-speaker TTS corpora."
Contributor

Suggested change:
- "Retrieve the correct transcription for a given speech segment "
- "from the phonetically balanced CMU Arctic single-speaker TTS corpora."
+ "Retrieve the correct transcription for an English speech segment. "
+ "The dataset is derived from the phonetically balanced CMU Arctic single-speaker TTS corpora. The corpora contain 1150 samples based on read-aloud segments from books, which are out of copyright and derived from Project Gutenberg."

Contributor

Same for the other docstring; otherwise, feel free to merge.

Contributor Author

Changed

KennethEnevoldsen merged commit 53071b3 into embeddings-benchmark:maeb on Jul 23, 2025
8 checks passed


2 participants