Output .ctm of Speech Data Simulator has channel and spk_id swapped #7445

Closed
popcornell opened this issue Sep 15, 2023 · 7 comments · Fixed by #8004

@popcornell
Contributor

text = f"{session_name} {speaker_id} {align1} {align2} {word} 0\n"

But according to https://web.archive.org/web/20170119114252/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf it should be:

<SOURCE><SP><CHANNEL><SP><BEG-TIME><SP><DURATION><SP><TOKEN><SP><CONF><SP><TYPE><SP><SPEAKER><NEWLINE>
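
For illustration, here is a minimal sketch of a writer that emits the fields in the RT09 order quoted above. The function name `format_ctm_line`, the three-decimal time formatting, and the default values `"NA"` (confidence), `"lex"` (type), and `"unknown"` (speaker) are assumptions made for this example; they are not taken from the simulator code.

```python
def format_ctm_line(source, channel, beg_time, duration, token,
                    conf="NA", token_type="lex", speaker="unknown"):
    """Build one CTM line in the RT09 field order.

    Order: SOURCE CHANNEL BEG-TIME DURATION TOKEN CONF TYPE SPEAKER.
    The defaults for conf, token_type and speaker are assumptions
    chosen for this sketch.
    """
    return (
        f"{source} {channel} {beg_time:.3f} {duration:.3f} "
        f"{token} {conf} {token_type} {speaker}\n"
    )


# The channel goes in the second field and the speaker in the last one.
line = format_ctm_line("session_0", 1, 0.5, 0.25, "hello", speaker="spk_3")
print(line, end="")
```

With this layout the speaker label moves to the final field, and the second field holds the channel.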
@github-actions
Contributor

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@tango4j
Collaborator

tango4j commented Oct 17, 2023

Hi @popcornell,
This was intentional: we were creating datasets where all channels share the same word alignment,
so we used the channel slot for the speaker.
Now that we want public users to be able to use the data simulator freely, it needs to be updated to be consistent with the CTM convention in the RT09 document.
I will keep this open until it is fixed.

@popcornell
Contributor Author

Hi Taejin, thanks for the reply.
I like the data simulator a lot; it is very fast and really helpful, and I used it in a recent work.

I can actually fix this; it is pretty easy, and I have already done so locally.
I needed consistency with the RT09 convention because I was using lhotse (https://github.com/lhotse-speech/lhotse) for data loading, and having the .ctm files was quite handy for loading the word alignments into the manifests as well.

> This is intended since we were creating dataset where all channels share the same word alignment
> so we used channel slot as speaker.

There is a channel slot in the .ctm convention but IDK if it is what you need.
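
To show why the field order matters on the reading side, here is a minimal sketch of a strict parser for the eight-field RT09 layout. It is a hypothetical helper written for this example (`CtmWord` and `parse_ctm_line` are made-up names) and is not lhotse's loader.

```python
from dataclasses import dataclass


@dataclass
class CtmWord:
    source: str
    channel: str
    begin: float
    duration: float
    token: str
    conf: str
    token_type: str
    speaker: str


def parse_ctm_line(line):
    """Parse one CTM line, assuming the eight-field RT09 layout."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    source, channel, begin, duration, token, conf, token_type, speaker = fields
    return CtmWord(source, channel, float(begin), float(duration),
                   token, conf, token_type, speaker)


word = parse_ctm_line("session_0 1 0.500 0.250 hello NA lex spk_3")
print(word.speaker, word.begin, word.duration)
```

A six-field line in the simulator's current layout (session name, speaker, two alignment values, word, `0`) would be rejected by this parser. A more lenient reader that only looks at the first five fields would accept it but read the speaker ID as the channel.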

@tango4j
Collaborator

tango4j commented Oct 17, 2023

Oh, I see.
This definitely needs to be updated ASAP.
Also, Piotr Zelasko has joined the NVIDIA NeMo team, so
I think I could have him go through the PR to ensure compatibility with lhotse.

@github-actions github-actions bot removed the stale label Oct 18, 2023
Contributor

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Nov 17, 2023
Contributor

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Nov 25, 2023
@popcornell
Contributor Author

I have a PR that addresses this now.
