.ctm in data simulator annotator compliant with RT-09 specification #8004
NeMo contributors, please do not merge this PR until I have made sure that all CTM output in NeMo follows the official CTM format.
It seems this is also not compliant (speaker ID instead of channel):
This one, on the other hand, is roughly compliant (but some fields are missing):
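For illustration, a minimal check that would catch a speaker ID sitting in the channel slot. It assumes the eight-field RT-09 layout `source channel begin-time duration token confidence type speaker` with whitespace-separated fields; `parse_ctm_line` and `CTM_FIELDS` are hypothetical names, not NeMo APIs.

```python
# Assumed RT-09 CTM field order (hypothetical constant; verify against the spec).
CTM_FIELDS = ("source", "channel", "start_time", "duration", "token", "conf", "type", "speaker")


def parse_ctm_line(line: str) -> dict:
    """Split one CTM line into named fields, raising ValueError if it looks non-compliant."""
    parts = line.split()
    if len(parts) != len(CTM_FIELDS):
        raise ValueError(f"expected {len(CTM_FIELDS)} fields, got {len(parts)}: {line!r}")
    record = dict(zip(CTM_FIELDS, parts))
    # A speaker ID written in the channel position fails this check.
    if not record["channel"].isdigit():
        raise ValueError(f"channel must be an integer, got {record['channel']!r}")
    # Begin time and duration must parse as numbers; float() raises ValueError otherwise.
    record["start_time"] = float(record["start_time"])
    record["duration"] = float(record["duration"])
    return record
```

Unknown fields such as confidence and type are left as plain strings (e.g. `NA`), so the check only enforces the field count, an integer channel, and numeric times.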
@erastorgueva-nv Elena, I found a line that renders CTM and replaced it with the …
@stevehuang52 We are also making slight changes to the data simulator. Please review and approve.
```python
if type(beg_time) != float:
    beg_time = round(float(beg_time), output_precision)
if type(duration) != float:
    duration = round(float(duration), output_precision)
```
Currently, `beg_time` and `duration` do not get rounded if they are already floats. Please remove the if-statements; I don't think they are necessary.
Updated to always round the number. It now also checks whether `beg_time` is either a float or a string containing a floating-point number.
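A minimal sketch of the behavior described in this reply, assuming a standalone helper: the value is converted and rounded unconditionally, and anything that is not a number or a numeric string is rejected. `normalize_time` is a hypothetical name; `output_precision` mirrors the parameter in the diff, and the default of 2 follows the "decimal rounding of 2" commit in this PR. Accepting plain ints as well as floats is an added assumption.

```python
from typing import Union


def normalize_time(value: Union[float, int, str], output_precision: int = 2) -> float:
    """Convert a time value to float and always round it to output_precision decimals."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"expected a float or numeric string, got {type(value).__name__}")
    try:
        as_float = float(value)
    except ValueError as err:
        raise ValueError(f"not a floating-point number: {value!r}") from err
    # Round unconditionally, so values that are already floats get rounded too.
    return round(as_float, output_precision)
```

Dropping the `type(...) != float` guards is what makes floats go through the same rounding path as strings.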
Approving this, since this PR has gone through several rounds of review and feedback.
jenkins
@tango4j / reviewers, merge when ready. Also, a reminder: NeMo devs need to explicitly write "jenkins" in a comment to run the CI.
Oh, when did the protocol change?
jenkins
jenkins
jenkins
jenkins
jenkins
Approving changes
…VIDIA#8004)

* .ctm fix for data simulation
  Signed-off-by: popcornell <[email protected]>
* .ctm fix, channel should be 1 not 0
  Signed-off-by: popcornell <[email protected]>
* .ctm fix, only two na, type and confidence
  Signed-off-by: popcornell <[email protected]>
* Revised all the parts in NeMo touching CTM files
  Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Updated tutorial, nemo-docs and tests for CTM formats
  Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fixed the docstrings in create_alignment_manifest.py
  Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Some missing refactored variables for type_of_token
  Signed-off-by: Taejin Park <[email protected]>
* Another un-fixed part in data_simulation_utils.py
  Signed-off-by: Taejin Park <[email protected]>
* Reflected comments from PR
  Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Reflected another precision related comments from PR
  Signed-off-by: Taejin Park <[email protected]>
* Updated tests to use decimal rounding of 2
  Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Changed beg_time to start_time and fixed unit tests
  Signed-off-by: Taejin Park <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fixed typos and errors in manifest_utils.py
  Signed-off-by: Taejin Park <[email protected]>
* Resolved another merge conflict
  Signed-off-by: Taejin Park <[email protected]>
* Fixed the test errors
  Signed-off-by: Taejin Park <[email protected]>
* Fixed the missed commented lines
  Signed-off-by: Taejin Park <[email protected]>

---------

Signed-off-by: popcornell <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <[email protected]>
Redid this PR from #7999
An attempt to fix #7445 so that the .ctm files generated by the data simulator comply with the RT-09 specification (see https://web.archive.org/web/20170119114252/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf).
I have put `NA` for unknown fields, e.g. confidence and type.
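A minimal sketch of writing one CTM line, assuming the field order `source channel begin-time duration token confidence type speaker` (worth double-checking against the RT-09 plan linked above). The channel defaults to 1 and confidence and type default to `NA`, following the commits in this PR; `format_ctm_line` and its parameters are hypothetical, not NeMo's actual helper.

```python
def format_ctm_line(
    source: str,
    start_time: float,
    duration: float,
    token: str,
    speaker: str,
    channel: int = 1,  # assumed default: channel 1 rather than 0
    conf: str = "NA",  # confidence unknown for simulated sessions
    type_of_token: str = "NA",  # token type unknown for simulated sessions
    output_precision: int = 2,
) -> str:
    """Return one newline-terminated CTM line with fixed-precision times."""
    return (
        f"{source} {channel} "
        f"{start_time:.{output_precision}f} {duration:.{output_precision}f} "
        f"{token} {conf} {type_of_token} {speaker}\n"
    )
```

For example, `format_ctm_line("session_0", 0.5, 0.32, "hello", "speaker_1")` returns `"session_0 1 0.50 0.32 hello NA NA speaker_1\n"`. Keeping the speaker as the last field, instead of in the channel slot, is what lets generic CTM readers consume the output.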
This also makes it easy to use the generated sessions with Lhotse (https://github.com/lhotse-speech/lhotse).