-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match cut_id
to utt_id
if there is exactly one supervision per cut
#853
Conversation
I am a little skeptical about enforcing this in the function by default. |
See discussion here: k2-fsa/icefall#620. what do you think would be a better way? |
After reading the linked discussion, I think it makes sense in view of ease of use. But please add a sentence in the function's docstring explaining this behavior. |
@@ -431,7 +431,9 @@ def trim_to_supervisions( | |||
|
|||
:param keep_overlapping: when ``False``, it will discard parts of other supervisions that overlap with the | |||
main supervision. In the illustration above, it would discard ``Sup2`` in ``Cut1`` and ``Sup1`` in ``Cut2``. | |||
In this mode, we guarantee that there will always be exactly one supervision per cut. | |||
In this mode, we guarantee that there will always be exactly one supervision per cut, and we set the ``id`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this look good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, perfect.
Let's get this checked in since it is a small change. |
@@ -470,6 +472,7 @@ def trim_to_supervisions( | |||
if not keep_overlapping: | |||
# Ensure that there is exactly one supervision per cut. | |||
trimmed = trimmed.filter_supervisions(lambda s: s.id == segment.id) | |||
trimmed.id = segment.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit un-intuitive that the IDs will have one format or another depending on the setting keep_overlapping
. Maybe we could always set the cut ID to supervision ID regardless of this flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good catch. There could be a single supervision even when keep_overlapping is set to True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I didn't do overlapping ASR corpus before, and didn't want to break any pipeline.
If it's still good to set cut_id
to utt_id
while there is overlapping, we can revert this PR and modify it, or just open a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to open a new PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wgb14 What I meant (and changed in #855) was that even if keep_overlapping
is set to True, it may happen that a trimmed cut has only 1 supervision, if there was no overlapping supervision originally. In such case, it makes sense to set the trimmed cut id to that of the supervision.
Now, even if the cut has multiple overlapping supervisions, one could argue that its boundaries are defined by 1 "main" supervision, and the others just happen to be included. So it would still make sense to change the cut id to that of the main supervision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, thanks
No description provided.