Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match cut_id to utt_id if there is exactly one supervision per cut #853

Merged
merged 2 commits into from
Oct 17, 2022

Conversation

wgb14
Copy link
Contributor

@wgb14 wgb14 commented Oct 15, 2022

No description provided.

@desh2608
Copy link
Collaborator

I am a little skeptical about enforcing this in the function by default.

@wgb14
Copy link
Contributor Author

wgb14 commented Oct 15, 2022

I am a little skeptical about enforcing this in the function by default.

See discussion here: k2-fsa/icefall#620. what do you think would be a better way?

@desh2608
Copy link
Collaborator

After reading the linked discussion, I think it makes sense in view of ease of use. But please add a sentence in the function's docstring explaining this behavior.

@@ -431,7 +431,9 @@ def trim_to_supervisions(

:param keep_overlapping: when ``False``, it will discard parts of other supervisions that overlap with the
main supervision. In the illustration above, it would discard ``Sup2`` in ``Cut1`` and ``Sup1`` in ``Cut2``.
In this mode, we guarantee that there will always be exactly one supervision per cut.
In this mode, we guarantee that there will always be exactly one supervision per cut, and we set the ``id``
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this look good?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, perfect.

@desh2608
Copy link
Collaborator

Let's get this checked in since it is a small change.

@desh2608 desh2608 merged commit bfe0bdf into lhotse-speech:master Oct 17, 2022
@@ -470,6 +472,7 @@ def trim_to_supervisions(
if not keep_overlapping:
# Ensure that there is exactly one supervision per cut.
trimmed = trimmed.filter_supervisions(lambda s: s.id == segment.id)
trimmed.id = segment.id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit un-intuitive that the IDs will have one format or another depending on the setting keep_overlapping. Maybe we could always set the cut ID to supervision ID regardless of this flag?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch. There could be a single supervision even when keep_overlapping is set to True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I didn't do overlapping ASR corpus before, and didn't want to break any pipeline.
If it's still good to set cut_id to utt_id while there is overlapping, we can revert this PR and modify it, or just open a new one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to open a new PR

Copy link
Collaborator

@desh2608 desh2608 Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wgb14 What I meant (and changed in #855) was that even if keep_overlapping is set to True, it may happen that a trimmed cut has only 1 supervision, if there was no overlapping supervision originally. In such case, it makes sense to set the trimmed cut id to that of the supervision.

Now, even if the cut has multiple overlapping supervisions, one could argue that its boundaries are defined by 1 "main" supervision, and the others just happen to be included. So it would still make sense to change the cut id to that of the main supervision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants