Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transforms attribute for MixedCut #1035

Merged
merged 5 commits into from
Apr 24, 2023

Conversation

desh2608
Copy link
Collaborator

Currently, for MixedCut, any transforms are always lazily applied on the underlying tracks. This is fine for transforms such as speed perturbation, but may not be ideal for other kinds of transforms. Consider the following examples:

  1. We have a MixedCut consisting of 3 tracks: 2 MonoCuts and a PaddingCut, i.e., it represents 2 utterances from a speaker with some silence in between. Suppose we want to convolve it with an RIR. In the current method, the tracks will be individually convolved, and the silence portion will not contain any residue from the previous MonoCut. Instead, we should first mix the 3 tracks and then apply the RIR on the resulting audio.

  2. We have a MixedCut comprising 2 MonoCuts with some overlap, and we want to perform loudness normalization. Currently, the tracks will be normalized individually and then mixed, and the resulting mixture may not have the target loudness.

To solve these problems, we introduce a transforms attribute for MixedCut. It is analogous to the transforms attribute in Recording, and contains the transforms that should be lazily applied at the time of audio loading, but after the underlying tracks have been mixed.

# to all tracks. It does not make sense if all tracks belong to different speakers,
# but it is useful for cases when we have a mixture of MonoCut and PaddingCut,
# and we want to apply the same RIR to all of them.
# 2. Apply RIRs to each track separately. This is useful when we want to simulate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your description here got me thinking -- does it really matter in which order we apply RIR and downmix? Casting aside the numerical differences, RIR reverb is a convolution, which is linear, so the sum of reverbed signals should be equal to the reverbed sum of signals, shouldn't it? Am I missing something here?

Copy link
Collaborator

@pzelasko pzelasko Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh I missed the effect on track edges -- the signals are possibly not the same length and have different offsets, and before the PR we only pad them after (not before) the convolution. You mentioned this in the PR description.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was mainly concerned about edge effects in the case of reverb.

@pzelasko
Copy link
Collaborator

Cool contribution! One question: what happens if we mix another track after the reverb/loudnorm is applied? It looks to me like the behaviour wouldn't be correct (it would reverb/norm everything including the track mixed after).

@desh2608
Copy link
Collaborator Author

Cool contribution! One question: what happens if we mix another track after the reverb/loudnorm is applied? It looks to me like the behaviour wouldn't be correct (it would reverb/norm everything including the track mixed after).

Yeah, I think perhaps the right approach is to not allow mixing in new tracks once any of the mix_first transforms have been applied.

@pzelasko
Copy link
Collaborator

pzelasko commented Apr 20, 2023

Yeah, I think perhaps the right approach is to not allow mixing in new tracks once any of the mix_first transforms have been applied.

Would it work if we added a MixedCut as a MixTrack when it has a transform defined on it? I think that technically mixing code only calls load_audio so it might work as expected..

EDIT there might be code all around the place though that doesn't expect MixedCut to be present in a track. Could you check if it's possible to make that work? It would be a pity to raise an exception here.

@desh2608
Copy link
Collaborator Author

Actually yeah we do allow MixedCut to be present as a MixTrack. In fact, the meeting simulation workflow explicitly uses this flexibility.

So you're right, if a new track is mixed into a MixedCut containing a transform, we can just create a nested MixedCut instead of adding to the tracks.

@pzelasko pzelasko added this to the v1.14 milestone Apr 20, 2023
Copy link
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@desh2608
Copy link
Collaborator Author

LGTM

Cool, thanks. Merging.

@desh2608 desh2608 merged commit 1b31bf2 into lhotse-speech:master Apr 24, 2023
@desh2608 desh2608 deleted the mixed_cut_transform branch April 24, 2023 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants