-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transforms
attribute for MixedCut
#1035
Conversation
# to all tracks. It does not make sense if all tracks belong to different speakers, | ||
# but it is useful for cases when we have a mixture of MonoCut and PaddingCut, | ||
# and we want to apply the same RIR to all of them. | ||
# 2. Apply RIRs to each track separately. This is useful when we want to simulate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your description here got me thinking -- does it really matter in which order we apply RIR and downmix? Casting aside the numerical differences, RIR reverb is a convolution, which is linear, so the sum of reverbed signals should be equal to the reverbed sum of signals, shouldn't it? Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh I missed the effect on track edges -- the signals are possibly not the same length and have different offsets, and before the PR we only pad them after (not before) the convolution. You mentioned this in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was mainly concerned about edge effects in the case of reverb.
Cool contribution! One question: what happens if we mix another track after the reverb/loudnorm is applied? It looks to me like the behaviour wouldn't be correct (it would reverb/norm everything including the track mixed after). |
Yeah, I think perhaps the right approach is to not allow mixing in new tracks once any of the |
Would it work if we added a MixedCut as a MixTrack when it has a transform defined on it? I think that technically mixing code only calls load_audio so it might work as expected.. EDIT there might be code all around the place though that doesn't expect MixedCut to be present in a track. Could you check if it's possible to make that work? It would be a pity to raise an exception here. |
Actually yeah we do allow MixedCut to be present as a MixTrack. In fact, the meeting simulation workflow explicitly uses this flexibility. So you're right, if a new track is mixed into a MixedCut containing a transform, we can just create a nested MixedCut instead of adding to the tracks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Cool, thanks. Merging. |
Currently, for
MixedCut
, any transforms are always lazily applied on the underlying tracks. This is fine for transforms such as speed perturbation, but may not be ideal for other kinds of transforms. Consider the following examples:We have a
MixedCut
consisting of 3 tracks: 2MonoCut
s and aPaddingCut
, i.e., it represents 2 utterances from a speaker with some silence in between. Suppose we want to convolve it with an RIR. In the current method, the tracks will be individually convolved, and the silence portion will not contain any residue from the previousMonoCut
. Instead, we should first mix the 3 tracks and then apply the RIR on the resulting audio.We have a
MixedCut
comprising 2MonoCut
s with some overlap, and we want to perform loudness normalization. Currently, the tracks will be normalized individually and then mixed, and the resulting mixture may not have the target loudness.To solve these problems, we introduce a
transforms
attribute forMixedCut
. It is analogous to thetransforms
attribute inRecording
, and contains the transforms that should be lazily applied at the time of audio loading, but after the underlying tracks have been mixed.