Loudness normalization with pyloudnorm
#1016
    # clipping the audio.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        loudness_normalized_audio = pyln.normalize.loudness(audio.T, loudness, target)
If audio clipping is an issue here you can add a limiter as a post-processing step, e.g. https://github.com/pzelasko/cylimiter
Sure, I can add it. But I don't have enough familiarity with limiters (or "loudness" for that matter) to know what to do exactly.
The defaults should "just work" with pretty much anything. It basically keeps track of the signal's loudness with a small lookahead and reduces the gain if it crosses some threshold. Think of it as soft clipping that doesn't introduce as much distortion as hard clipping.
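To make the idea concrete, here is a minimal sketch of a lookahead limiter in plain Python. It is not cylimiter's algorithm or API: the function name, the parameters (`threshold`, `lookahead`, `release`) and their default values are all hypothetical, and it tracks the sample peak rather than a loudness estimate to keep the example short.

```python
from typing import List, Sequence


def lookahead_limiter(
    samples: Sequence[float],
    threshold: float = 0.9,   # hypothetical ceiling for |sample|
    lookahead: int = 8,       # hypothetical window length, in samples
    release: float = 0.99,    # hypothetical smoothing factor for gain recovery
) -> List[float]:
    """Toy peak limiter: illustrates the general idea, not cylimiter itself."""
    out: List[float] = []
    gain = 1.0
    for i, sample in enumerate(samples):
        # Peak of the current sample plus the next `lookahead` samples.
        peak = max(abs(s) for s in samples[i : i + lookahead + 1])
        # Gain that would bring that peak down to the threshold.
        target = 1.0 if peak <= threshold else threshold / peak
        if target < gain:
            gain = target  # attack: drop the gain immediately
        else:
            gain += (1.0 - release) * (target - gain)  # release: recover slowly
        out.append(sample * gain)
    return out


loud = [0.1, 0.2, 1.5, -2.0, 0.3]
limited = lookahead_limiter(loud)
```

Because the gain is lowered before the loud samples arrive and only recovers gradually, the waveform is scaled smoothly instead of being flattened at the ceiling, which is where the lower distortion compared with hard clipping comes from.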
...but again, I don't know if that's a real problem with this approach and worth the extra dependency, so it's your call :)
So far I have only used it to make the LibriCSS distant mic audio louder (it is at around -52 dB originally), and it sounds okay even with the clipping. I suppose we can let it be for now and add the limiter later if someone needs it?
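A rough back-of-the-envelope sketch of why clipping shows up in this case, assuming normalization amounts to a single linear gain, that levels follow the usual 20 dB per factor of 10 amplitude convention, and that full scale is 1.0. The -23 target is an assumed value for illustration, and `db_to_linear_gain` is a hypothetical helper.

```python
def db_to_linear_gain(delta_db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (delta_db / 20.0)


measured_db = -52.0   # level quoted in the comment above
target_db = -23.0     # assumed target, chosen only for illustration
gain = db_to_linear_gain(target_db - measured_db)   # a +29 dB change
clip_above = 1.0 / gain   # input amplitudes above this would exceed full scale
```

Under these assumptions the gain is roughly 28x, so any input sample with magnitude above about 0.035 would land outside [-1, 1] after normalization.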
Looks good to me, I left a single comment, up to you :)
    """

    target: float
    sampling_rate: int = 16000
@pzelasko should we make sampling_rate a member?
class AudioTransform:
    def __call__(self, samples: np.ndarray, sampling_rate: int) -> np.ndarray:
        """
        Apply transform.
        To be implemented in derived classes.
        """
        raise NotImplementedError
        :return: a modified copy of the current ``Recording``.
        """
        transforms = self.transforms.copy() if self.transforms is not None else []
        transforms.append(LoudnessNormalization(target=target).to_dict())
sampling_rate is missing here.
I used set.normalize_loudness as below: https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173
But the model's accuracy drops a lot.
@lifeiteng what metric is this curve? If it is validation accuracy, did you normalize loudness for both train and validation sets? I would think that for the training set, it may be better to "perturb" the loudness (within some range) rather than normalize it, similar to how we perturb volume.
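One way the "perturb within some range" idea could look, as a sketch only: draw a separate loudness target per training recording instead of using one fixed value. The helper name, the -23 center, and the 3 dB half-width are hypothetical choices, not something this PR implements.

```python
import random
from typing import Optional


def sample_target_loudness(
    center_db: float = -23.0,     # hypothetical center of the range
    max_offset_db: float = 3.0,   # hypothetical half-width of the range
    rng: Optional[random.Random] = None,
) -> float:
    """Draw a per-recording loudness target uniformly around `center_db`."""
    rng = rng or random.Random()
    return rng.uniform(center_db - max_offset_db, center_db + max_offset_db)


rng = random.Random(0)  # seeded only to make the sketch reproducible
targets = [sample_target_loudness(rng=rng) for _ in range(5)]
```

Each sampled value could then serve as the target for one training recording, while the validation set keeps a single fixed target.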
@desh2608 Training CrossEntropy Top10Accuracy.
@lifeiteng If it's for TTS, can you listen to the train and dev examples before and after normalization, as well as to the model predictions? Maybe that could reveal if there's something funky going on.
Related to #966.
This PR adds a method normalize_loudness() for recordings. It takes an argument target, which specifies the desired loudness (usually around -23 dB is a good value). The implementation uses pyloudnorm.

Also, we move ReverbWithImpulseResponse out of torchaudio.py and into its own file, since it is not a torchaudio-based transform.