Loudness normalization with `pyloudnorm` #1016

desh2608 · 2023-04-05T17:24:23Z

Related to #966.

This PR adds a method `normalize_loudness()` for recordings. It takes an argument `target` that specifies the desired loudness (around -23 dB is usually a good value). The implementation uses `pyloudnorm`.

Also, we move `ReverbWithImpulseResponse` out of torchaudio.py and into its own file, since it is not a torchaudio-based transform.
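
For readers new to the idea, the gain arithmetic behind normalizing to a target level can be sketched as follows. This is an illustration only, not the code in this PR: it assumes a plain RMS level in dBFS as a stand-in for the integrated loudness that pyloudnorm measures, and all function names are hypothetical.

```python
import math


def rms_level_db(samples):
    """RMS level in dB relative to full scale (a stand-in for a real loudness meter)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("cannot measure the level of pure silence")
    return 10.0 * math.log10(mean_square)


def gain_to_target(measured_db, target_db):
    """Linear gain that moves a signal from measured_db to target_db."""
    return 10.0 ** ((target_db - measured_db) / 20.0)


def normalize_level(samples, target_db):
    """Scale every sample by one constant gain so the RMS level hits target_db."""
    gain = gain_to_target(rms_level_db(samples), target_db)
    return [s * gain for s in samples]


# A quiet one-second 440 Hz tone sampled at 16 kHz.
tone = [0.005 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
louder = normalize_level(tone, target_db=-23.0)
```

Because the gain is a single constant, nothing here prevents peaks from exceeding full scale when a very quiet recording is boosted by a large amount.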

pzelasko · 2023-04-05T18:20:34Z

lhotse/augmentation/loudness.py

+    # clipping the audio.
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        loudness_normalized_audio = pyln.normalize.loudness(audio.T, loudness, target)


If audio clipping is an issue here you can add a limiter as a post-processing step, e.g. https://github.com/pzelasko/cylimiter

Sure, I can add it. But I don't have enough familiarity with limiters (or "loudness" for that matter) to know what to do exactly.

The defaults should "just work" with pretty much anything. It basically keeps track of the signal's loudness with a small lookahead and reduces the gain if it crosses some threshold. Think of it as soft clipping that doesn't introduce as much distortion as hard clipping.

.. but again I don't know if that's a real problem with this approach and worth the extra dependency, so it's your call :)

So far I have only used it to make the LibriCSS distant mic audio louder (it is at around -52 dB originally), and it sounds okay even with the clipping. I suppose we can let it be for now and add the limiter later if someone needs it?
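
To make the limiter idea above concrete, here is a toy lookahead limiter in plain Python. It is a simplified sketch under stated assumptions, not cylimiter's algorithm or API: the function name, the parameters (`threshold`, `lookahead`, `release`) and their defaults are all hypothetical.

```python
def toy_limiter(samples, threshold=0.9, lookahead=32, release=0.999):
    """Reduce gain ahead of peaks so that no output sample exceeds threshold."""
    limited = []
    gain = 1.0
    for i, sample in enumerate(samples):
        # Look at the current sample plus a short stretch of upcoming samples.
        window = samples[i : i + lookahead + 1]
        peak = max(abs(s) for s in window)
        wanted = min(1.0, threshold / peak) if peak > 0.0 else 1.0
        if wanted < gain:
            # Attack: drop the gain immediately when a peak is coming.
            gain = wanted
        else:
            # Release: drift back toward the wanted gain, never above it.
            gain = min(wanted, release * gain + (1.0 - release) * wanted)
        limited.append(sample * gain)
    return limited


# Samples above 1.0 would be hard-clipped on export; the limiter scales them down instead.
boosted = [0.0, 0.1, 2.5, -3.0, 0.2, 0.05]
safe = toy_limiter(boosted)
```

Unlike hard clipping, the gain changes gradually after a peak, which is why this kind of processing tends to distort less.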

pzelasko

Looks good to me, I left a single comment, up to you :)

lifeiteng · 2023-04-14T15:33:48Z

lhotse/augmentation/loudness.py

+    """
+
+    target: float
+    sampling_rate: int = 16000


@pzelasko should we make sampling_rate a member? The base class already takes it as an argument to __call__:

class AudioTransform:
    def __call__(self, samples: np.ndarray, sampling_rate: int) -> np.ndarray:
        """
        Apply transform.

        To be implemented in derived classes.
        """
        raise NotImplementedError

lifeiteng · 2023-04-14T15:39:40Z

lhotse/audio.py

+        :return: a modified copy of the current ``Recording``.
+        """
+        transforms = self.transforms.copy() if self.transforms is not None else []
+        transforms.append(LoudnessNormalization(target=target).to_dict())


sampling_rate is missing here.
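
The concern can be illustrated with a small stand-alone sketch. The class below is a hypothetical stand-in that only mirrors the two fields shown in the diff; it is not lhotse's `LoudnessNormalization`, and the helper names are made up for this example.

```python
from dataclasses import asdict, dataclass


@dataclass
class ToyLoudnessNormalization:
    """Hypothetical stand-in with the same two fields as in the diff above."""

    target: float
    sampling_rate: int = 16000

    def to_dict(self) -> dict:
        return asdict(self)


def serialize_without_rate(target):
    # Mirrors the reviewed line: the rate silently falls back to the default.
    return ToyLoudnessNormalization(target=target).to_dict()


def serialize_with_rate(target, sampling_rate):
    # One possible fix: pass the recording's own sampling rate explicitly.
    return ToyLoudnessNormalization(
        target=target, sampling_rate=sampling_rate
    ).to_dict()
```

With the first helper, a 24 kHz recording would be serialized with a 16 kHz rate; with the second, the stored rate matches the recording.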

lifeiteng · 2023-04-14T15:46:27Z

@desh2608 @pzelasko

I used cut_set.normalize_loudness as below: https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173

if args.prefix == "aishell":
    # NOTE: the loudness of aishell audio files is around -33
    # The best way is datamodule --on-the-fly-feats --enable-audio-aug
    cut_set = cut_set.normalize_loudness(
        target=-20.0, affix_id=True
    )

cut_set = cut_set.resample(24000)

But the model's accuracy drops a lot.

desh2608 · 2023-04-14T15:56:01Z

@lifeiteng what metric is this curve? If it is validation accuracy, did you normalize loudness for both train and validation sets? I would think that for the training set, it may be better to "perturb" the loudness (within some range) rather than normalize it, similar to how we perturb volume.
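
A minimal sketch of the "perturb within some range" idea: draw a different target per training utterance and keep a fixed target for evaluation. The range of -30 to -20 dB and the function name are assumptions chosen for illustration, not values recommended anywhere in this thread.

```python
import random


def sample_loudness_target(rng, low_db=-30.0, high_db=-20.0):
    """Draw a per-utterance loudness target uniformly from [low_db, high_db]."""
    return rng.uniform(low_db, high_db)


# Seeded generator so the draws are reproducible across runs.
rng = random.Random(0)
train_targets = [sample_loudness_target(rng) for _ in range(5)]

# Evaluation data would keep one fixed target instead of a random one.
eval_target = -23.0
```

Each sampled value would then be passed as the `target` of the normalization for that utterance.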

lifeiteng · 2023-04-14T17:12:46Z

@desh2608 Training CrossEntropy Top10Accuracy.
Yes, I normalized loudness for both the train and validation sets.
It's used for text-to-speech. Yes, "perturb" is better, but I want to keep it simple at the current stage.
I don't understand why normalization leads to a significant drop in accuracy. The bug (see comments above) is not triggered.

pzelasko · 2023-04-14T17:40:44Z

@lifeiteng If it's for TTS, can you listen to the train and dev examples before and after normalization, as well as to the model predictions? Maybe that could reveal if there's something funky going on.
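
Alongside listening, a quick numeric check can show whether the boost pushed samples past full scale. This is a sketch of one possible check, not a diagnosis of the accuracy drop; the helper names are hypothetical and samples are assumed to be floats in the range [-1, 1] before normalization.

```python
import math


def clipped_fraction(samples, limit=1.0):
    """Fraction of samples at or beyond full scale."""
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)


def headroom_db(samples):
    """Largest gain in dB that can be applied before the peak reaches 1.0."""
    peak = max(abs(s) for s in samples)
    return -20.0 * math.log10(peak)


def boost_would_clip(samples, measured_db, target_db):
    """True if raising the level from measured_db to target_db exceeds the headroom."""
    return (target_db - measured_db) > headroom_db(samples)
```

For the settings quoted earlier in this thread, going from about -33 to -20 is a 13 dB boost, so any utterance whose peak is above roughly 0.22 would exceed full scale after normalization.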

desh2608 added 5 commits April 4, 2023 16:50

move reverb augmentation to separate file

0f463c3

fix old code

2f8b064

add loudness normalization

d4e6a65

add loudness normalization with pyloudnorm

cfdf0ab

minor fix

493d1cf

pzelasko reviewed Apr 5, 2023

pzelasko approved these changes Apr 5, 2023

pzelasko added this to the v1.14 milestone Apr 5, 2023

desh2608 merged commit 3a3ed61 into lhotse-speech:master Apr 6, 2023

desh2608 deleted the loud_norm branch April 6, 2023 01:54

lifeiteng reviewed Apr 14, 2023

lifeiteng mentioned this pull request Apr 15, 2023

AISHELL1 with cut_set.normalize_loudness lifeiteng/vall-e#90

Open
