
How to normalize mel spectrogram extracted by Audio2Wav class? #45

Open
predawnang opened this issue May 10, 2023 · 0 comments

Comments

@predawnang

Hi,

I want to normalize the mel spectrograms extracted by the Audio2Wav class to the range [-1, 1], but I am not sure how to do it. I found some code that looks like normalization and adapted it to Audio2Wav. I hope somebody can give me some advice.

I modified the code of the Audio2Wav class based on other people's code, hoping that it would achieve the normalization.

    def forward(self, audio):
        p = (self.n_fft - self.hop_length) // 2
        audio = F.pad(audio, (p, p), "reflect").squeeze(1)
        fft = torch.stft(
            audio,
            n_fft=self.n_fft,
            hop_length=self.hop_length,
            win_length=self.win_length,
            window=self.window,
            center=False,
        )
        real_part, imag_part = fft.unbind(-1)
        magnitude = torch.sqrt(real_part ** 2 + imag_part ** 2)
        mel_output = torch.matmul(self.mel_basis, magnitude)
        log_mel_spec = torch.log10(torch.clamp(mel_output, min=1e-5))
       
        # The code I added: ref_db is the reference level, dc_db the dynamic range (both in dB).
        ref_db, dc_db = 20, 100
        db_mel = 20 * log_mel_spec
        return (db_mel - ref_db + dc_db) / dc_db

I'm not sure what the purpose of the two lines I added is. Could somebody help me figure it out?
For 20 * log_mel_spec, I guess it converts amplitude to the dB scale, but I'm not sure whether it is right to multiply log_mel_spec by 20.
And (db_mel - ref_db + dc_db) / dc_db normalizes the mels, but I don't know the technical name of this operation.
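
To make the question concrete, here is a minimal sketch of what those two lines appear to do, written as a standalone function. The name `db_normalize` and its `floor` parameter are hypothetical and not part of Audio2Wav; `ref_db=20` and `dc_db=100` are the values from the snippet above. The sketch assumes the mel input is an amplitude (magnitude) spectrogram, for which 20 * log10 is the usual dB conversion (a power spectrogram would use 10 * log10). The shift-and-divide step looks like min-max scaling against a fixed dynamic range: it maps `ref_db - dc_db` to 0 and `ref_db` to 1. The final clamp and the rescale to [-1, 1] are additions that the original two lines do not contain.

```python
import torch


def db_normalize(mel_amplitude, ref_db=20.0, dc_db=100.0, floor=1e-5):
    """Hypothetical helper: amplitude mel -> dB -> [0, 1] -> [-1, 1]."""
    # 20 * log10(x) is the decibel value of an amplitude x.
    # The clamp avoids log10(0); floor=1e-5 corresponds to -100 dB.
    db_mel = 20.0 * torch.log10(torch.clamp(mel_amplitude, min=floor))

    # Shift by the reference level and divide by the dynamic range:
    # ref_db - dc_db maps to 0 and ref_db maps to 1.
    unit = (db_mel - ref_db + dc_db) / dc_db

    # Values outside [ref_db - dc_db, ref_db] fall outside [0, 1], so clip them.
    unit = torch.clamp(unit, 0.0, 1.0)

    # Stretch [0, 1] to the target range [-1, 1].
    return 2.0 * unit - 1.0


mel = torch.tensor([1e-7, 1e-4, 1.0, 10.0, 1000.0])
print(db_normalize(mel))
```

With these values, the floor of -100 dB lands at -0.2 before clipping, and amplitudes above 10 (20 dB) saturate at 1, so `ref_db` and `dc_db` would probably need tuning to the actual level range of the data.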

Thank you very much
