
ref_level_db = 20 and min_level_db = -100 Where did these values come from? Statistics? #17

Closed
WithoutDoubt opened this issue Mar 18, 2020 · 10 comments
Labels
question Further information is requested

Comments

@WithoutDoubt

No description provided.

@WithoutDoubt changed the title from "ref_level_db = 20 and min_level_db Where did these values come from? Statistics?" to "ref_level_db = 20 and min_level_db = -100 Where did these values come from? Statistics?" on Mar 18, 2020
@begeekmyfriend
Owner

Well, these hyperparameters have been abandoned in the latest version, where a new convolution-based STFT is applied in the mel-spectrogram preprocessing. Those hyperparameters made the range of the mel-spectrogram values indeterminate, whereas with the new STFT the lowest value is fixed at -11.5129, also known as the mel padding value. You might print mel.min() and mel.max() to verify it.
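
(Editorial note: a minimal sketch of the log compression this refers to, assuming the common Tacotron2/melgan-style dynamic_range_compression; the exact implementation in this repo's data_function.py may differ.)

import torch

def dynamic_range_compression(x, clip_val=1e-5):
    # Clamp before the log so silent/padded frames cannot fall below
    # log(1e-5) ≈ -11.5129, the fixed floor mentioned above.
    return torch.log(torch.clamp(x, min=clip_val))

print(torch.log(torch.tensor(1e-5)).item())  # ≈ -11.5129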

@begeekmyfriend added the question (Further information is requested) label on Mar 19, 2020
@lukewys

lukewys commented Mar 23, 2020

Dear @begeekmyfriend, I have been wondering about these parameters too (from an earlier version). Basically you apply:
D = _stft(preemphasis(y))
S = _amp_to_db(np.abs(D)) - hparams.ref_level_db
np.clip((S - hparams.min_level_db) / -hparams.min_level_db, 0, 1)  # this is to normalize
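
(Editorial note: for readers following along, here is a minimal NumPy sketch of what those helpers typically look like in the keithito/Kyubyong-style Tacotron preprocessing, using the values from the issue title; the helper bodies below are an assumption, not copied from this repo.)

import numpy as np

ref_level_db = 20     # values from the issue title; the repo's hparams may differ
min_level_db = -100

def _amp_to_db(x):
    # 20*log10 with a floor so log(0) never occurs; the floor corresponds
    # to min_level_db (-100 dB maps to an amplitude of 1e-5)
    min_level = np.exp(min_level_db / 20 * np.log(10))
    return 20 * np.log10(np.maximum(min_level, x))

def _normalize(S):
    # map [min_level_db, 0] dB onto [0, 1] and clip everything outside that range
    return np.clip((S - min_level_db) / -min_level_db, 0, 1)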

So could you tell me the reasoning behind this? And why did you choose these values?
Thanks in advance!

@begeekmyfriend
Owner

As far as I know, this approach was derived early on from Kyubyong's tacotron. It leads to a negative bias between GTA mel spectrograms and ground-truth ones because of ref_level_db, which is indispensable for the log calculation. Therefore I am currently using the convolutional STFT; you can see it in data_function.py.

@lukewys

lukewys commented Mar 23, 2020

Thanks very much! I am also working on audio synthesis and processing. I am using the old preprocessing method and am now considering switching to yours. Thanks again!

@begeekmyfriend
Owner

You can print mel.min() to see that the lowest value stays fixed, whether the mels are ground truth or inference output.

@lukewys

lukewys commented Mar 30, 2020

Dear @begeekmyfriend, I took a deeper look into the code you wrote. Can I understand your current mel-spectrogram generation in the following way?

mel = dynamic_range_compression(_linear_to_mel(np.abs(_stft(preemphasis(y)))))

So I ran the code on my data and found that the range of the mel spectrogram now becomes:
Max: 1.6808715
Min: -10.579149 (I think the lowest possible value is np.log(1e-5), which is about -11.5)

Am I running the code correctly? I am also wondering why you now use log instead of 20*log10. Is there a disadvantage compared to 20*log10?

Thanks very much in advance!
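
(Editorial note on the log vs. 20*log10 question, not from the thread: the two differ only by a constant scale factor, since 20*log10(x) = (20 / ln 10) * ln(x) ≈ 8.6859 * ln(x), so the choice mainly changes the numeric range the model sees rather than the information content. A quick check:)

import numpy as np

x = np.abs(np.random.randn(4)) + 1e-3
print(np.allclose(20 * np.log10(x), 20 / np.log(10) * np.log(x)))  # True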

@begeekmyfriend
Owner

The lowest value must be -11.5129, since the STFT is the same as in other PyTorch TTS projects such as melgan. Here is a piece of Python script to verify it.

import glob
import os
import sys

import numpy as np

# Scan every mel .npy file under the given dataset directory and
# report the global minimum and maximum values.
mins = []
maxs = []
for f in glob.glob(os.path.join(sys.argv[1], '**', 'mels', '*.npy'), recursive=True):
    mel = np.load(f)
    mins.append(mel.min())
    maxs.append(mel.max())

# The minimum should be the padding value, log(1e-5) ≈ -11.5129.
print(sorted(mins)[0], sorted(maxs)[-1])
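
(Editorial note: assuming the script above is saved as, say, check_mel_range.py, a hypothetical name, it can be run against the root of the preprocessed dataset:)

python check_mel_range.py /path/to/training_data

The first number printed should be approximately -11.5129 if the clamping/padding is in effect.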

@lukewys

lukewys commented Mar 31, 2020

Dear @begeekmyfriend, thanks very much.

@begeekmyfriend
Owner

begeekmyfriend commented Aug 2, 2020

Hi all, I think I have found the reasoning behind this conversion. Please look at this online PPT, which illustrates the fp32-to-8-bit quantization done by the TensorRT library. It is much like the conversion from amplitude to decibels. Without saturation there is generally significant accuracy loss, because samples near the maximum edges are noisy and are easily amplified by the scaling. Therefore we need to set a threshold near the maximum edges and truncate the values beyond it. In Tacotron we set a min_level_db as well as a ref_level_db and then clip during normalization. The quantization illustrated in the PPT explains this rationale.
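
(Editorial note: to make the analogy concrete, a small self-contained sketch of saturated vs. unsaturated symmetric int8 quantization in the spirit of those TensorRT slides; the threshold value here is arbitrary and only for illustration.)

import numpy as np

def quantize_int8(x, threshold):
    # Symmetric 8-bit quantization: values beyond the threshold saturate
    # (are clipped), so rare outliers do not stretch the scale and destroy
    # resolution for the bulk of the distribution.
    scale = threshold / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 10000), [40.0]])  # one noisy outlier

no_sat = quantize_int8(x, threshold=np.abs(x).max())  # scale dictated by the outlier
with_sat = quantize_int8(x, threshold=4.0)            # clip near the distribution's edge

# The saturated version uses far more of the int8 range for typical samples,
# which mirrors clipping at min_level_db/ref_level_db before normalization.
print(np.unique(no_sat).size, np.unique(with_sat).size)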

@lukewys

lukewys commented Aug 3, 2020

Hi, @begeekmyfriend, thanks very much for sharing this slide.
