Low signal level distorts MFCC/GFCC values #543

Closed
dbogdanov opened this issue Dec 22, 2016 · 10 comments

@dbogdanov
Member

dbogdanov commented Dec 22, 2016

Ideally we would expect to have identical MFCC coefficients, except for the 0th coefficient, on different input levels for the same input signal frame. However, in the case when the input signal level is very low, the MFCC values get distorted.

A low signal level leads to small spectrum values. Using the power spectrum to compute mel bands reduces these values further. When taking the log to compute log-energies, we apply thresholding to truncate very quiet bands (currently at -90 dB).

For some signal frames, some bands may be truncated because they fall below the threshold while others are not. This leads to MFCC values that differ from the expected ones.

When all band values are truncated, the resulting MFCC vector contains zeros except for the 0th coefficient, which takes its minimum (most negative) value. Avoiding the distortion by lowering the silence threshold comes at the cost of more frames containing non-zero MFCC vectors. The appropriate threshold might depend on the application.
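The effect can be reproduced with a small, self-contained sketch. This is not Essentia's implementation: the band energies are made up, `dct2` is a plain unnormalized DCT-II, and `log_bands` is a hypothetical helper that clips at a power of 1e-9 (-90 dB) before taking the log.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n)]

def log_bands(band_energies, threshold=1e-9):
    """Convert power-domain band energies to dB, clipping values below `threshold`."""
    return [10.0 * math.log10(max(e, threshold)) for e in band_energies]

# Hypothetical mel band energies for one frame (power domain).
bands = [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7]

def cepstrum(gain):
    # Scaling the signal by `gain` scales every power-domain band by gain**2.
    scaled = [e * gain ** 2 for e in bands]
    return dct2(log_bands(scaled))

loud = cepstrum(1.0)
quiet = cepstrum(0.5)         # no band falls below the threshold
very_quiet = cepstrum(1e-3)   # most bands fall below the threshold and are clipped

# Largest difference over coefficients 1..N-1, relative to the loud frame.
diff_quiet = max(abs(a - b) for a, b in zip(loud[1:], quiet[1:]))
diff_very_quiet = max(abs(a - b) for a, b in zip(loud[1:], very_quiet[1:]))
print(diff_quiet, diff_very_quiet)
```

As long as no band is clipped, a gain change adds the same constant to every log band, so only the 0th coefficient moves. Once some bands are clipped and others are not, the offset is no longer constant across bands and the higher coefficients change as well.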

Solutions:

  • disable truncation when computing dbamp/dbpow log in MFCC/GFCC.
  • lower silence threshold (-180dB) for MFCC/GFCC
  • truncate bands at -90dB if magnitude spectrum was used, at -180dB if power spectrum was used
  • implement silenceThreshold parameter
  • print a warning on every truncated frame
  • leave as it is, but note this issue in documentation (will affect some tasks)
@dbogdanov dbogdanov added this to the 2.1 milestone Dec 22, 2016
@dbogdanov
Member Author

@georgid

@georgid
Contributor

georgid commented Dec 23, 2016

Ideally we would expect to have identical MFCC coefficients, except for the 0th coefficient, on different input levels for the same input signal frame.

For some signal frames, some bands may be truncated because they fall below the threshold while others are not. This leads to MFCC values that differ from the expected ones.

In the MFCC literature, normalization for different signal levels is handled by cepstral mean normalization. It is usually applied after taking the log of the mel-scale filterbank and before the DCT:

http://dsp.stackexchange.com/questions/19564/cepstral-mean-normalization
I think it is a good idea to implement that.

Also, I think that taking dB instead of log is one reason the spectral energy values end up even lower. Unless somebody points me to an MFCC reference implementation that uses dB, I suggest we stick to log.

@dbogdanov
Member Author

We won't be able to build this normalization into the MFCC algorithm. It is a very application-specific post-processing step that can be done on a Pool containing MFCC frames.
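As a rough sketch of such a post-processing step, cepstral mean normalization can be applied to a list of already computed MFCC frames. The function name and the frame values below are hypothetical, and a plain list of lists stands in for frames collected in a Pool. Because the DCT is linear, subtracting the mean in the cepstral domain is equivalent to subtracting it from the log band energies before the DCT.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over time from a list of MFCC frames."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

# Hypothetical MFCC frames (3 frames x 4 coefficients).
mfcc_frames = [
    [-50.0, 12.0, -3.0, 1.5],
    [-48.0, 10.0, -2.0, 0.5],
    [-52.0, 11.0, -4.0, 1.0],
]
normalized = cepstral_mean_normalize(mfcc_frames)
```

Note that this removes offsets that are constant over time; it does not undo the frame-by-frame distortion caused by truncating only some of the bands.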

The log type does not affect the inconsistencies in MFCC values. If we disable thresholding, any log will work equally well. Perhaps we can have a parameter that enables/disables threshold clipping of energy bands before taking the log.
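A quick check of this claim, as a sketch with made-up band energies and a plain unnormalized DCT-II (not Essentia's code): without clipping, both the natural log and dB give coefficients 1..N-1 that are independent of the input gain, and the two variants differ only by the constant factor 10/ln(10).

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n)]

def natural_log(e):
    return math.log(e)

def decibel(e):
    return 10.0 * math.log10(e)

def cepstrum(energies, log_fn):
    # No clipping here: every band must be strictly positive.
    return dct2([log_fn(e) for e in energies])

bands = [3e-4, 7e-6, 2e-5, 9e-8]            # hypothetical power-domain band energies
gain = 1e-4                                  # very low input level
quiet_bands = [e * gain ** 2 for e in bands]

# Largest change in coefficients 1..N-1 caused by the gain, per log type.
max_diff = {}
for name, log_fn in (("ln", natural_log), ("dB", decibel)):
    loud = cepstrum(bands, log_fn)
    quiet = cepstrum(quiet_bands, log_fn)
    max_diff[name] = max(abs(a - b) for a, b in zip(loud[1:], quiet[1:]))

print(max_diff)
```

Both differences come out at floating-point noise level, which is consistent with the inconsistency being caused by the clipping rather than by the choice of log.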

@dbogdanov
Member Author

@edufonseca

@dbogdanov
Member Author

We can't disable thresholding completely, otherwise we'll get NaN values. Therefore, the easiest solution will be:

  • lower hardcoded silence threshold in MFCC and GFCC (create new versions of lin2db, pow2db, amp2db with a threshold argument in essentiamath)
  • explain this issue in documentation

To avoid distorted MFCC values on low-level signals, one will need to lower the threshold and/or post-process the resulting values. One can define a threshold on the 0th coefficient to filter out silent/distorted MFCC frames.
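A minimal sketch of both ideas, under stated assumptions: `pow2db_thresholded` and `drop_silent_frames` are hypothetical names (not existing essentiamath functions), the frame values are invented, and the cut-off on the 0th coefficient has to be chosen per application because it depends on the number of bands and the DCT normalization.

```python
import math

def pow2db_thresholded(value, silence_threshold=1e-9):
    """Hypothetical helper: power to dB with a configurable floor (avoids log(0))."""
    return 10.0 * math.log10(max(value, silence_threshold))

def drop_silent_frames(mfcc_frames, c0_threshold):
    """Keep only frames whose 0th coefficient is above `c0_threshold`."""
    return [f for f in mfcc_frames if f[0] > c0_threshold]

# Floors for the two thresholds mentioned above, expressed in the power domain.
floor_90 = pow2db_thresholded(0.0, 1e-9)     # about -90 dB
floor_180 = pow2db_thresholded(0.0, 1e-18)   # about -180 dB

# Hypothetical MFCC frames; the second one looks like a fully truncated frame.
frames = [[-310.0, 42.0, -7.5], [-880.0, 0.0, 0.0], [-295.0, 38.0, -6.0]]
kept = drop_silent_frames(frames, c0_threshold=-800.0)
```

Lowering the floor keeps more quiet frames undistorted, while the filter on the 0th coefficient discards the frames that are still dominated by the floor.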

dbogdanov added a commit that referenced this issue Dec 28, 2016
Add silenceThreshold parameter default to 1e-9.

Get rid of pointer to compressor function because it is not flexible
as soon as one needs to run functions with different number of
parameters (amp2db vs linear) or different threshold log values (amb2db
db threshold vs log)
@dbogdanov
Member Author

@georgid @pabloEntropia
I've done some changes in the new mfcc_thresholding branch. If you like the idea, the same should be done for GFCC.

@georgid
Contributor

georgid commented Dec 29, 2016

OK, lowering the threshold makes sense. 1e-9 seems a reasonable value for the threshold.

@dbogdanov
Member Author

It is not clear to me which default value is best. 1e-9 is the default value we had before; librosa uses 1e-10. Maybe @edufonseca can test a few different ones.

palonso pushed a commit to palonso/essentia that referenced this issue Jan 4, 2017
Add silenceThreshold parameter default to 1e-9.
Modified unit test to fit the new results.
@edufonseca

edufonseca commented Jan 4, 2017 via email

palonso pushed a commit to palonso/essentia that referenced this issue Jan 5, 2017
Add silenceThreshold parameter default to 1e-9.

Get rid of pointer to compressor function because it is not flexible as soon as one needs to run functions with different number of parameters (amp2db vs linear) or different threshold log values (amb2db db threshold vs log)

Update test to the new results of the `testZero`
dbogdanov added a commit that referenced this issue Jan 5, 2017
@dbogdanov
Member Author

I conclude that we should set the threshold to 1e-10 for better consistency with librosa.
