Weird results and "Error reading frame:" #402

Closed
danieleghisi opened this issue Apr 29, 2016 · 7 comments

@danieleghisi

Hi, I've downloaded the streaming_extractor_music command line tool, but I'm having a hard time getting it to work properly on a set of short mp3 files (from about half a second to about 30 seconds long).

  1. First, I constantly get warnings during processing. This is the typical output:

Process step: Read metadata
Process step: Compute md5 audio hash and codec
[ WARNING ] AudioLoader: Error reading frame: Input/output error
Process step: Replay gain
[ WARNING ] AudioLoader: Error reading frame: Input/output error
Process step: Compute audio features
[ WARNING ] AudioLoader: Error reading frame: Input/output error
[ WARNING ] AudioLoader: Error reading frame: Input/output error
Process step: Compute aggregation
All done

What do they mean? What am I doing wrong?

  2. The results often seem very off to me. For instance, in the attached examples, a.mp3 has the LARGER spectral_rms and b.mp3 the smaller one, yet b.mp3 seems to have far more energy! (Or, alternatively, I didn't understand the descriptors at all...)
    (The same holds for some other descriptors: e.g. the danceability descriptor, which by the way is exactly what I was looking for, gives results that sound extremely inconsistent to me.)

Am I doing something wrong?
Do you have any advice to fix these issues?

Thanks a lot for all your work,
Daniele Ghisi

examples.zip

@dbogdanov
Member

The "Error reading frame" warning is raised here. It happens when ffmpeg's av_read_frame function finds corrupt or incorrect packets in the audio stream. I don't know exactly what "Input/output error" means, but it is produced by ffmpeg in that function.

I have tried your files on my machine and did not get any reading errors (master branch, ffmpeg 2.7.6 on Ubuntu). However, errors of this type occur occasionally and are fairly normal, so you can ignore them.
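When processing many short files, it can help to see how often these warnings appear and at which step. The following is a minimal sketch, not part of the extractor: it tallies `[ WARNING ]` lines per `Process step:` in captured output like the log quoted above. The function name `count_warnings_per_step` is hypothetical, and the sketch assumes each warning is printed after the step line it belongs to, which may not hold if stdout and stderr are interleaved differently.

```python
from collections import Counter


def count_warnings_per_step(log_text):
    """Count '[ WARNING ]' lines under the most recent 'Process step:' line."""
    counts = Counter()
    step = None  # warnings seen before any step line are filed under None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("Process step:"):
            step = line[len("Process step:"):].strip()
        elif line.startswith("[ WARNING ]"):
            counts[step] += 1
    return counts


# Output copied from the report at the top of this thread.
SAMPLE_LOG = """\
Process step: Read metadata
Process step: Compute md5 audio hash and codec
[ WARNING ] AudioLoader: Error reading frame: Input/output error
Process step: Replay gain
[ WARNING ] AudioLoader: Error reading frame: Input/output error
Process step: Compute audio features
[ WARNING ] AudioLoader: Error reading frame: Input/output error
[ WARNING ] AudioLoader: Error reading frame: Input/output error
Process step: Compute aggregation
All done
"""

warning_counts = count_warnings_per_step(SAMPLE_LOG)
for step_name, n in warning_counts.items():
    print(f"{step_name}: {n}")
```

A file that produces far more warnings than its neighbours may be worth re-encoding or inspecting by hand before trusting its descriptors.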

The music extractor you are using applies loudness normalization based on the computed replay gain value:

  • a: replay gain 14.4678974152,
  • b: replay gain -1.5447101593

Therefore you get spectral RMS values that may differ from what you expect. Perhaps you would like to analyze the audio without ReplayGain for your problem. This extractor has no configuration option for that, but you can change it in the code.
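To illustrate why normalization can flip the ordering of RMS values, here is a rough sketch in plain Python. It assumes the replay gain is a value in dB applied as a linear amplitude factor of 10^(gain/20); that is the usual convention, but the extractor's exact normalization is not confirmed here. The two sine signals and their amplitudes are hypothetical stand-ins for a quiet and a loud file, not the attached a.mp3 and b.mp3.

```python
import math


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def rms(samples):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_gain(samples, gain_db):
    """Scale every sample by the linear factor for gain_db."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# Replay gain values quoted above for the two example files.
GAIN_A_DB = 14.4678974152
GAIN_B_DB = -1.5447101593

# Hypothetical signals: one second of a 440 Hz sine at 44.1 kHz,
# quiet for "a" and louder for "b" (amplitudes chosen for illustration).
quiet = [0.05 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
loud = [0.25 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]

rms_a_raw, rms_b_raw = rms(quiet), rms(loud)
rms_a_norm = rms(apply_gain(quiet, GAIN_A_DB))
rms_b_norm = rms(apply_gain(loud, GAIN_B_DB))

print("raw:       a < b ?", rms_a_raw < rms_b_raw)
print("normalized: a > b ?", rms_a_norm > rms_b_norm)
```

With these assumed amplitudes the quiet signal has the smaller RMS before the gain is applied and the larger one afterwards, which matches the kind of reversal described in the report.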

We would like to do some further QA for danceability and expand our unit tests with audio examples manually ranked by their danceability. Your input with examples of inconsistency is welcome.

Btw, we also have another model for danceability, based on a classifier, which is used in the context of AcousticBrainz.

@danieleghisi
Author

Hi, and thanks a lot for your quick answer.
If I understand correctly, the replay gain is then a sort of decibel level for the whole file (with respect to a reference level of -31 dB), and everything is renormalized according to it.
I was hoping to get by without coding/compiling, but I understand that this is not an option :)

I'll send you inconsistency examples as soon as I am able to isolate them properly...
Thanks again,
Daniele

@danieleghisi
Author

danieleghisi commented Apr 30, 2016

Hi dbogdanov,

attached you will find a pretty clear inconsistency in the danceability classifier. 009743.mp3 has a danceability of 3.23 (but it's not danceable at all), while 011915.mp3 has a danceability of 0.832065165043, even though it's far more danceable :)

I haven't really understood what the bounds (if any) of the danceability value are...
The docs say values "usually range from 0 to 3". But I get negative values and values > 3.
Is there some standard range?

And thanks again for your work, massively helpful!
Daniele
Archive.zip

@danieleghisi
Author

I'm also having trouble with beatsloudness (I thought it could be a "second choice" feature for danceability): the issues are very similar to the danceability ones.
Am I interpreting or doing something wrong? Perhaps my samples are too short?
Thanks,
Daniele

@dbogdanov
Member

There should be no negative values. If you have an example, please send it and I will review the computation. For further details on the Danceability algorithm, I suggest reading this paper. In the paper, maximum danceability corresponded to a value of 0, but in the implementation it is rescaled to 1/0.
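To isolate suspicious files from a batch, a small check like the one below can flag values outside the "usually 0 to 3" range quoted from the docs earlier in the thread. This is only a sketch: the helper name `flag_out_of_range` is hypothetical, the bounds are the documented usual range rather than hard limits, and the input values are the two reported above (the value for the negative case was not given, so it is not included).

```python
def flag_out_of_range(danceability_by_file, low=0.0, high=3.0):
    """Return the entries whose value lies outside [low, high]."""
    return {
        name: value
        for name, value in danceability_by_file.items()
        if value < low or value > high
    }


# Values reported earlier in this thread.
reported = {
    "009743.mp3": 3.23,
    "011915.mp3": 0.832065165043,
}

flagged = flag_out_of_range(reported)
for name, value in flagged.items():
    print(f"{name}: danceability {value} is outside the usual range")
```

Files flagged this way are natural candidates to attach to a report, since they can be checked against the computation directly.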

@danieleghisi
Author

Hi, here's a file that gives a negative danceability value.
004077.mp3.zip

@dbogdanov dbogdanov added this to the 2.1 milestone Oct 3, 2016
@dbogdanov
Member

Closing this issue, as I've created a new one related to danceability output.
