
Sampling rate of VGGish #85

Closed
yyccli opened this issue Oct 6, 2022 · 8 comments

Comments

yyccli commented Oct 6, 2022

Hi, I am new to VGGish and I want to use it to extract audio features. The website says the feature tensor will be 128-d and correspond to 0.96 s of the original video. Can I change this sampling rate to other values? I see that the I3D model provides some arguments for this (like stack_size and step_size); can I do the same with VGGish?

v-iashin (Owner) commented Oct 6, 2022

Hi. We are on the same page here, I guess.

The 0.96 s window and the sampling rate correspond to the official implementation. I have not tried it with different hyperparameters.

So, my answer is: I don't know.

What I would do to try it:

  1. Append the classification head (+PCA as in the implementation) to the backbone
  2. Look at the outputs, e.g. whether the predicted classes are meaningful or not.

This is what my show_pred argument is for. However, I haven't implemented it for VGGish yet (you could do it, perhaps :) ).

v-iashin (Owner) commented Oct 6, 2022

Closing this as it is not really an issue with this repo, but feel free to ask more questions.

v-iashin closed this as completed Oct 6, 2022

v-iashin (Owner) commented Oct 6, 2022

A similar question: v-iashin/MDVC#27

yyccli (Author) commented Oct 7, 2022

Thanks for your quick reply. I will try to figure out if it's possible to make changes in VGGish :)

yyccli (Author) commented Oct 7, 2022

Hi, I took a look at the source code of VGGish and found that it is possible to change how densely features are sampled from the input audio.
It seems we just need to modify the constants EXAMPLE_WINDOW_SECONDS and EXAMPLE_HOP_SECONDS in vggish_params.py, and the length of the output feature sequence changes accordingly.

The default value of both constants is 0.96, so the sampling strategy is one feature vector per 0.96 s of input, with no overlap between feature vectors.
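Concretely, the relevant part of vggish_params.py seems to look like this (the two constant names and the 0.96 defaults are from the official AudioSet code; the alternative hop below is only an illustration, not a value used in this repo):

```python
# vggish_params.py (excerpt, official defaults)
EXAMPLE_WINDOW_SECONDS = 0.96  # each log-mel patch covers 0.96 s of audio
EXAMPLE_HOP_SECONDS = 0.96     # next patch starts 0.96 s later -> no overlap

# Illustrative change: keep the 0.96 s window the model was trained on,
# but take a patch every 0.32 s to get denser, overlapping feature vectors.
# EXAMPLE_HOP_SECONDS = 0.32
```

If I read the official code correctly, changing only EXAMPLE_HOP_SECONDS changes how often a patch is taken, while changing EXAMPLE_WINDOW_SECONDS also changes the size of the log-mel patch the network sees (it was trained on 0.96 s, i.e. 96 frames × 64 bands), which risks a mismatch with how the model was trained.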

I changed the two constants to match my video sampling strategy, and the output audio feature length is now (almost) the same as my video feature length; there may be some slight differences in the values.
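As a rough sanity check of the counts, the usual framing arithmetic can be used (a minimal sketch; the exact number produced by the official framing code may differ by a frame at the boundaries):

```python
import math

def num_vggish_vectors(duration_s: float,
                       window_s: float = 0.96,
                       hop_s: float = 0.96) -> int:
    """Approximate number of feature vectors VGGish emits for a clip.

    One 128-d vector is produced per log-mel patch of `window_s` seconds,
    taken every `hop_s` seconds (the 0.96/0.96 defaults give
    non-overlapping patches).
    """
    if duration_s < window_s:
        return 0
    return 1 + int(math.floor((duration_s - window_s) / hop_s))

# e.g. a 10 s clip: 10 vectors with the default 0.96/0.96 framing,
# 29 vectors with an overlapping hop of 0.32 s.
print(num_vggish_vectors(10.0))              # 10
print(num_vggish_vectors(10.0, hop_s=0.32))  # 29
```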

I am not quite sure about this, but I would like to try it in my own project now.

Thank you again for integrating VGGish into this repo. ^ ^

yyccli (Author) commented Oct 7, 2022

For the question above, my answer is: yes, it is possible.

v-iashin (Owner) commented Oct 7, 2022

Well, of course it is possible, because you make the shapes match. What I meant is the potential mismatch between the pre-processing during training and during inference.

I know that feeding I3D smaller stacks of RGB frames than it was trained on does not completely mess up the predicted probabilities (one can look at the results with show_pred). Since I have not implemented show_pred for VGGish, I am not sure whether the features will still be useful.

yyccli (Author) commented Oct 7, 2022

Oh, I see. I understand what you mean: it's hard to know unless one checks this.
