
Sampling rate of VGGish #85

Closed
yyccli opened this issue Oct 6, 2022 · 8 comments

Comments

yyccli commented Oct 6, 2022

Hi, I am new to VGGish and I want to use it to extract audio features. The website says the feature tensor will be 128-d and correspond to 0.96 s of the original video. Can I change this sampling rate to other values? I see that the I3D model provides some arguments for this (like stack_size and step_size); can I do the same with VGGish?

v-iashin (Owner) commented Oct 6, 2022

Hi. We are on the same page here, I guess.

The 0.96 s window and the sampling rate correspond to the official implementation. I have not tried it with different hyperparameters.

So, my answer is: I don't know.

What I would do to try it:

  1. Append the classification head (+PCA as in the implementation) to the backbone
  2. Look at the outputs, e.g. whether the predicted classes are meaningful or not.

This is what my show_pred argument is for. However, I haven't implemented it for VGGish yet (you could do it, perhaps :) ).

v-iashin (Owner) commented Oct 6, 2022

Closing this as it is not really an issue with this repo, but feel free to ask more questions.

v-iashin closed this as completed Oct 6, 2022

v-iashin (Owner) commented Oct 6, 2022

A similar question: v-iashin/MDVC#27

yyccli (Author) commented Oct 7, 2022

Thanks for your quick reply. I will try to figure out if it's possible to make changes in VGGish :)

yyccli (Author) commented Oct 7, 2022

Hi, I took a look at the source code of VGGish and found that it is possible to change how densely features are sampled from the input audio.
It seems we just need to modify the constants EXAMPLE_WINDOW_SECONDS and EXAMPLE_HOP_SECONDS in vggish_params.py, and the length of the output feature sequence changes accordingly.

The default value of both constants is 0.96, so the sampling strategy is one feature vector per 0.96 s of input, with no overlap between feature vectors.
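Concretely, the relevant part of vggish_params.py seems to look like this (the two constant names and the 0.96 defaults are from the official AudioSet code; the alternative hop below is only an illustration, not a value used in this repo):

```python
# vggish_params.py (excerpt, official defaults)
EXAMPLE_WINDOW_SECONDS = 0.96  # each log-mel patch covers 0.96 s of audio
EXAMPLE_HOP_SECONDS = 0.96     # next patch starts 0.96 s later -> no overlap

# Illustrative change: keep the 0.96 s window the model was trained on,
# but take a patch every 0.32 s to get denser, overlapping feature vectors.
# EXAMPLE_HOP_SECONDS = 0.32
```

If I read the official code correctly, changing only EXAMPLE_HOP_SECONDS changes how often a patch is taken, while changing EXAMPLE_WINDOW_SECONDS also changes the size of the log-mel patch the network sees (it was trained on 0.96 s, i.e. 96 frames × 64 bands), which risks a mismatch with how the model was trained.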

I changed the two constants to match my video sampling strategy, and the output audio feature length is now (almost) the same as my video feature length; there may be some slight differences in the values.
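As a rough sanity check of the counts, the usual framing arithmetic can be used (a minimal sketch; the exact number produced by the official framing code may differ by a frame at the boundaries):

```python
import math

def num_vggish_vectors(duration_s: float,
                       window_s: float = 0.96,
                       hop_s: float = 0.96) -> int:
    """Approximate number of feature vectors VGGish emits for a clip.

    One 128-d vector is produced per log-mel patch of `window_s` seconds,
    taken every `hop_s` seconds (the 0.96/0.96 defaults give
    non-overlapping patches).
    """
    if duration_s < window_s:
        return 0
    return 1 + int(math.floor((duration_s - window_s) / hop_s))

# e.g. a 10 s clip: 10 vectors with the default 0.96/0.96 framing,
# 29 vectors with an overlapping hop of 0.32 s.
print(num_vggish_vectors(10.0))              # 10
print(num_vggish_vectors(10.0, hop_s=0.32))  # 29
```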

I am not quite sure about this, but I would like to try it in my own project now.

Thank you again for integrating VGGish into this repo. ^ ^

yyccli (Author) commented Oct 7, 2022

For the question above, my answer is: yes, it is possible.

v-iashin (Owner) commented Oct 7, 2022

Well, of course it is possible, because you make the shapes match. What I meant is the potential mismatch between the pre-processing during training and during inference.

I know that feeding I3D smaller stacks of RGB frames than it was trained on does not completely mess up the predicted probabilities (one can look at the results with show_pred). Since I have not implemented show_pred for VGGish, I am not sure whether the features will still be useful.

yyccli (Author) commented Oct 7, 2022

Oh, I see. I understand what you mean: it's hard to know unless one checks this.
