Sampling rate of VGGish #85
Comments
Hi. We are on the same page here, I guess. The 0.96 s window and the sampling rate correspond to the official implementation. I have not tried it with different hyperparameters, so my answer is: I don't know. What I would do to try it: change the hyperparameters and check whether the extracted features still look sensible.
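A minimal sketch of the alignment arithmetic involved (my own illustration, not taken from this thread): with the default behaviour of one 128-d VGGish feature per 0.96 s of audio, you can estimate how many audio and video features a clip yields and what audio hop would be needed to match the video features. The clip length, fps, `stack_size` and `step_size` below are made-up example values.

```python
# Back-of-the-envelope check: how many VGGish examples vs. I3D features a clip
# produces, and what audio hop would align them. Assumes the default VGGish
# behaviour of one 128-d embedding per 0.96 s with a non-overlapping hop;
# all numbers below are hypothetical.
import math

duration_s = 60.0   # hypothetical clip length in seconds
fps = 25            # hypothetical video frame rate
stack_size = 64     # hypothetical I3D stack size (frames per feature)
step_size = 64      # hypothetical I3D step size (frames between features)

vggish_window_s = 0.96  # default example window
vggish_hop_s = 0.96     # default example hop (no overlap)

n_audio = math.floor((duration_s - vggish_window_s) / vggish_hop_s) + 1
n_video = math.floor((duration_s * fps - stack_size) / step_size) + 1

print(f"VGGish features: {n_audio}")  # one every 0.96 s -> 62
print(f"I3D features:    {n_video}")  # one every step_size / fps = 2.56 s -> 23

# To get one audio feature per video feature, the audio hop would have to
# match the video step expressed in seconds:
print(f"audio hop needed: {step_size / fps:.2f} s")
```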
Closing this, as it is not really an issue with this repo, but feel free to ask more questions.
A similar question: v-iashin/MDVC#27
Thanks for your quick reply. I will try to figure out if it's possible to make changes in VGGish :)
Hi, I took a look at the source code of VGGish and found that it is possible to change the sampling rate of the input audio. The default value of both of the relevant args is 0.96. I changed the two args to match my video sampling strategy, and the output feature length is now the same as my video feature length (almost; there may be some slight changes in the values). I am not quite sure about this, but I would like to try it in my own project now. Thank you again for integrating VGGish in this repo. ^ ^
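For reference, a guess at what such an edit could look like, assuming the two parameters in question are `EXAMPLE_WINDOW_SECONDS` and `EXAMPLE_HOP_SECONDS` from the AudioSet `vggish_params.py` (both default to 0.96); the thread does not name them explicitly, and the 2.56 s value is only an example.

```python
# Presumed relevant constants from the AudioSet vggish_params.py; this is a
# sketch of the kind of edit described above, not a quote of this repo's code.

# Defaults: each example covers 0.96 s and examples are taken every 0.96 s,
# so the network emits one 128-d feature per 0.96 s of audio.
EXAMPLE_WINDOW_SECONDS = 0.96
EXAMPLE_HOP_SECONDS = 0.96

# Hypothetical edit to emit one feature every 2.56 s instead, e.g. to match
# I3D features taken every 64 frames at 25 fps:
EXAMPLE_HOP_SECONDS = 2.56
# Enlarging EXAMPLE_WINDOW_SECONDS as well would change the 96x64 log-mel
# input shape the pretrained network expects, which is exactly the kind of
# train/inference mismatch discussed below.
```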
For the question above, my answer is: yes.
Well, of course it is possible, because you make the shapes the same. What I meant is the potential mismatch between the pre-processing during training and inference. I know that having smaller stacks of RGB frames in I3D than it was trained on will not completely mess up the predicted probabilities (one can look at the results with `show_pred`). Since I have not implemented this for VGGish, I am not sure whether the features will be useful.
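One rough way to check this, sketched below (my own suggestion, not something implemented in this repo): extract features twice, once with the default 0.96 s hop and once with the modified hop, and compare embeddings that fall at roughly the same timestamps. A high cosine similarity is only a heuristic, not proof that the features remain useful downstream.

```python
import numpy as np

def mean_cosine_similarity(feats_a, times_a, feats_b, times_b):
    """feats_*: (N, 128) feature arrays; times_*: feature centre times in seconds.
    Matches each feature in A with the temporally closest feature in B."""
    feats_a, feats_b = np.asarray(feats_a), np.asarray(feats_b)
    times_a, times_b = np.asarray(times_a, float), np.asarray(times_b, float)
    sims = []
    for fa, ta in zip(feats_a, times_a):
        fb = feats_b[int(np.argmin(np.abs(times_b - ta)))]  # nearest in time
        sims.append(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8))
    return float(np.mean(sims))

# Hypothetical usage with two previously extracted feature sets:
# score = mean_cosine_similarity(feats_default, t_default, feats_custom, t_custom)
# print(f"mean cosine similarity: {score:.3f}")
```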
Oh, I see. I understand what you mean; it's hard to know unless someone checks this.
Hi, I am new to VGGish and I want to use it to extract audio features. The website says the feature tensor will be 128-d and correspond to 0.96 s of the original video. Can I change this sampling rate to other numbers? I see that the I3D model provides some arguments to do this (like `stack_size` and `step_size`); can I do this in VGGish?
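For background (a sketch of the stock AudioSet VGGish pre-processing, not of this repo's wrapper): the 0.96 s granularity comes from the way the waveform is cut into fixed-size log-mel examples before the network is applied. The snippet assumes the AudioSet `vggish_input` module is importable; the 10 s of random audio is a placeholder.

```python
# Sketch of where the 0.96 s granularity comes from in the AudioSet VGGish
# pre-processing; requires the AudioSet vggish_input/vggish_params modules on
# the path. The waveform here is dummy data.
import numpy as np
from vggish_input import waveform_to_examples  # AudioSet VGGish helper

sr = 16000
wav = np.random.uniform(-1.0, 1.0, size=10 * sr)  # 10 s of fake mono audio

# Each example covers 0.96 s (96 log-mel frames x 64 mel bands) and examples
# are taken with a 0.96 s hop, so a 10 s clip yields ~10 examples, each of
# which the network maps to one 128-d embedding.
examples = waveform_to_examples(wav, sr)
print(examples.shape)  # -> (10, 96, 64)
```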