I3D Convolutions Script + Input Data #7
Hi, sorry for the late reply. I decided to write a small library dedicated to feature extraction from videos. It is mainly based on the script I wrote for MDVC but is more transparent and easier to use 🙂. So, it took a couple of days to wrap it up. Check it out: https://github.com/v-iashin/i3d_features. Here are the answers to your questions: (ii) Yep, exactly! We downloaded the available videos using the official script activitynet/ActivityNet@7185a39 and ran the feature extraction script over the raw videos. (iii) I still have the videos. I can think of a way to share them in case you would REALLY like to have them 🙂. The notes on (i):
Another note is regarding how the I3D features were extracted for ActivityNet. Specifically, please see
Hi Vladimir, (i) Thanks for putting together the I3D repository. It clarified the process you followed to get the I3D features; indeed very helpful. (ii) I see. When you ran the feature extraction script on the videos, did you store the sampled frames (using --keep_frames)? Sadly, even the official 5 FPS link for the videos (http://activity-net.org/challenges/2020/tasks/anet_captioning.html) isn't accessible. I am currently blocked from making any progress in my work, so gaining access is essential. If you have the sampled frames and can upload them, I would really appreciate it. If you need a server to upload to, I can arrange one for you. Thanks again,
(ii) I am afraid I cannot provide you with the frames, as we didn't store them at all. The videos themselves are 200+ GB, but the frames were 1+ TB, and we didn't have a large enough fast disk (SSD or NVMe) to read them from on the fly. So, we decided to calculate the features and remove the frames right away, just as the repo does now. I can upload the videos, and you can extract the features along with the frames yourself. Let me know if you need them. The extraction takes around a week on this dataset with a step of 24 and a stack size of 24 on three 2080Ti GPUs. We have the resources; I will organize the download link. Don't worry.
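(For context on the stack/step numbers above: with a step equal to the stack size, the extractor slides over non-overlapping 24-frame clips, producing one feature vector per clip. A minimal sketch of that sampling logic, not the actual extraction code:)

```python
def clip_indices(num_frames, stack_size=24, step_size=24):
    """Yield [start, end) frame index ranges for clip-level feature extraction.

    With step_size == stack_size, consecutive clips do not overlap, so each
    frame is read exactly once -- one feature vector per 24-frame clip.
    """
    starts = range(0, num_frames - stack_size + 1, step_size)
    return [(s, s + stack_size) for s in starts]

# A 100-frame video yields 4 full clips; the trailing frames 96-99 are dropped.
print(clip_indices(100))  # [(0, 24), (24, 48), (48, 72), (72, 96)]
```

This also shows why frames are safe to delete right after extraction: each clip is consumed exactly once, so nothing needs to be re-read later.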
Ok, it would be much appreciated if you could share the videos. Thanks!
Please contact us via e-mail.
Thank you!
Hi Vladimir,
I noticed in the MDVC codebase that you load the I3D conv features from "./data/sub_activitynet_v1-3.i3d_25fps_stack24step24_2stream.hdf5"
Some questions:
(i) Do you have a script that generates these features from raw data?
(ii) What input data did you run the I3D model over? I ask because your I3D features filename suggests they were extracted at 25 FPS, which implies that you sampled the ActivityNet Captions videos at 25 FPS yourself, since, unfortunately, the official ActivityNet website only offers frames sampled at 5 FPS (http://activity-net.org/challenges/2020/tasks/anet_captioning.html).
(iii) Do you have a link for the sampled frames?
Thanks!
Aman
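(A quick way to inspect an HDF5 feature file like the one named above is with h5py. The group and dataset names below are hypothetical stand-ins built in a toy file; check f.keys() against the real file rather than assuming a layout:)

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file; the real
# sub_activitynet_v1-3.i3d_25fps_stack24step24_2stream.hdf5 may use
# different group/dataset names.
path = os.path.join(tempfile.mkdtemp(), "features.hdf5")
with h5py.File(path, "w") as f:
    g = f.create_group("v_dummy")  # hypothetical: one group per video id
    # hypothetical: (num_clips, feature_dim) array of clip-level features
    g.create_dataset("i3d_features", data=np.zeros((4, 2048), dtype=np.float32))

# Inspect the file the same way you would the real one.
with h5py.File(path, "r") as f:
    for video_id in f:
        for name, dset in f[video_id].items():
            print(video_id, name, dset.shape)  # v_dummy i3d_features (4, 2048)
```

A (num_clips, feature_dim) shape is consistent with the stack 24 / step 24 extraction discussed above: one feature row per non-overlapping 24-frame clip.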