I3D Convolutions Script + Input Data #7
Hi, sorry for the late reply. I decided to write a small library dedicated to feature extraction from videos. It is mainly based on the script I wrote for MDVC but is more transparent and easier to use 🙂. So, it took a couple of days to wrap it up. Check it out: https://github.com/v-iashin/i3d_features. Here are the answers to your questions: (ii) Yep, exactly! We downloaded the available videos using the official script activitynet/ActivityNet@7185a39 and ran the feature extraction script over the raw videos. (iii) I still have the videos. I can think of a way to share them in case you would REALLY like to have them 🙂. The notes on (i):
Another note is regarding how the I3D features were extracted for ActivityNet. Specifically, please see
Hi Vladimir, (i) Thanks for putting together the I3D repository. It clarified the process you followed to get the I3D features; indeed very helpful. (ii) I see. When you ran the feature extraction script on the videos, did you store the sampled frames (using --keep_frames)? Sadly, even the official 5 FPS link for the videos (http://activity-net.org/challenges/2020/tasks/anet_captioning.html) isn't accessible. I am currently blocked from making any progress in my work, so gaining access is essential. If you have the sampled frames and can upload them, I would really appreciate it. If you need a server to upload to, I can arrange one for you. Thanks again,
(ii) I am afraid I cannot provide you with the frames, as we didn't store them at all. The videos themselves are 200+ GB, but the frames were 1+ TB, and we didn't have a large enough fast disk (SSD or NVMe) to read them from on the fly. So, we decided to calculate the features and remove the frames right away, just as the repo does now. I can upload the videos, and you can extract the features along with the frames yourself. Let me know if you need them. The extraction takes around a week on this dataset with a step of 24 and a stack size of 24 on three 2080Ti GPUs. We have the resources; I will organize the download link. Don't worry.
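(For context on the stack/step numbers above: with a step equal to the stack size, the extractor slides over non-overlapping 24-frame clips, producing one feature vector per clip. A minimal sketch of that sampling logic, not the actual extraction code:)

```python
def clip_indices(num_frames, stack_size=24, step_size=24):
    """Yield [start, end) frame index ranges for clip-level feature extraction.

    With step_size == stack_size, consecutive clips do not overlap, so each
    frame is read exactly once -- one feature vector per 24-frame clip.
    """
    starts = range(0, num_frames - stack_size + 1, step_size)
    return [(s, s + stack_size) for s in starts]

# A 100-frame video yields 4 full clips; the trailing frames 96-99 are dropped.
print(clip_indices(100))  # [(0, 24), (24, 48), (48, 72), (72, 96)]
```

This also shows why frames are safe to delete right after extraction: each clip is consumed exactly once, so nothing needs to be re-read later.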
Ok, it would be much appreciated if you could share the videos. Thanks!
Please contact us via e-mail.
Thank you!
Hi Vladimir,
I noticed in the MDVC codebase that you load the I3D conv features from "./data/sub_activitynet_v1-3.i3d_25fps_stack24step24_2stream.hdf5"
Some questions:
(i) Do you have a script that generates these features from raw data?
(ii) What input data did you run the I3D model over? I ask because your I3D features filename suggests they were extracted at 25 FPS, which implies that you sampled the ActivityNet Captions videos at 25 FPS yourself, since, unfortunately, the official ActivityNet website only offers frames sampled at 5 FPS (http://activity-net.org/challenges/2020/tasks/anet_captioning.html).
(iii) Do you have a link for the sampled frames?
Thanks!
Aman
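(A quick way to inspect an HDF5 feature file like the one named above is with h5py. The group and dataset names below are hypothetical stand-ins built in a toy file; check f.keys() against the real file rather than assuming a layout:)

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file; the real
# sub_activitynet_v1-3.i3d_25fps_stack24step24_2stream.hdf5 may use
# different group/dataset names.
path = os.path.join(tempfile.mkdtemp(), "features.hdf5")
with h5py.File(path, "w") as f:
    g = f.create_group("v_dummy")  # hypothetical: one group per video id
    # hypothetical: (num_clips, feature_dim) array of clip-level features
    g.create_dataset("i3d_features", data=np.zeros((4, 2048), dtype=np.float32))

# Inspect the file the same way you would the real one.
with h5py.File(path, "r") as f:
    for video_id in f:
        for name, dset in f[video_id].items():
            print(video_id, name, dset.shape)  # v_dummy i3d_features (4, 2048)
```

A (num_clips, feature_dim) shape is consistent with the stack 24 / step 24 extraction discussed above: one feature row per non-overlapping 24-frame clip.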