Since we spend more than half of our time on data preprocessing, we share our experience in processing data here in the hope that it helps.
We use three datasets for pretraining: WebVid2.5M, HowTo100M and YT-Temporal180M.
For downstream tasks, we use 11 datasets in total: TGIF, MSVD, MSRVTT, TVQA, Ego4D, ActivityNet, LSMDC, DiDeMo, Kinetics, HMDB51 and VCR.
We train models directly on the raw videos rather than on features extracted offline. Because of licensing issues, we do not distribute the datasets; please download them yourself by following the instructions below:
For simplicity, we have organized the annotation files of 15 datasets: activitynet, cc3m, didemo, ego4d, hmdb51, howto100m, k400, lsmdc, msrvtt, msvd, tgif, tvqa, vcr1annots, webvid and yttemporal.
Please download these annotation files from Google Drive.
To make data processing easier, we also provide scripts for processing common datasets and annotations. They can be downloaded from Google Drive as well.
Most videos in these datasets come from YouTube, so please install youtube-dl:
pip install youtube-dl
youtube-dl has become slow recently; you may use yt-dlp instead.
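If you drive the download from Python, a minimal sketch of picking between the two tools might look like the following. The helper names (`build_download_cmd`, `pick_tool`, `download`) are hypothetical and not part of our scripts; the sketch assumes the chosen tool is on PATH (for example after `pip install yt-dlp`) and only uses the `-f` (format) and `-o` (output template) options, which both tools accept.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_download_cmd(video_id: str, out_dir: str, tool: str = "yt-dlp") -> list[str]:
    """Command that saves one YouTube video as <out_dir>/<video_id>.<ext>.

    `-f mp4` requests the best single-file mp4 format; `-o` sets the
    output filename template. Both options exist in yt-dlp and youtube-dl.
    """
    url = f"https://www.youtube.com/watch?v={video_id}"
    template = str(Path(out_dir) / f"{video_id}.%(ext)s")
    return [tool, "-f", "mp4", "-o", template, url]


def pick_tool() -> str:
    """Prefer yt-dlp (currently faster) and fall back to youtube-dl."""
    for tool in ("yt-dlp", "youtube-dl"):
        if shutil.which(tool):
            return tool
    raise FileNotFoundError("install yt-dlp or youtube-dl first")


def download(video_id: str, out_dir: str) -> None:
    """Download a single video; raises CalledProcessError if the tool fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_cmd(video_id, out_dir, pick_tool()), check=True)
```

Keeping command construction separate from execution makes it easy to log or dry-run the exact commands before starting a long download.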
We provide scripts for downloading WebVid, CC3M and YT-Temporal on Google Drive.
Download results_2M_train.csv and results_2M_val.csv from WebVid, then use the provided scripts to download the WebVid videos.
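Our provided scripts are the reference for this step. Purely to illustrate the idea, the sketch below turns each CSV row into a (URL, target path) pair and fetches the files one by one. It assumes the CSV has `videoid`, `contentUrl` and `page_dir` columns and stores videos as `<out_root>/<page_dir>/<videoid>.mp4`; check the header of your copy and adapt the layout to what the scripts expect. `webvid_jobs` and `download_all` are hypothetical names.

```python
import csv
import urllib.request
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def webvid_jobs(lines: Iterable[str], out_root: str) -> Iterator[Tuple[str, Path]]:
    """Yield (url, target path) for each row of a WebVid results_2M_*.csv.

    Assumes the columns `videoid`, `contentUrl` and `page_dir` exist.
    """
    for row in csv.DictReader(lines):
        target = Path(out_root) / row["page_dir"] / f"{row['videoid']}.mp4"
        yield row["contentUrl"], target


def download_all(csv_path: str, out_root: str) -> None:
    """Fetch every video sequentially, skipping files that already exist."""
    with open(csv_path, newline="") as f:
        for url, target in webvid_jobs(f, out_root):
            if target.exists():  # resume-friendly: keep earlier downloads
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            urllib.request.urlretrieve(url, str(target))
```

A sequential loop is far too slow for ~2.5M videos; in practice you would split the CSV and run several workers in parallel.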
Download the source videos from HowTo100M.
To access YT-Temporal, please email rowanz, then download it with our provided scripts.
Note that this dataset does not provide clean ASR transcripts. Please follow MERLOT to clean the ASR text; we have implemented this in yttemporal.py.
Download the raw frames from Google Drive.
wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip -P data; unzip data/MSRVTT.zip -d data
If you have difficulty downloading MSRVTT, please refer to Frozen for more details.
wget -c https://www.cs.utexas.edu/users/ml/clamp/videoDescription/YouTubeClips.tar
If you have difficulty downloading MSVD, please refer to the MSVD page for more details.
Download the csv file from here, then download the source videos in the same way as WebVid.
Download the source videos from here.
Download the source videos from the website.
Download the source videos from Google Drive.
Please refer to ClipBERT for details.
After downloading all these datasets, please organize them as follows:
Create a soft link to each original dataset under the dataset directory:
mkdir dataset
ln -s [path_to_original_dataset] dataset/[lowercase_short_name]
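If you prefer to script this step, the following is a minimal Python sketch of the same `mkdir dataset` plus `ln -s` commands. `link_path` and `link_dataset` are hypothetical helpers, not part of the repository. One design choice differs from a bare `ln -s`: the source path is resolved to an absolute path, so the link stays valid regardless of the directory the command was run from.

```python
from pathlib import Path


def link_path(short_name: str, root: str = "dataset") -> Path:
    """Location of the soft link for a dataset: <root>/<lowercase short name>."""
    return Path(root) / short_name.lower()


def link_dataset(original: str, short_name: str, root: str = "dataset") -> Path:
    """Python equivalent of `ln -s original dataset/<lowercase short name>`."""
    link = link_path(short_name, root)
    link.parent.mkdir(parents=True, exist_ok=True)  # same as `mkdir dataset`
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    # Absolute target so the link does not depend on the current directory.
    link.symlink_to(Path(original).resolve(), target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/MSRVTT", "MSRVTT")` would create `dataset/msrvtt` (the source path here is a placeholder).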
As shown below:
mkdir metadata
Place all annotation files downloaded from Google Drive in metadata, as shown below:
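Before training, it can help to check that nothing is missing from metadata. The sketch below compares the directory against the 15 annotation names listed at the top of this document; it assumes each one is unpacked as an entry of metadata/ with exactly that name, which may not match the archive you download. `missing_annotations` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Sequence

# Annotation names listed in this README; assumed to be entries of metadata/.
EXPECTED = [
    "activitynet", "cc3m", "didemo", "ego4d", "hmdb51", "howto100m", "k400",
    "lsmdc", "msrvtt", "msvd", "tgif", "tvqa", "vcr1annots", "webvid", "yttemporal",
]


def missing_annotations(metadata_dir: str = "metadata",
                        expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected annotation entries not found under metadata_dir."""
    root = Path(metadata_dir)
    return [name for name in expected if not (root / name).exists()]
```

Calling `missing_annotations()` from the repository root should return an empty list once everything is in place; you only need the entries for the datasets you actually use.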