Data preprocessing took more than half of our time, so we share our experience here in the hope that it helps.

Dataset Preparation

We use three datasets for pretraining: WebVid2.5M, HowTo100M, and YT-Temporal180M.

For downstream tasks, we use 11 datasets in total: TGIF, MSVD, MSRVTT, TVQA, Ego4D, ActivityNet, LSMDC, DiDeMo, Kinetics, HMDB51, and VCR.

We train models directly on the raw videos instead of using features extracted offline. We do not distribute the datasets because of licensing issues. Please download them yourself by following the instructions below:

1. Download Annotations

For simplicity, we have collected the annotation files for 15 datasets: activitynet, cc3m, didemo, ego4d, hmdb51, howto100m, k400, lsmdc, msrvtt, msvd, tgif, tvqa, vcr1annots, webvid, yttemporal.

Please download the annotation files we prepared from Google Drive.

Download Data Preprocess Scripts (Optional)

To make data processing easier, we also provide scripts for processing common datasets and annotations. Download them from Google Drive.

2. Download Pretrain Dataset

These datasets consist mainly of YouTube videos, so please install youtube-dl:

pip install youtube-dl

youtube-dl has been slow recently; you may use yt-dlp instead.
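As a minimal sketch of how either tool could be driven from Python, the helper below only builds the command line for one video. The function name, the `videos` output directory, and the video ID are hypothetical; `-o` is the output-template option that both youtube-dl and yt-dlp accept. This is not the provided download script.

```python
import shlex


def build_download_command(video_id, out_dir="videos", tool="yt-dlp"):
    """Return the argument list for downloading one YouTube video.

    `tool` can be "yt-dlp" or "youtube-dl"; both accept -o for the
    output filename template.
    """
    url = f"https://www.youtube.com/watch?v={video_id}"
    # %(ext)s is expanded by the tool to the container extension it picks.
    return [tool, "-o", f"{out_dir}/{video_id}.%(ext)s", url]


# Hypothetical video ID, shown only to illustrate the resulting command.
print(shlex.join(build_download_command("VIDEO_ID")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually start the download.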

Download Dataset Scripts

We provide scripts for downloading WebVid, CC3M, and YT-Temporal on Google Drive.

WebVid [5T]

Download results_2M_train.csv and results_2M_val.csv from WebVid, then use the provided scripts to download the videos.
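For orientation, here is a rough sketch of how such a CSV can be turned into a download plan. It assumes the CSV has `videoid` and `contentUrl` columns, which should be checked against the header of the file you downloaded; the function name and output layout are hypothetical and may differ from the provided scripts.

```python
import csv
import io
import os


def plan_downloads(csv_file, out_dir):
    """Yield (url, destination_path) for rows whose video is not on disk yet.

    Assumes columns named `videoid` and `contentUrl` (an assumption;
    verify against the actual CSV header).
    """
    for row in csv.DictReader(csv_file):
        dest = os.path.join(out_dir, f"{row['videoid']}.mp4")
        if not os.path.exists(dest):  # skip videos already downloaded
            yield row["contentUrl"], dest


# Tiny in-memory stand-in for results_2M_val.csv, with made-up rows.
sample = io.StringIO(
    "videoid,contentUrl,name\n"
    "1,http://example.com/a.mp4,a cat\n"
)
for url, dest in plan_downloads(sample, "webvid_videos"):
    print(url, "->", dest)
```

Skipping files that already exist makes it cheap to resume after an interrupted run, which matters for a dataset of this size.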

HowTo100M [~20T]

Download the source videos from HowTo100M.

YT-Temporal 180M [60T]

Please email rowanz to request access to this dataset, then download it with our provided scripts.

Note that this dataset does not provide clean ASR. Please follow MERLOT to clean the ASR text; we have implemented this in yttemporal.py.

3. Download Finetune Dataset

TGIF-QA [134G]

Download the raw frames from Google Drive.

MSRVTT [<10G]

wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip -P data; unzip data/MSRVTT.zip

Please refer to Frozen for more details if you have difficulty downloading this dataset.

MSVD [1.86G]

wget -c https://www.cs.utexas.edu/users/ml/clamp/videoDescription/YouTubeClips.tar

Please refer to MSVD for more details if you have difficulty downloading this dataset.

K400 [260G]

Download the CSV file from here, then download the source videos in the same way as for WebVid.

HMDB51 [<10G]

Download the source videos from here.

Ego4d [900G]

LSMDC

Download the source videos from the website.

ActivityNet [200G]

Download the source videos from Google Drive.

DiDeMo

Please refer to ClipBERT for details.

4. Soft Link and Meta Data

After downloading all of these datasets, please prepare them as follows:

Add soft link

Add soft links under the dataset directory:

mkdir dataset
ln -s [path_to_original_dataset] dataset/[lowercase_short_name]
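When many datasets are involved, the same linking step can be scripted. The sketch below mirrors the `mkdir` and `ln -s` commands above; the function name and the example paths are hypothetical.

```python
import os


def link_datasets(sources, root="dataset"):
    """Create root/<lowercase_short_name> -> <original path> symlinks.

    `sources` maps a short name to the directory holding the downloaded
    data. Returns the list of link paths that were newly created.
    """
    os.makedirs(root, exist_ok=True)
    created = []
    for short_name, original in sources.items():
        link = os.path.join(root, short_name.lower())
        if os.path.lexists(link):
            continue  # leave existing links or directories untouched
        os.symlink(os.path.abspath(original), link)
        created.append(link)
    return created
```

For example, `link_datasets({"webvid": "/data/raw/WebVid", "msrvtt": "/data/raw/MSRVTT"})` would create `dataset/webvid` and `dataset/msrvtt`, with the paths here standing in for wherever the data was actually downloaded.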


Meta data

mkdir metadata

Place all annotation files downloaded from Google Drive in metadata.
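A quick sanity check can catch missing annotations before training starts. The sketch below assumes each dataset's annotations sit in an entry directly under metadata, named after the short names listed in step 1; that layout is an assumption and should be adapted to the actual archive contents. The function name is hypothetical.

```python
import os

# Short names of the 15 annotation sets listed in step 1.
EXPECTED = [
    "activitynet", "cc3m", "didemo", "ego4d", "hmdb51", "howto100m",
    "k400", "lsmdc", "msrvtt", "msvd", "tgif", "tvqa", "vcr1annots",
    "webvid", "yttemporal",
]


def missing_annotations(metadata_dir="metadata", expected=EXPECTED):
    """Return the expected names that have no entry under metadata_dir.

    Assumes one entry per dataset directly under metadata_dir
    (an assumption about the layout, not confirmed by the archive).
    """
    present = set(os.listdir(metadata_dir)) if os.path.isdir(metadata_dir) else set()
    return [name for name in expected if name not in present]
```

Calling `missing_annotations()` from the project root returns an empty list when every expected entry is present.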

