Installation

Environment

git clone https://github.com/showlab/UniVTG
cd UniVTG

conda create --name univtg python=3.8
conda activate univtg  # install the requirements into the new environment
pip install -r requirements.txt

Datasets

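One engineering contribution of this work is that we unify most video temporal tasks under the same set of features, which makes pretraining and cross-training flexible.

Download the features and metadata for the pretraining and downstream datasets (skip the pretraining datasets if you do not need them). In the table, PT marks pretraining data; MR, HL, and VS stand for moment retrieval, highlight detection, and video summarization.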

Dataset	Task	Metadata	Video (Slowfast R50)	Video (CLIP B/32)	Text (CLIP B/32)
Point (Ego4D)	PT	548 MB	27.1 GB	5.7 GB	30.7 GB
Interval (VideoCC)	PT	155 MB	300 GB	62.5 GB	12.6 GB
Curve (VideoCC)	PT	3.8 GB	👆	👆	132 MB
QVHighlights	MR + HL	5 MB	4.0 GB	940 MB	172 MB
Charades-STA	MR	4 MB	1.3 GB	305 MB	178 MB
NLQ	MR	3 MB	1.8 GB	404 MB	184 MB
TACoS	MR	2 MB	81 MB	18 MB	244 MB
YoutubeHL	HL	1 MB	427 MB	95 MB	2 MB
TVSum	HL	1 MB	28 MB	6 MB	1 MB
QFVS	VS	1 MB	455 MB	👈	1 MB
ActivityNet (optional)	MR	10 MB	4.5 GB	1.0 GB	958 MB
DiDeMo (optional)	MR	6 MB	1.1 GB	269 MB	443 MB
HACS (optional)	MR	15 MB	13.1 GB	3.0 GB	177 MB
COIN (optional)	MR	8 MB	2.3 GB	556 MB	30 MB
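
To estimate disk space before downloading, the sizes in a table row can be summed. The sketch below hard-codes the QVHighlights row and assumes 1 GB = 1024 MB, which the table does not specify; `to_mb` and `total_mb` are hypothetical helper names, not part of the repository.

```python
def to_mb(size):
    """Convert a size string such as '4.0 GB' or '940 MB' to megabytes."""
    value, unit = size.split()
    factor = {"MB": 1.0, "GB": 1024.0}[unit]  # assumption: binary gigabytes
    return float(value) * factor


def total_mb(sizes):
    """Sum a list of size strings and return the total in megabytes."""
    return sum(to_mb(s) for s in sizes)


# Metadata, Slowfast video, CLIP video, and CLIP text sizes for QVHighlights.
qvhighlights = ["5 MB", "4.0 GB", "940 MB", "172 MB"]
print(f"QVHighlights: about {total_mb(qvhighlights) / 1024:.1f} GB")
```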

Extract each downloaded tar archive, then move its contents out of the nested directory it unpacks into:

tar -xvf {tar_name}.tar
mv data/home/qinghonglin/univtg/data/{dset_name}/* .  # Replace dset_name accordingly

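The VideoCC Slowfast features are split into multiple compressed parts. Decompress the parts, concatenate them into a single tar, then extract it.

gunzip vid_slowfast_*.gz
cat vid_slowfast_* > vid_slowfast.tar
tar -xvf vid_slowfast.tar  # extract the merged archive, as for the other datasets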

Organize the data and features in the following structure:

univtg
├── eval
├── data
│   ├── qfvs
│   ├── tvsum
│   ├── youtube
│   ├── tacos
│   ├── ego4d
│   ├── charades
│   │   ├── metadata
│   │   │   ├──charades_test.jsonl
│   │   │   └──charades_train.jsonl
│   │   ├── txt_clip
│   │   ├── vid_clip
│   │   └── vid_slowfast
│   └── qvhighlights
│       ├── metadata
│       │   ├──qvhighlights_test.jsonl
│       │   ├──qvhighlights_train.jsonl
│       │   └──qvhighlights_val.jsonl
│       ├── txt_clip
│       ├── vid_clip
│       └── vid_slowfast
├── main
├── model
├── utils
├── README.md
└── ···

(Optional) We extract video features (Slowfast R50 and CLIP B/32) with the HERO_Video_Feature_Extractor repo; you can use it to extract features for other benchmarks or videos. We extract text features (CLIP B/32) with run_on_video/text_extractor.py.
