Cross-Modal and Hierarchical Modeling of Video and Text

This repository (Sha-Lab/CMHSE) contains the PyTorch code for "Cross-Modal and Hierarchical Modeling of Video and Text" (ECCV 2018).

Prerequisites

The following packages are required to run the scripts:

PyTorch >= 0.4 and torchvision
tensorboardX and NLTK
Dataset: download the features (now fully uploaded) and put them into the folders data/anet_precomp (ActivityNet) and data/didemo_precomp (DiDeMo)
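Before running any script, it may help to confirm the environment. The sketch below is a hypothetical helper (`check_setup` is not part of this repository) that assumes the import names `torch`, `torchvision`, `tensorboardX` and `nltk` and the two feature folders listed above. It only reports what is missing; it does not check versions such as PyTorch >= 0.4.

```python
import importlib.util
import os

# Import names of the required packages (assumed to match the packages listed above).
REQUIRED_PACKAGES = ["torch", "torchvision", "tensorboardX", "nltk"]
# Feature folders expected by train.py, relative to the repository root.
DATA_DIRS = [os.path.join("data", "anet_precomp"), os.path.join("data", "didemo_precomp")]


def check_setup(repo_root="."):
    """Return (missing_packages, missing_dirs); both empty means ready to run.

    Hypothetical helper for illustration only; package versions are not checked.
    """
    missing_packages = [name for name in REQUIRED_PACKAGES
                        if importlib.util.find_spec(name) is None]
    missing_dirs = [d for d in DATA_DIRS
                    if not os.path.isdir(os.path.join(repo_root, d))]
    return missing_packages, missing_dirs


if __name__ == "__main__":
    packages, dirs = check_setup()
    for name in packages:
        print("missing package:", name)
    for d in dirs:
        print("missing feature folder:", d)
```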

Model Evaluation

The trained models for ActivityNet and DiDeMo can be found at this link. To evaluate a given model, run train.py with the --resume and --eval_only options, together with the same dataset and feature options used for training (see the training commands below).

For a model trained with Inception features on ActivityNet and saved at "./runs/release/activitynet/ICEP/hse_tau5e-4/run1/checkpoint.pth.tar", run:

$ python train.py anet_precomp --feat_name icep --img_dim 2048 --resume ./runs/release/activitynet/ICEP/hse_tau5e-4/run1/checkpoint.pth.tar --eval_only

For a model trained with C3D features on ActivityNet and saved at "./runs/release/activitynet/C3D/hse_tau5e-4/run1/checkpoint.pth.tar", run:

$ python train.py anet_precomp --feat_name c3d --img_dim 500 --resume ./runs/release/activitynet/C3D/hse_tau5e-4/run1/checkpoint.pth.tar --eval_only

We assume the checkpoint was saved from a model stored on the GPU.
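The two evaluation commands differ only in the feature name, its --img_dim value and the checkpoint path. As a hypothetical convenience (`eval_command` and `FEATURE_DIMS` are not part of the repository), the sketch below assembles the same argument list, assuming only the icep/2048 and c3d/500 pairs shown above.

```python
import shlex

# Feature name -> --img_dim value, as used in the commands above.
FEATURE_DIMS = {"icep": 2048, "c3d": 500}


def eval_command(dataset, feat_name, checkpoint):
    """Build the train.py evaluation command as a list of arguments.

    Hypothetical helper; it only reproduces the flag pattern shown in this README.
    """
    if feat_name not in FEATURE_DIMS:
        raise ValueError("unknown feature: %s" % feat_name)
    return ["python", "train.py", dataset,
            "--feat_name", feat_name,
            "--img_dim", str(FEATURE_DIMS[feat_name]),
            "--resume", checkpoint,
            "--eval_only"]


if __name__ == "__main__":
    ckpt = "./runs/release/activitynet/C3D/hse_tau5e-4/run1/checkpoint.pth.tar"
    # Prints a shell-ready command; the list itself can be passed to subprocess.run.
    print(shlex.join(eval_command("anet_precomp", "c3d", ckpt)))
```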

Model Training

To reproduce our HSE experiments, use train.py as described below. All training commands below pass --low_level_loss and --norm. To train HSE with \tau=5e-4, also add the options:

--reconstruct_loss --lowest_reconstruct_loss

For example, to train HSE with \tau=5e-4 on ActivityNet with C3D features:

$ python train.py anet_precomp --feat_name c3d --img_dim 500 --low_level_loss --reconstruct_loss --lowest_reconstruct_loss --norm

To train HSE with \tau=5e-4 on ActivityNet with Inception features:

$ python train.py anet_precomp --feat_name icep --img_dim 2048 --low_level_loss --reconstruct_loss --lowest_reconstruct_loss --norm

To train HSE with \tau=5e-4 on DiDeMo with Inception features:

$ python train.py didemo_precomp --feat_name icep --img_dim 2048 --low_level_loss --reconstruct_loss --lowest_reconstruct_loss --norm

To train HSE with \tau=0 on ActivityNet with C3D features:

$ python train.py anet_precomp --feat_name c3d --img_dim 500 --low_level_loss --norm
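The four training commands share one pattern: a dataset, a feature with its --img_dim, the loss flags, and --norm. The sketch below is a hypothetical helper (`train_command` is not part of the repository) that reproduces them. Note that \tau is not itself a command-line option in these examples; the helper assumes only the two settings shown here, where \tau=5e-4 adds --reconstruct_loss --lowest_reconstruct_loss and \tau=0 omits them, and rejects any other value.

```python
import shlex

# Feature name -> --img_dim value, as used in the commands above.
FEATURE_DIMS = {"icep": 2048, "c3d": 500}
# Loss flags used by every training command in this README.
BASE_FLAGS = ["--low_level_loss"]
# Extra flags that the README associates with \tau=5e-4.
RECONSTRUCT_FLAGS = ["--reconstruct_loss", "--lowest_reconstruct_loss"]


def train_command(dataset, feat_name, tau):
    """Build a train.py command for one of the \tau settings shown in this README.

    Hypothetical helper; only tau=5e-4 and tau=0 are supported.
    """
    if feat_name not in FEATURE_DIMS:
        raise ValueError("unknown feature: %s" % feat_name)
    if tau == 5e-4:
        loss_flags = BASE_FLAGS + RECONSTRUCT_FLAGS
    elif tau == 0:
        loss_flags = BASE_FLAGS
    else:
        raise ValueError("only tau=5e-4 and tau=0 are documented, got %r" % tau)
    return (["python", "train.py", dataset,
             "--feat_name", feat_name,
             "--img_dim", str(FEATURE_DIMS[feat_name])]
            + loss_flags + ["--norm"])


if __name__ == "__main__":
    # The four configurations listed in this section.
    runs = [("anet_precomp", "c3d", 5e-4), ("anet_precomp", "icep", 5e-4),
            ("didemo_precomp", "icep", 5e-4), ("anet_precomp", "c3d", 0)]
    for dataset, feat, tau in runs:
        print(shlex.join(train_command(dataset, feat, tau)))
```

Passing the returned list to subprocess.run(..., check=True) launches a run directly; keeping the arguments as a list avoids shell quoting issues.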

.bib citation

If this repo helps in your work, please cite the following paper:

@inproceedings{DBLP:conf/eccv/ZhangHS18,
  author    = {Bowen Zhang and
           Hexiang Hu and
           Fei Sha},
  title     = {Cross-Modal and Hierarchical Modeling of Video and Text},
  booktitle = {Computer Vision - {ECCV} 2018 - 15th European Conference, Munich,
           Germany, September 8-14, 2018, Proceedings, Part {XIII}},
  pages     = {385--401},
  year      = {2018}
}

Acknowledgment

We thank the following repositories for providing helpful components and functions used in this work:

VSE++ for the framework
TSN for the Inception-v3 features

Contacts

Please report bugs and errors to:

Bowen Zhang: zbwglory [at] gmail.com
Hexiang Hu: hexiang.frank.hu [at] gmail.com
