This repository contains the tensorflow implementation and models for DAN - CVPR 2017 paper


Dual attention network

This repository contains the TensorFlow implementation and models for the following CVPR 2017 paper (image-to-text and text-to-image retrieval):

Hyeonseob Nam, Jung-Woo Ha, and Jeonghee Kim.
"Dual Attention Networks for Multimodal Reasoning and Matching."
In Proc. CVPR 2017.

Thanks to instructions from the author (Hyeonseob Nam), I was able to reproduce the numbers reported in the paper on Flickr30k:

| Method | Image→Text R@1 | R@5 | R@10 | MR | Text→Image R@1 | R@5 | R@10 | MR |
|--------|----------------|-----|------|----|----------------|-----|------|----|
| DAN (paper) | 55.0 | 81.8 | 89.0 | 1 | 39.4 | 69.2 | 79.1 | 2 |
| This implementation | 54.4 | 82.4 | 89.9 | 1.0 | 39.8 | 71.4 | 80.9 | 2 |
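In the table, R@K is the percentage of queries whose ground-truth match ranks in the top K, and MR is the median rank. Given a query-candidate similarity matrix, these metrics can be computed as in the minimal NumPy sketch below (the function name and the one-positive-per-query assumption are illustrative, not taken from this repo):

```python
import numpy as np

def retrieval_metrics(sim):
    """sim[i, j]: similarity of query i to candidate j; the correct
    candidate for query i is assumed to be index i."""
    n = sim.shape[0]
    # Candidates sorted by descending similarity for each query.
    order = np.argsort(-sim, axis=1)
    # Rank of the ground-truth candidate for each query (1 = best).
    ranks = np.array([np.where(order[i] == i)[0][0] + 1 for i in range(n)])
    return {
        "R@1": np.mean(ranks <= 1) * 100,
        "R@5": np.mean(ranks <= 5) * 100,
        "R@10": np.mean(ranks <= 10) * 100,
        "MR": float(np.median(ranks)),
    }
```

(Flickr30k actually has five captions per image, so the real evaluation indexes the positives slightly differently; the ranking logic is the same.)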

Dependencies

  • Python 2.7; TensorFlow >= 1.4.0; tqdm and nltk (for preprocessing)
  • Flickr30k Images and Text
  • Dataset splits from here; this is the same split used by m-RNN.
  • Pretrained Resnet-152 Model from Tensorpack

Training

  1. Extract ResNet features:
$ python resnet-extractor/extract.py flickr30k_images/ ImageNet-ResNet152.npz resnet-152 --batch_size 20 --resize 448 --depth 152
  2. Preprocess:
$ python prepro_flickr30k.py splits/ results_20130124.token prepro --noword2vec --noimgfeat
  3. Train:

I use a slightly different training schedule: batch size 256, learning rate 0.1, and a dropout keep probability of 0.5 for the first 60 epochs, then learning rate 0.05 and a keep probability of 0.8 for the remaining epochs. I use Adadelta as the optimizer. Training takes up to 9 GB of GPU memory and about 50 hours with SSDs.
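The two-phase schedule above can be written as a small helper (a sketch only; the epoch numbering and the `keep_prob` naming mirror the paragraph and the command-line flags, not any function in this repo):

```python
def schedule(epoch):
    """Return (learning_rate, dropout_keep_prob) for a 1-indexed epoch,
    following the two-phase schedule described above."""
    if epoch <= 60:
        return 0.1, 0.5   # first 60 epochs
    return 0.05, 0.8      # remaining epochs
```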

(There are other options (--use_char, --concat, etc.) that I haven't tried with hard negative mining yet.)

$ python main.py prepro models dan --no_wordvec --word_emb_size 512 --num_hops 2 --word_count_thres 1 --sent_size_thres 200 --word_size_thres 20 --hidden_size 512 --keep_prob 0.5 --margin 100 --num_epochs 60 --save_period 1000 --batch_size 256 --clip_gradient_norm 0.1 --init_lr 0.1 --wd 0.0005 --featpath resnet-152/ --feat_dim 14,14,2048 --hn_num 32 --is_train
  4. Test. You can download my model and put it in models/00/dan/best/ to run it directly:
$ python main.py prepro models dan --no_wordvec --word_emb_size 512 --num_hops 2 --word_count_thres 1 --sent_size_thres 200 --word_size_thres 20 --hidden_size 512 --keep_prob 0.5 --margin 100 --num_epochs 60 --save_period 1000 --batch_size 256 --clip_gradient_norm 0.1 --init_lr 0.1 --wd 0.0005 --featpath resnet-152/ --feat_dim 14,14,2048 --hn_num 32 --is_test --load_best
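The --margin and --hn_num flags correspond to a max-margin ranking loss with in-batch hard negative mining: for each query, only the highest-scoring negatives contribute to the loss. A minimal NumPy sketch of the idea (illustrative only; the actual loss lives in this repo's model code and operates on TensorFlow tensors):

```python
import numpy as np

def hard_negative_loss(sim, margin=100.0, hn_num=32):
    """Max-margin ranking loss where, for each query, only the hn_num
    hardest (highest-scoring) in-batch negatives contribute.

    sim[i, j]: similarity between image i and sentence j;
    pair (i, i) is the positive."""
    n = sim.shape[0]
    pos = np.diag(sim)                        # positive-pair scores
    # Hinge violation for every (query, negative) pair.
    viol = np.maximum(0.0, margin - pos[:, None] + sim)
    np.fill_diagonal(viol, 0.0)               # drop the positive itself
    # Keep only the hn_num largest violations per query.
    k = min(hn_num, n - 1)
    hardest = np.sort(viol, axis=1)[:, -k:]
    return float(hardest.mean())
```

With well-separated pairs the loss is zero; mining only the hardest negatives keeps the gradient focused on the negatives the model currently confuses with the positive.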
