PyTorch models for the visual reasoning task on the CLEVR dataset.
Stacked attention networks:
https://arxiv.org/pdf/1511.02274.pdf
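For orientation, here is a minimal sketch of one stacked-attention hop in the style of that paper; layer sizes and class names are illustrative assumptions, not this repo's API:

```python
# Hedged sketch of one stacked-attention hop (Yang et al., 2016).
# Names and dimensions are illustrative, not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHop(nn.Module):
    def __init__(self, v_dim, q_dim, hidden_dim):
        super().__init__()
        self.proj_v = nn.Linear(v_dim, hidden_dim)  # project image regions
        self.proj_q = nn.Linear(q_dim, hidden_dim)  # project question vector
        self.score = nn.Linear(hidden_dim, 1)       # attention logit per region

    def forward(self, v, q):
        # v: (batch, regions, v_dim) image features; q: (batch, q_dim) question.
        # Assumes v_dim == q_dim so the attended summary can refine the query.
        h = torch.tanh(self.proj_v(v) + self.proj_q(q).unsqueeze(1))
        p = F.softmax(self.score(h), dim=1)  # attention over regions
        v_att = (p * v).sum(dim=1)           # attended visual summary
        return v_att + q                     # refined query for the next hop
```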
Module networks (program generator + execution engine):
https://arxiv.org/pdf/1705.03633.pdf
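The execution engine in that line of work composes small residual blocks over image feature maps according to the predicted program. A hedged sketch of one such module (channel count is an assumption):

```python
# Sketch of one execution-engine module in the style of Johnson et al. (2017):
# a small residual block over feature maps. Illustrative, not this repo's code.
import torch.nn as nn

class ResidualModule(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: (batch, channels, H, W) feature maps from the previous module
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # residual connection
```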
Yes, but what's new?
Trying to achieve the same performance with an end-to-end differentiable architecture:
Module memory network [new]
Module memory network, end-to-end differentiable [new]
Trying to achieve weak supervision:
(Work in progress)
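One common way to make module selection end-to-end differentiable is to replace the hard choice of which module to execute with a softmax-weighted mixture over all module outputs, so gradients flow back to the program generator. The sketch below illustrates that general idea; it is not necessarily the exact mechanism used in this repo, and all names and shapes are assumptions:

```python
# Hedged sketch of differentiable module selection: a soft mixture over
# modules instead of a hard argmax. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftModuleStep(nn.Module):
    def __init__(self, modules_list):
        super().__init__()
        self.blocks = nn.ModuleList(modules_list)

    def forward(self, x, logits):
        # x: (batch, C, H, W) current feature maps.
        # logits: (batch, num_modules) scores from the program generator
        # for this reasoning step.
        weights = F.softmax(logits, dim=-1)                 # soft selection
        outs = torch.stack([m(x) for m in self.blocks], 1)  # (B, M, C, H, W)
        w = weights.view(*weights.shape, 1, 1, 1)           # broadcast weights
        return (w * outs).sum(dim=1)                        # weighted mixture
```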
mkdir data
wget https://s3-us-west-1.amazonaws.com/clevr/CLEVR_v1.0.zip -O data/CLEVR_v1.0.zip
unzip data/CLEVR_v1.0.zip -d data
python scripts/extract_features.py \
--input_image_dir data/CLEVR_v1.0/images/train \
--output_h5_file data/train_features.h5
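To sanity-check the extracted features, something like the following works; the 'features' dataset name follows the clevr-iep convention and is an assumption here:

```python
# Quick inspection of the extracted feature file (dataset key is assumed).
import h5py

with h5py.File('data/train_features.h5', 'r') as f:
    print(list(f.keys()))            # list the available datasets
    feats = f['features']
    print(feats.shape, feats.dtype)  # e.g. (num_images, C, H, W)
```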
python scripts/preprocess_questions.py \
--input_questions_json data/CLEVR_v1.0/questions/CLEVR_train_questions.json \
--output_h5_file data/train_questions.h5 \
--output_vocab_json data/vocab.json
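The preprocessed questions and vocabulary can be inspected the same way; the key names below follow the clevr-iep convention and are assumptions:

```python
# Quick inspection of the preprocessed questions and vocabulary (a sketch).
import json
import h5py

with open('data/vocab.json') as f:
    vocab = json.load(f)
print({k: len(v) for k, v in vocab.items()})  # vocabulary size per field

with h5py.File('data/train_questions.h5', 'r') as f:
    print(list(f.keys()))  # encoded questions, programs, image indices, ...
```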
python train.py [-args]
arguments:
--model Model to train: SAN, SAN_wbw, PG, PG_memory, PG_endtoend
--question_size Number of words in the question dictionary
--stem-dim Number of feature maps in the stem
--n-channel Number of feature channels
--batch_size Mini-batch size
--min_grad Minimum value of gradient clipping
--max_grad Maximum value of gradient clipping
--load_model_path Load pre-trained model (path)
--load_model_mode Load model mode: Execution engine (EE), Program Generator (PG), Both (PG+EE)
--save_model Save the model (bool)
--clevr_dataset CLEVR dataset (path)
--clevr_val_images CLEVR validation images (path)
--num_iterations Number of iterations per epoch
--num_val_samples Number of validation samples
--batch_multiplier Virtual batch multiplier (minimum value: 1)
--train_mode Train mode: Execution engine (EE), Program Generator (PG), Both (PG+EE)
--decoder_mode Program generator mode: backpropagation (soft, gumbel) or REINFORCE (hard, hard+penalty); a gumbel sketch follows at the end of this section
--use_curriculum Use curriculum to train program generator (bool)
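For example, a program-generator training run might look like this (flag values are illustrative, not recommended defaults):
python train.py \
--model PG_memory \
--train_mode PG \
--decoder_mode soft \
--batch_size 64 \
--num_iterations 10000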
Module memory network (PG_memory)
Module memory network, end-to-end (PG_endtoend)
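The gumbel option of --decoder_mode above refers to the Gumbel-softmax relaxation. A minimal sketch using PyTorch's built-in implementation, illustrative only and not this repo's exact code:

```python
# Gumbel-softmax with the straight-through estimator: sample discrete
# token/module choices in the forward pass while keeping gradients.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)  # (batch, num_tokens) scores
# hard=True returns one-hot samples in the forward pass but backpropagates
# through the soft probabilities, so the program generator stays trainable.
y = F.gumbel_softmax(logits, tau=1.0, hard=True)
print(y.shape, y.sum(dim=-1))  # one-hot rows, each summing to 1
```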