Strong baseline for visual question answering

This is a PyTorch re-implementation of Vahid Kazemi and Ali Elqursh's paper "Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering", based on Cyanogenoid's code.
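The architecture described in the paper can be sketched roughly as follows: L2-normalized CNN image features and an LSTM question encoding are combined by a small convolutional attention module, the attended image features are concatenated with the question state, and a classifier picks one of the candidate answers. This is a hedged sketch of our own reading of the paper, not this repository's model.py. The class name AttentionVQA, padding index 0, and the layer sizes (300-d embeddings, 1024-d LSTM, 512-d attention layer, 2 glimpses, 1024-d hidden layer, 3000 answers) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionVQA(nn.Module):
    """Hypothetical sketch of a "Show, Ask, Attend, and Answer"-style model.

    All sizes are assumptions based on our reading of the paper.
    """

    def __init__(self, num_tokens, embed_dim=300, lstm_dim=1024, img_dim=2048,
                 att_dim=512, glimpses=2, hidden_dim=1024, num_answers=3000,
                 dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(num_tokens, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, lstm_dim, batch_first=True)
        # Attention: 1x1 convolutions over [image features; tiled question state]
        self.att_conv1 = nn.Conv2d(img_dim + lstm_dim, att_dim, kernel_size=1)
        self.att_conv2 = nn.Conv2d(att_dim, glimpses, kernel_size=1)
        # Classifier over [attended image features for each glimpse; question state]
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(glimpses * img_dim + lstm_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, img, question):
        # img: (B, img_dim, H, W) CNN features; question: (B, T) token ids
        img = F.normalize(img, p=2, dim=1)  # L2-normalize each spatial feature vector
        _, (h, _) = self.lstm(torch.tanh(self.embedding(question)))
        q = h[-1]  # final LSTM hidden state, (B, lstm_dim)

        hgt, wid = img.shape[2:]
        q_tiled = q[:, :, None, None].expand(-1, -1, hgt, wid)
        att = self.att_conv2(F.relu(self.att_conv1(torch.cat([img, q_tiled], dim=1))))
        # Softmax over spatial locations: (B, glimpses, H*W), each glimpse sums to 1
        att = F.softmax(att.flatten(2), dim=-1)
        # Attention-weighted sum of image features per glimpse: (B, glimpses, img_dim)
        attended = torch.bmm(att, img.flatten(2).transpose(1, 2))
        return self.classifier(torch.cat([attended.flatten(1), q], dim=1))


model = AttentionVQA(num_tokens=100, num_answers=10).eval()
logits = model(torch.randn(2, 2048, 14, 14), torch.randint(1, 100, (2, 8)))
```

The 14x14 grid in the usage line is the feature-map size ResNet-152 produces for 448x448 inputs; the model itself accepts any spatial size.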

1. File Explanation (changes from Cyanogenoid's code and what is new)

preprocess-images.py: Replaced deprecated usage and rebuilt a custom ResNet-152 loader that does not require PyCaffe.
model.py: Whereas Cyanogenoid's version focused on improving the model's performance, I rebuilt model.py to follow the paper.
train.py: TensorBoard is now supported, so the separate tracker for loss and accuracy is no longer needed. Gradient clipping is also used to prevent exploding gradients.
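The two train.py changes (TensorBoard logging and gradient clipping) can be illustrated with a minimal, hypothetical training step. The function name train_step, the soft-target loss built from annotator answer counts, and max_grad_norm=0.25 are assumptions, not values taken from train.py. The writer argument can be any object with an add_scalar method, such as torch.utils.tensorboard.SummaryWriter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_step(model, optimizer, inputs, answer_counts, step,
               writer=None, max_grad_norm=0.25):
    """Hypothetical single training step with gradient clipping and optional logging.

    inputs: tuple passed to model(*inputs), e.g. (image_features, question_tokens).
    answer_counts: (B, num_answers) float tensor of how many annotators gave each answer.
    max_grad_norm is an illustrative value, not the one used in train.py.
    """
    model.train()
    logits = model(*inputs)
    # Assumed loss: cross-entropy against the normalized annotator answer distribution
    target = answer_counts / answer_counts.sum(dim=1, keepdim=True).clamp(min=1)
    loss = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

    optimizer.zero_grad()
    loss.backward()
    # Rescale all gradients in place so their combined L2 norm is at most max_grad_norm;
    # the return value is the total norm measured before clipping
    grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()

    if writer is not None:  # e.g. torch.utils.tensorboard.SummaryWriter
        writer.add_scalar("train/loss", loss.item(), step)
        writer.add_scalar("train/grad_norm_before_clip", float(grad_norm), step)
    return loss.item()
```

In a real loop the writer would be created once, for example with SummaryWriter(log_dir="logs") (a hypothetical path), and the curves viewed with `tensorboard --logdir logs`.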

2. How to run

Download the COCO datasets and set the JSON and image file paths in config.py.
Preprocess the images with the 'python preprocess-images.py' command.
Preprocess the vocabulary (questions and answers) with the 'python preprocess-vocab.py' command.
Run training and evaluation with the 'python train.py' command.

3. Accuracy and Train loss

Trained for 5 epochs (1 epoch is about 2,000 iterations).
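The README does not state which accuracy metric is plotted. VQA models are commonly scored with the VQA accuracy, where a predicted answer counts as fully correct if at least 3 of the 10 human annotators gave it. A simplified sketch, assuming this metric; the function name is hypothetical.

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy: min(#annotators who gave the predicted answer / 3, 1).

    The official evaluation also normalizes answer strings and averages this score
    over all 10-choose-9 subsets of the 10 human answers; this sketch omits both steps.
    """
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)


# Example: one annotator out of ten said "wood", so the prediction gets partial credit
score = vqa_accuracy("wood", ["wood"] + ["wooden"] * 9)
```

Averaging this score over all evaluation questions gives a single accuracy figure per epoch.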

4. Sample Results

Since there is little benefit in using my own test sets, I picked sample results from the evaluation set.

Upper-left image: questions 1, 2, 3. Lower-left image: question 4.

Questions

Is the food napping on the table?
What has been to make lights?
What is the table made of?
Is this an Spanish town?

Right Answers

no
tea kettle
wood
no

Predicted

yes
flowers
wood
yes

Upper-left image: questions 1, 2. Lower-left image: questions 3, 4.

Questions

What is in the top right corner?
Are there shadows on the sidewalk?
What is leaning against the house?
Is it cold outside?

Right Answers

tree
yes
ladder
yes

Predicted

clock
yes
fire hydrant
yes

Upper-left image: questions 1, 2. Lower-left image: questions 3, 4.

Questions

Is there a bicycle in this picture?
How many windows can you see?
Is the person feeding the birds?
Is this in a park?

Right Answers

yes
1
no
yes

Predicted

yes
3
no
yes

5. Result

I trained the model for 5 epochs, which took about 24 hours.

The model performs well on yes/no questions.
However, on open-ended questions, where the answer must be chosen from 3,000 candidates, its performance is lower.
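The 3,000 candidates form a fixed answer vocabulary: the model classifies over the most frequent training answers, so an answer outside that list can never be predicted. A hypothetical sketch of building such a vocabulary, assuming answers are already normalized strings (preprocess-vocab.py may tokenize or normalize them differently):

```python
from collections import Counter


def build_answer_vocab(answers, top_k=3000):
    """Map the top_k most frequent training answers to class indices (hypothetical helper)."""
    counts = Counter(answers)
    # Counter.most_common orders equal counts by first occurrence
    most_common = [answer for answer, _ in counts.most_common(top_k)]
    return {answer: idx for idx, answer in enumerate(most_common)}


vocab = build_answer_vocab(["yes", "no", "yes", "2", "no", "yes", "wood"], top_k=3)
```

Here vocab.get("wood") returns None, because "wood" fell outside the top 3 answers; with top_k=3000 the same effect applies to rare open-ended answers.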

Repository: gjrb0324/VQA_Baseline_Pytorch

Folders and files

Latest commit

History

Repository files navigation

Strong baseline for viusal question answering

1. File Explanation (Changes from Cyanogenoid's or what is new)

2. How to run

3. Accuracy and Train loss

4. Sample Results

Questions

Right Answers

Predicted

Questions

Right Answers

Predicted

Questions

Right Answers

Predicted

5.Result

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages