Visual-Question-Answering

Example dataset

trainset.json

{"qid": 1, "image_name": "synpic54610.jpg", "image_organ": "HEAD", "answer": "Yes", "answer_type": "CLOSED", "question_type": "PRES", "question": "Are regions of the brain infarcted?", "phrase_type": "freeform"}

The VQA_RAD Image Folder contains the images in .jpg format. Each entry in trainset.json refers to its image by file name, as in the image_name field of the example above.
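A minimal sketch of reading the dataset is shown here. It assumes that trainset.json holds a JSON array of records shaped like the example and that image_name is the file name inside the image folder; the function name `load_examples` is hypothetical and not taken from the notebook.

```python
import json
from pathlib import Path


def load_examples(json_path, image_dir):
    """Read VQA records and attach the full path of each image.

    Assumption: the file is a JSON array of records like the example
    above. Adjust the parsing if it is stored one record per line.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)

    image_dir = Path(image_dir)
    examples = []
    for rec in records:
        examples.append({
            "qid": rec["qid"],
            # Assumption: image_name is relative to the image folder.
            "image_path": image_dir / rec["image_name"],
            "question": rec["question"],
            # Cast to str so every answer has the same type.
            "answer": str(rec["answer"]),
        })
    return examples
```

Calling `load_examples("trainset.json", "VQA_RAD Image Folder")` would then give one dictionary per question, ready for the embedding steps described next.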

Word Embeddings

Questions

A pretrained model, the Google News word vectors, is used for word-to-vector conversion of all the questions.
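The conversion can be sketched roughly as follows. This is an illustration and not the notebook's code: `vectorize_question`, `max_len`, the lowercasing, and the zero vector for unknown words are all assumptions. `word_vectors` can be any word-to-vector mapping, for example a gensim `KeyedVectors` object loaded from the Google News file, which has 300 dimensions.

```python
def vectorize_question(question, word_vectors, max_len=20, dim=300):
    """Turn a question into a fixed-length list of word vectors.

    word_vectors: mapping from word to a sequence of `dim` floats.
    Unknown words and padding positions become zero vectors
    (an assumption made for this sketch).
    """
    # Simplified tokenization: lowercase, drop the trailing "?", split on spaces.
    tokens = question.lower().rstrip("?").split()

    rows = []
    for tok in tokens[:max_len]:
        if tok in word_vectors:
            rows.append(list(word_vectors[tok]))
        else:
            rows.append([0.0] * dim)

    # Pad short questions so every question has max_len rows.
    while len(rows) < max_len:
        rows.append([0.0] * dim)
    return rows
```

The result is a `max_len` by `dim` nested list; wrapping it with `numpy.asarray` would give the array shape that a recurrent layer expects.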

Answers

atoi (ASCII to integer) first converts each answer word to an integer, such as 0 for 'yes', 1 for 'no', and so on, so that the answers can be fed to the model.
itoa (integer to ASCII) maps each integer back to its respective word.
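One way to build the two mappings is sketched here. The helper name `build_answer_maps` and the case normalization are assumptions; the notebook may assign the indices differently.

```python
def build_answer_maps(answers):
    """Return (atoi, itoa) dictionaries built from a list of answer strings."""
    atoi = {}
    for ans in answers:
        # Assumption: "Yes" and "yes" are treated as the same class.
        key = ans.strip().lower()
        if key not in atoi:
            atoi[key] = len(atoi)  # next unused integer
    # Invert the mapping to decode model predictions back to words.
    itoa = {idx: word for word, idx in atoi.items()}
    return atoi, itoa


atoi, itoa = build_answer_maps(["Yes", "No", "yes", "Right lung"])
```

With this toy input, 'yes' maps to 0 and 'no' to 1, matching the example in the text, and `itoa` recovers the word from a predicted class index.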

Images

VGG16, pretrained on the ImageNet dataset, is used for image preprocessing.

The VGG16 Architecture

Model Building

The preprocessed image features are passed through a Dense layer with tanh activation.
The question vectors are passed through an LSTM.
The two resulting vectors are then concatenated and passed through dense layers, ending in a final layer with a softmax activation.
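The description above could be expressed in Keras roughly like this. It is a sketch under assumptions, not the notebook's exact model: the layer sizes, the 4096-dimensional image input (the size of a VGG16 fully connected layer), the question length, the optimizer, and the loss are all illustrative choices.

```python
from tensorflow.keras import layers, Model


def build_vqa_model(num_answers, image_dim=4096, max_len=20,
                    embed_dim=300, hidden=512):
    """Two-branch VQA classifier: Dense(tanh) for images, LSTM for questions."""
    # Image branch: precomputed image features -> Dense with tanh.
    image_in = layers.Input(shape=(image_dim,), name="image_features")
    image_vec = layers.Dense(hidden, activation="tanh")(image_in)

    # Question branch: sequence of word vectors -> LSTM.
    question_in = layers.Input(shape=(max_len, embed_dim),
                               name="question_vectors")
    question_vec = layers.LSTM(hidden)(question_in)

    # Merge both vectors, then classify over the answer vocabulary.
    merged = layers.Concatenate()([image_vec, question_vec])
    x = layers.Dense(hidden, activation="tanh")(merged)
    out = layers.Dense(num_answers, activation="softmax")(x)

    model = Model(inputs=[image_in, question_in], outputs=out)
    # Integer answer ids (from atoi) pair with the sparse loss (assumed choice).
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Here `num_answers` would be the size of the atoi mapping, so that each softmax output corresponds to one answer word.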

The built model looks something like this:
