Example dataset
Trainset.json
{"qid": 1, "image_name": "synpic54610.jpg", "image_organ": "HEAD", "answer": "Yes", "answer_type": "CLOSED", "question_type": "PRES", "question": "Are regions of the brain infarcted?", "phrase_type": "freeform"}
The VQA_RAD Image Folder contains the images in .jpg format, whose names are the qid values given in trainset.json.
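As a rough illustration of the record layout above, the sketch below parses one record and pulls out the fields used later. It assumes the file holds a JSON list of such records (the file layout is not stated here), and the variable names are only illustrative.

```python
import json

# One record copied from the example above, wrapped in a list.
# Assumption: the real file is a JSON list of records shaped like this.
raw = """[
  {"qid": 1, "image_name": "synpic54610.jpg", "image_organ": "HEAD",
   "answer": "Yes", "answer_type": "CLOSED", "question_type": "PRES",
   "question": "Are regions of the brain infarcted?",
   "phrase_type": "freeform"}
]"""

records = json.loads(raw)

# Question/answer pairs are what the model is trained on.
pairs = [(r["question"], r["answer"]) for r in records]

# CLOSED questions have a small answer set such as yes/no.
closed = [r for r in records if r["answer_type"] == "CLOSED"]

print(pairs[0])
print(len(closed))
```

To read the actual file, `json.load(open(path))` would replace the inline string, with `path` pointing at wherever the training file is stored.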
Questions
A pretrained word-to-vector model, the Google News Vectors, is used to vectorize all the questions.
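The idea can be sketched without downloading the real vectors. The snippet below uses a tiny made-up lookup table in place of the 300-dimensional Google News vectors; the table values, the lowercasing tokenizer, the zero vector for unknown words, and the padding length are all assumptions for illustration, not the settings used in this project.

```python
import re

# Toy 3-dimensional table standing in for the real 300-dimensional
# Google News vectors (hypothetical values, illustration only).
EMB_DIM = 3
TOY_VECTORS = {
    "are": [0.1, 0.0, 0.2],
    "regions": [0.3, 0.1, 0.0],
    "of": [0.0, 0.2, 0.1],
    "the": [0.1, 0.1, 0.1],
    "brain": [0.5, 0.4, 0.3],
}


def tokenize(question):
    """Lowercase the question and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def vectorize(question, max_len=8):
    """Map a question to a fixed-length list of word vectors.

    Unknown words become zero vectors; short questions are zero-padded
    and long ones are truncated to max_len.
    """
    vectors = [list(TOY_VECTORS.get(tok, [0.0] * EMB_DIM))
               for tok in tokenize(question)]
    vectors = vectors[:max_len]
    while len(vectors) < max_len:
        vectors.append([0.0] * EMB_DIM)
    return vectors


matrix = vectorize("Are regions of the brain infarcted?")
print(len(matrix), len(matrix[0]))
```

With the real vectors, gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` would typically supply the lookup table; the resulting (max_len, 300) matrix is what a sequence model consumes.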
Answers
- atoi (ASCII to Integers) first converts each answer word into an integer, such as 0 for 'yes', 1 for 'no', and so on, so that the answers are ready to be fed into the model.
- itoa (Integers to ASCII) maps each integer back to its respective word.
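A minimal sketch of the two mappings follows. The helper name `build_answer_vocab` and the lowercasing and whitespace stripping of answers are assumptions made here; the original code may normalize answers differently.

```python
def build_answer_vocab(answers):
    """Build atoi (word -> integer) and itoa (integer -> word) tables.

    Integers are assigned in order of first appearance, so the first
    distinct answer gets 0, the next gets 1, and so on.
    """
    atoi = {}
    for ans in answers:
        key = ans.strip().lower()  # assumed normalization
        if key not in atoi:
            atoi[key] = len(atoi)
    itoa = {idx: word for word, idx in atoi.items()}
    return atoi, itoa


answers = ["Yes", "No", "yes", "Right lung"]
atoi, itoa = build_answer_vocab(answers)

# Encode the answers as class labels for training.
labels = [atoi[a.strip().lower()] for a in answers]

# Decode a predicted class index back to its word.
decoded = itoa[labels[3]]
print(labels, decoded)
```

At prediction time, the index of the largest softmax output would be looked up in `itoa` to recover the answer text.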
Images
VGG16 pretrained on the ImageNet dataset is used for image preprocessing.
The VGG16 Architecture
- The preprocessed images are passed through a Dense network with tanh activation.
- The question vectors are passed through an LSTM.
- After that, both vectors are concatenated and passed through dense layers, ending in a final layer with a softmax function.
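The three steps above can be sketched in Keras roughly as follows. This is an illustration under stated assumptions, not the project's actual model: the function name `build_vqa_model`, the layer sizes, the 4096-dimensional image feature input, the sequence length, and the number of answer classes are all hypothetical.

```python
from tensorflow.keras import Model, layers


def build_vqa_model(img_dim=4096, seq_len=20, emb_dim=300,
                    num_answers=100, hidden=256):
    """Two-branch model: image features and question vectors, merged.

    All sizes are placeholder assumptions chosen for illustration.
    """
    # Image branch: preprocessed image features -> Dense with tanh.
    img_in = layers.Input(shape=(img_dim,), name="image_features")
    img_vec = layers.Dense(hidden, activation="tanh")(img_in)

    # Question branch: sequence of word vectors -> LSTM.
    q_in = layers.Input(shape=(seq_len, emb_dim), name="question_vectors")
    q_vec = layers.LSTM(hidden)(q_in)

    # Merge both vectors, then dense layers and a softmax over answers.
    merged = layers.Concatenate()([img_vec, q_vec])
    x = layers.Dense(hidden, activation="tanh")(merged)
    out = layers.Dense(num_answers, activation="softmax")(x)

    return Model(inputs=[img_in, q_in], outputs=out)


model = build_vqa_model(num_answers=10)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The sparse categorical cross-entropy loss is chosen here because it accepts integer labels directly, which fits the atoi encoding of the answers; a one-hot encoding with plain categorical cross-entropy would work equally well.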
The built model looks something like this: