Akash Gujju, Anushka Kamath, Trisha Mandal, Varsha Kini
This research paper presents a study on enhancing the ability of multimodal models to perform unimodal tasks. By integrating FLAVA, a foundational language-and-vision alignment model, with ALBERT, a lite version of BERT focused on efficient language understanding, the research explores how the combined model performs on tasks that require understanding text alone or vision alone, rather than both. The paper compares the baseline FLAVA model against the ensembled MULQA model across several datasets, demonstrating that the adapted model can significantly improve performance on language-only and vision-only tasks. This adaptation not only suggests a promising direction for future research in multimodal learning but also contributes to the understanding of how such models can be optimized for specific unimodal applications. The experiments span a range of datasets, from TextVQA and CommonsenseQA to image classification datasets such as Fashion MNIST and SVHN, showcasing the model's versatility.
FLAVA baseline model - https://github.com/facebookresearch/multimodal/tree/main/examples/flava
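For context, the sketch below shows one minimal way the two backbones could be loaded and their text representations combined for a language-only input. It uses the Hugging Face `transformers` checkpoints `facebook/flava-full` and `albert-base-v2`; these checkpoint names and the concatenation-based pooling are illustrative assumptions, not the exact MULQA ensembling procedure described in the paper.

```python
# Illustrative sketch only: load FLAVA and ALBERT and pool their text
# representations for a language-only input. The checkpoints and the
# concatenation step are assumptions, not the paper's exact MULQA pipeline.
import torch
from transformers import AlbertModel, AlbertTokenizer, FlavaModel, FlavaProcessor

flava = FlavaModel.from_pretrained("facebook/flava-full").eval()
flava_processor = FlavaProcessor.from_pretrained("facebook/flava-full")
albert = AlbertModel.from_pretrained("albert-base-v2").eval()
albert_tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")


@torch.no_grad()
def text_features(sentence: str) -> torch.Tensor:
    # FLAVA text-encoder features, mean-pooled over tokens
    flava_inputs = flava_processor(text=[sentence], return_tensors="pt", padding=True)
    flava_emb = flava.get_text_features(**flava_inputs).mean(dim=1)

    # ALBERT features, mean-pooled over tokens
    albert_inputs = albert_tokenizer(sentence, return_tensors="pt")
    albert_emb = albert(**albert_inputs).last_hidden_state.mean(dim=1)

    # Naive ensemble: concatenate the two unimodal representations
    return torch.cat([flava_emb, albert_emb], dim=-1)


print(text_features("Where is the stop sign in the image?").shape)
```

In a language-only task such as CommonsenseQA, a fused representation of this kind could feed a task-specific classification head; a vision-only path (e.g., for Fashion MNIST or SVHN) would be analogous, using FLAVA's image encoder in place of the text encoders.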