Hi! Much appreciated for the excellent work!
I am working on a vision-QA task using BLIP-2, which consists of three modules:
- a ViT that extracts visual features;
- a Q-Former that narrows the gap between the vision and language modalities;
- a T5-XXL model that receives the question and the Q-Former's output to generate answers.

A rough sketch of how I call this pipeline is included below.
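For reference, this is roughly how I run the ViT → Q-Former → T5-XXL pipeline today, using the HuggingFace `transformers` port of BLIP-2 (the image path, question, and generation settings are just placeholders):

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# BLIP-2 with the Flan-T5-XXL language backbone (ViT + Q-Former + T5-XXL).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xxl", torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg")  # placeholder image
prompt = "Question: What is the person holding? Answer:"

# The processor preprocesses the image for the ViT and tokenizes the prompt;
# generate() routes the visual features through the Q-Former into the T5 decoder.
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output_ids[0], skip_special_tokens=True).strip())
```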
I wonder whether it is possible to employ MM-CoT as a utility library on top of the BLIP-2 model to enhance vision-QA inference?
Hi, thanks for your interest! An efficient way could be to train your framework in just two steps, like MM-CoT: (i) rationale generation; (ii) answer inference, regardless of what the backbone modules are.
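To make the two-step idea concrete, here is a minimal sketch of what that flow could look like with BLIP-2 at inference time: stage (i) prompts the model for a rationale, and stage (ii) conditions on the question plus the generated rationale to produce the answer. Note this only mimics the two-stage flow with hand-written prompts; MM-CoT itself trains the two stages with rationale supervision, and the prompts, checkpoint name, and `ask` helper here are illustrative assumptions rather than part of either codebase:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xxl", torch_dtype=torch.float16
).to("cuda")

def ask(image, prompt, max_new_tokens=64):
    # Shared helper: the image goes through ViT + Q-Former, the prompt through the T5 tokenizer.
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
    ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(ids[0], skip_special_tokens=True).strip()

image = Image.open("example.jpg")  # placeholder image
question = "What is the person holding?"

# Stage (i): rationale generation -- ask for the visual evidence relevant to the question.
rationale = ask(image, f"Question: {question}\nDescribe the visual evidence needed to answer:")

# Stage (ii): answer inference -- condition on both the question and the generated rationale.
answer = ask(image, f"Question: {question}\nRationale: {rationale}\nAnswer:", max_new_tokens=10)
print(answer)
```

In the trained MM-CoT setup, the first stage would be fine-tuned to produce rationales and the second stage fine-tuned to answer given those rationales; the same split applies whether the backbone is the MM-CoT model or BLIP-2's ViT + Q-Former + T5 stack.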