
How to use the mm-cot framework as a utility library with a local LLM? #73

Open
dszpr opened this issue Feb 5, 2024 · 1 comment

@dszpr

dszpr commented Feb 5, 2024

Hi! Many thanks for the excellent work!

I am working on a vision-QA task using BLIP2, which consists of three modules:
- a ViT that extracts visual features;
- a Q-Former that narrows the gap between the vision and language modalities;
- a T5-XXL that receives the question and the Q-Former's output to generate answers.

I wonder if it's possible to employ MM-CoT as a utility library within the BLIP2 model to enhance vision-QA inference?
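For reference, the three-module pipeline described above can be sketched as follows. This is a minimal illustration with hypothetical stub functions, not the real BLIP2 / LAVIS code; the function names simply mirror the module descriptions.

```python
# Hypothetical sketch of the BLIP2 inference flow described above.
# All three "modules" are stubs standing in for the real networks.

def vit_encode(image):
    """ViT: extract visual features from the raw image (stub)."""
    return {"patch_features": [hash(image) % 97]}  # placeholder features

def qformer_bridge(vision_features, question):
    """Q-Former: project visual features toward the language space (stub)."""
    return {"query_tokens": vision_features["patch_features"],
            "question": question}

def t5xxl_generate(bridged_inputs):
    """T5-XXL: consume the question plus Q-Former output, emit an answer (stub)."""
    return f"answer to: {bridged_inputs['question']}"

def blip2_vqa(image, question):
    """Chain the three modules: ViT -> Q-Former -> T5-XXL."""
    features = vit_encode(image)
    bridged = qformer_bridge(features, question)
    return t5xxl_generate(bridged)
```

In a real setup each stub would be replaced by the corresponding pretrained component, but the data flow (image features bridged into the language model together with the question) stays the same.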

@cooelf
Contributor

cooelf commented May 19, 2024

Hi, thanks for your interest! An efficient way would be to train your framework in just two steps, like MM-CoT: (i) rationale generation; (ii) answer inference, regardless of what the backbone modules are.
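The two-step scheme above could be wired around any backbone roughly like this. A hedged sketch: `generate` is a hypothetical stand-in for whatever seq2seq call your backbone exposes (e.g. the T5 decoder in BLIP2), and the prompt formats are illustrative, not MM-CoT's exact templates.

```python
# Minimal sketch of MM-CoT's two-step inference, with a stub generator.

def generate(prompt):
    """Hypothetical stand-in for the backbone's generate() call (stub)."""
    if "Solution:" in prompt:
        return "Because the sky scatters blue light."  # stage-1 rationale (stub)
    return "(B) blue"                                  # stage-2 answer (stub)

def mmcot_infer(question, context, options):
    # Step (i): rationale generation -- prompt the model to produce a
    # chain-of-thought before committing to an answer.
    stage1_input = f"{question}\nContext: {context}\nOptions: {options}\nSolution:"
    rationale = generate(stage1_input)

    # Step (ii): answer inference -- append the generated rationale to the
    # original input and let the model predict the final answer.
    stage2_input = (f"{question}\nContext: {context}\nOptions: {options}\n"
                    f"{rationale}\nAnswer:")
    answer = generate(stage2_input)
    return rationale, answer
```

Training mirrors this: one fine-tuning pass where the target is the rationale, and a second where the input is augmented with the rationale and the target is the answer.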
