[model] Understanding video with images as in-context #276

Open
kassy11 opened this issue Sep 18, 2023 · 1 comment
Labels: area:model (code of model)

Comments


kassy11 commented Sep 18, 2023

I want to give the model some images as in-context examples, then input a video and ask questions about its content.
(Specifically, I would like to teach the model a particular type of dog through example images and then have it count the number of those dogs in the video.)
[Attached image: multimodal]

The Otter-image model can take images as in-context examples, but it cannot take video as input.
Conversely, the Otter-video model accepts video as input, but it cannot take images as in-context examples.

Is there a recommended implementation approach or model for this kind of use case?
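For reference, the shape mismatch between the two setups can be sketched in NumPy. This assumes Otter follows the OpenFlamingo convention for the vision input, `(batch, num_media, num_frames, channels, height, width)`: in-context images are stacked as several single-frame media, while a video is one media with many frames. The helper `combine_images_and_video` is hypothetical; it only makes the shapes line up by repeating each image along the frame axis, and it has not been checked against Otter's weights, its `<image>` token handling, or its preprocessing.

```python
import numpy as np

# Assumed layout (OpenFlamingo convention):
# (batch, num_media, num_frames, channels, height, width)

def image_context_tensor(images):
    """Stack N preprocessed images (each C x H x W) as N single-frame media."""
    x = np.stack(images, axis=0)   # (N, C, H, W)
    return x[None, :, None]        # (1, N, 1, C, H, W)

def video_tensor(frames):
    """Stack F preprocessed frames (each C x H x W) as one multi-frame media."""
    x = np.stack(frames, axis=0)   # (F, C, H, W)
    return x[None, None]           # (1, 1, F, C, H, W)

def combine_images_and_video(img_x, vid_x):
    """Hypothetical: tile each image along the frame axis to match the
    video's frame count, then append the video as the last media item."""
    num_frames = vid_x.shape[2]
    img_tiled = np.repeat(img_x, num_frames, axis=2)    # (1, N, F, C, H, W)
    return np.concatenate([img_tiled, vid_x], axis=1)   # (1, N+1, F, C, H, W)

# Example: 3 dog images as context plus an 8-frame video.
dogs = [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(3)]
clip = [np.ones((3, 224, 224), dtype=np.float32) for _ in range(8)]
combined = combine_images_and_video(image_context_tensor(dogs), video_tensor(clip))
print(combined.shape)  # (1, 4, 8, 3, 224, 224)
```

Repeating a still image across frames is only one way to reconcile the frame axis; whether a model trained on either layout behaves sensibly with this mixed input is an open question.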

king159 added the area:model (code of model) label on Sep 25, 2023

hcwei13 commented Oct 26, 2023

I have the same need! Have you solved it?
