I want to give the model some images as in-context examples, then input a video and ask questions about the video content.
(Specifically, I would like to teach the model the types of dogs using images and then have it count the number of dogs in the video.)
The Otter-image model accepts images as context, but it cannot take video input.
Conversely, the Otter-video model accepts video input, but it cannot take images as context.
Is there a recommended implementation method or model for this situation?