working inference code for video model? #7
Comments
I will take a look asap. By "meaningless results", do you mean the generated text goes completely off, or that it's just not good enough?
I see. I'm running experiments now. For the loss, there is indeed a difference between the implementation here and the tutorial colab you shared: their labels include both questions and answers (so the model also learns to predict the user's questions), while ours sets labels only for the answers (so the model only learns to predict the responses). Not sure if this fully explains the loss differences though.
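To make the label difference concrete, here is a minimal sketch in plain Python. It is not the code from this repository or from the tutorial colab: `build_labels`, the `segments` format, and the toy token ids are all made up for illustration. The only fixed fact it relies on is that `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labeled `-100` do not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments, answer_only=True):
    """Build (input_ids, labels) from a conversation.

    segments: list of (token_ids, is_answer) pairs in conversation order.
    This format is hypothetical and only serves the illustration.
    """
    input_ids, labels = [], []
    for token_ids, is_answer in segments:
        input_ids.extend(token_ids)
        if is_answer or not answer_only:
            # These tokens are supervised: the label equals the token id.
            labels.extend(token_ids)
        else:
            # Question tokens are masked out and ignored by the loss.
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Toy token ids, made up for illustration: one question, one answer.
segments = [([1, 15, 16, 17], False), ([42, 43, 2], True)]

input_ids, answer_only_labels = build_labels(segments, answer_only=True)
_, full_labels = build_labels(segments, answer_only=False)

print(answer_only_labels)
print(full_labels)
```

With answer-only labels the loss is averaged over fewer (and different) tokens than with full labels, so the two loss curves are not directly comparable even on the same data.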
Hi @Namzakku, I've put up a full example notebook here https://colab.research.google.com/drive/1ejXG58cpMXvkcsx2qqTFK2BqWBVBEr7Y?usp=sharing that showcases a training run and the subsequent inference of llava-next-video-7b, using data from ShareGPT4Video. The output of the finetuned model makes sense to me and indicates that the training worked (there is a noticeable difference between the output of the original and the finetuned model). The exact running script is updated in the For the loss scale, notice that the previous colab tutorial you shared used 8 gradient accumulation steps, while in my
Thanks for the great notebook! Also, I found a tiny problem when training.
Glad to know it worked! For the number of frames, it's weird, as I do have frame padding implemented, so it should work with different numbers of frames (I also confirmed that for video that has less Anyway, by any chance do you have the error message available so that I can confirm the reason for the crashed training? Meanwhile I will do some testing locally to see if I can reproduce.
Sure! This is the error I got.
Thanks. I can reproduce it, and it turns out there is a dimension issue with the padding. Will push a fix soon.
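For readers unfamiliar with frame padding, the idea can be sketched as follows. This is not the repository's implementation (and not the fix mentioned above); `pad_frames`, the list-based video format, and the returned mask are assumptions chosen to keep the example dependency-free. Videos in a batch are padded with blank frames up to the longest one, and a mask records which frames are real.

```python
def pad_frames(videos, pad_value=0.0):
    """Pad videos with different frame counts to a common length.

    videos: list of videos; each video is a list of frames, and each
    frame is a flat list of numbers with the same length.
    Returns (padded_videos, frame_mask), where the mask holds 1 for
    real frames and 0 for padding frames.
    """
    frame_size = len(videos[0][0])
    for video in videos:
        for frame in video:
            if len(frame) != frame_size:
                # A mismatch here is the kind of dimension problem that
                # would otherwise surface later as a shape error.
                raise ValueError("all frames must have the same size")

    max_frames = max(len(video) for video in videos)
    padded, mask = [], []
    for video in videos:
        n_pad = max_frames - len(video)
        blank = [[pad_value] * frame_size for _ in range(n_pad)]
        padded.append([list(frame) for frame in video] + blank)
        mask.append([1] * len(video) + [0] * n_pad)
    return padded, mask


# Toy batch: a 3-frame video and a 1-frame video, 2 values per frame.
videos = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
padded, mask = pad_frames(videos)
print(mask)
```

After padding, every video in the batch has the same number of frames, so the batch can be stacked into a single tensor; the mask lets downstream code ignore the blank frames.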
Hi!
I tried to combine the inference instructions you provided with the inference code from the HF tutorial at
https://colab.research.google.com/drive/1dTdro-k7NFqRgGq5-TlGHM-6k2sYQhXp#scrollTo=4ccbd183-f15a-4f94-a526-9ceeec3f61e0
but got meaningless results.
I also tried to use your collator but got
CUDA error: device-side assert triggered
in generate(). Can you provide working code or give some hints?