Gradient issue #13
Can you share the error output and the training configuration file?
There is no error. I just used the raw code of this repo. I mean, the projector is called inside `torch.no_grad`, so it never receives gradients.
@TonyXuQAQ I find the projector is not wrapped inside `torch.no_grad` in the original code of this repo, as follows:
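(Paraphrasing the pattern as a minimal sketch; the class layout and names like `vision_tower` follow LLaVA conventions and are assumptions, not the repo's exact code:)

```python
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Sketch of the intended pattern: frozen vision tower, trainable projector."""

    def __init__(self, vision_tower: nn.Module, hidden_size: int, llm_size: int):
        super().__init__()
        self.vision_tower = vision_tower                       # frozen CLIP-style encoder
        self.mm_projector = nn.Linear(hidden_size, llm_size)  # trainable projector

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # The frozen vision tower runs without recording a graph...
        with torch.no_grad():
            features = self.vision_tower(images)
        # ...but the projector is called OUTSIDE the no_grad block,
        # so its weights do receive gradients during training.
        return self.mm_projector(features)
```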
Thanks for the information. During finetuning, I also noticed that your current version of the code cannot load VideoChat-instruct-11K normally: LLaVA-instruct-150K's labels are organized as {"human": ..., "gpt": ...}, while VideoChat-instruct-11K's labels are organized as {"q": ..., "a": ...}. The two datasets have different label formats, but your code does no format conversion, so I guess the label pre-processing code is missing. I don't know why, but after finetuning Valley on the above two datasets from your llama-2-pretrain weights, the results are very bad. I will refer to earlier commits of this repo for debugging.
So may I know which commit was used to train the provided valley-2-7b? I just want to reproduce the performance of the provided checkpoints.
LLaVA-instruct-150K should load as-is. For VideoChat-instruct-11K, you need to convert its format to the LLaVA-instruct-150K format.
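A rough converter sketch for that step, assuming each VideoChat-instruct-11K record carries the "q"/"a" fields described above; the "video" field name, the record "id", and the "<video>\n" placeholder token are assumptions, so adjust them to the actual schema:

```python
import json

def convert_videochat_to_llava(src_path: str, dst_path: str) -> None:
    """Convert {"q": ..., "a": ...} records into the LLaVA-instruct-150K
    conversation schema: {"conversations": [{"from": "human", ...}, {"from": "gpt", ...}]}."""
    with open(src_path) as f:
        records = json.load(f)

    converted = []
    for i, rec in enumerate(records):
        converted.append({
            "id": rec.get("id", str(i)),
            "video": rec.get("video", ""),  # field name is an assumption
            "conversations": [
                {"from": "human", "value": "<video>\n" + rec["q"]},
                {"from": "gpt", "value": rec["a"]},
            ],
        })

    with open(dst_path, "w") as f:
        json.dump(converted, f, indent=2)
```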
Thank you for your continued attention to this project. I will sync the repo to a fully trainable version of the code as soon as possible.
Hi, after going through the training code, it seems that the gradient is not properly backpropagated: all projector layers (`mm_projector`) are called within `torch.no_grad` (i.e., call_1, call_2). If so, it means the projector layer is not trained at all, right? Is this a typo in the released code or an error?
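For reference, a tiny self-contained repro of the concern (plain PyTorch, not the repo's code): a module called under `torch.no_grad` records no autograd graph, so its weights can never receive gradients from the loss.

```python
import torch
import torch.nn as nn

proj = nn.Linear(4, 4)     # stand-in for the projector layer
x = torch.randn(2, 4)

with torch.no_grad():
    y = proj(x)            # forward under no_grad: no graph is recorded

print(y.requires_grad)     # False -> loss.backward() cannot reach proj's weights

y2 = proj(x)               # the same call outside no_grad
print(y2.requires_grad)    # True -> gradients flow to proj during backprop
```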