
Gradient issue #13

Open
TonyXuQAQ opened this issue Sep 15, 2023 · 9 comments

Comments

@TonyXuQAQ

Hi, after going through the training code, it seems that the gradient is not properly backpropagated. All calls to the projector layer mm_projector appear to be made within torch.no_grad (see call_1, call_2). If so, it means the projector layer is not trained at all, right? Is this a typo in the released code or an error?
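
To make the concern concrete, here is a minimal, self-contained sketch; only the name mm_projector comes from the repo, the shapes and data are illustrative:

```python
import torch
import torch.nn as nn

# Minimal sketch of the concern: a projector called under torch.no_grad
# records no autograd graph, so its weights receive no gradient.
# Only the name mm_projector comes from the repo; shapes are illustrative.
mm_projector = nn.Linear(1024, 4096)
features = torch.randn(2, 1024)

with torch.no_grad():                    # as in the current valley.py
    projected = mm_projector(features)

print(projected.requires_grad)           # False: no graph was built
# Any loss computed from `projected` cannot backpropagate into
# mm_projector; loss.backward() would raise
# "element 0 of tensors does not require grad and does not have a grad_fn".
```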

@RupertLuo
Owner

Can you share the error output and training configuration file?

@TonyXuQAQ
Author

There is no error output; I just used the raw code of this repo. What I mean is that the projector layer mm_projector does not seem to be trained in valley/model/valley.py: every call to mm_projector is wrapped in torch.no_grad, so the gradient is blocked and the projector is never updated.

@RupertLuo
Owner

[screenshot of train.py] In the file train.py, you can set whether the projector needs to be updated.
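
For readers following the thread: LLaVA-style training scripts typically gate projector updates by toggling requires_grad on its parameters. The sketch below illustrates the idea; the flag name and the attribute path are assumptions, not confirmed names from this repo:

```python
# Hypothetical sketch only: the flag name tune_mm_projector and the
# attribute path model.get_model().mm_projector are assumptions about
# what train.py exposes, not confirmed names from this repo.
def set_projector_trainable(model, tune_mm_projector: bool) -> None:
    for param in model.get_model().mm_projector.parameters():
        param.requires_grad = tune_mm_projector
```

Note that toggling requires_grad alone is not sufficient: if the forward pass runs under torch.no_grad, no autograd graph is recorded and the flag has no effect, which is what the next comment points out.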

@TonyXuQAQ
Author

But the projector calls are wrapped inside torch.no_grad, so the gradient cannot pass through the projector, i.e., the projector is not trained. And this layer is not used anywhere else, so I wonder how this projector was trained.
Screenshot from 2023-09-21 10-50-57

@feymanpriv
Collaborator

But the projector calls are wrapped inside torch.no_grad, so the gradient cannot pass through the projector, i.e., the projector is not trained. And this layer is not used anywhere else, so I wonder how this projector was trained. Screenshot from 2023-09-21 10-50-57

@TonyXuQAQ I find that the projector is not wrapped inside torch.no_grad in the original code of this repo, as shown here:
[screenshot] in
https://github.com/RupertLuo/Valley/blob/8da73a9551cd9ce520c47f7c3f508fdfc387f4f8/valley/model/valley.py.
I guess the "bug" was introduced while reorganizing the code, and the projector should sit outside the torch.no_grad block, since the released models were trained with the projector being tuned.
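
A rough sketch of the intended placement, per the linked original commit; the names vision_tower and mm_projector follow the repo's conventions, but the surrounding code is heavily simplified:

```python
import torch
import torch.nn as nn

# Sketch of the intended placement: only the frozen vision encoder runs
# under torch.no_grad, and the projector is called outside it so
# gradients can reach its weights. vision_tower here is a trivial
# stand-in for the frozen CLIP encoder.
vision_tower = nn.Linear(3 * 224 * 224, 1024).requires_grad_(False)
mm_projector = nn.Linear(1024, 4096)

images = torch.randn(2, 3 * 224 * 224)

with torch.no_grad():
    image_features = vision_tower(images)       # frozen encoder, no graph

image_features = mm_projector(image_features)   # trainable: graph starts here

image_features.sum().backward()
print(mm_projector.weight.grad is not None)     # True: projector gets gradients
```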

@TonyXuQAQ
Author

Thanks for the information.

During finetuning, I also noticed that the current version of the code cannot load VideoChat-instruct-11K correctly: LLaVA-instruct-150K's labels are organized as {"human": ..., "gpt": ...}, while VideoChat-instruct-11K's labels are organized as {"q": ..., "a": ...}. The two datasets have different label formats, but the code does no format conversion. I guess the label pre-processing code is missing.

I also don't know why, but starting from your llama-2-pretrain weights, I finetuned Valley on the above two datasets and the results are very bad. I will refer to the early commits of this repo for debugging.
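
For reference, the two label layouts described above look roughly like this; the field names come from this thread, and all other keys and values are illustrative:

```python
# LLaVA-instruct-150K style: a list of conversation turns.
llava_sample = {
    "conversations": [
        {"from": "human", "value": "What is happening in the video?"},
        {"from": "gpt", "value": "A person is skateboarding in a park."},
    ]
}

# VideoChat-instruct-11K style: flat question/answer fields.
videochat_sample = {
    "q": "What is happening in the video?",
    "a": "A person is skateboarding in a park.",
}
```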

@TonyXuQAQ
Author

So may I know which commit was used to train the provided valley-2-7b? I just want to reproduce the performance of the provided checkpoints.

@RupertLuo
Owner

Thanks for the information.

During finetuning, I also noticed that the current version of the code cannot load VideoChat-instruct-11K correctly: LLaVA-instruct-150K's labels are organized as {"human": ..., "gpt": ...}, while VideoChat-instruct-11K's labels are organized as {"q": ..., "a": ...}. The two datasets have different label formats, but the code does no format conversion. I guess the label pre-processing code is missing.

I also don't know why, but starting from your llama-2-pretrain weights, I finetuned Valley on the above two datasets and the results are very bad. I will refer to the early commits of this repo for debugging.

LLaVA-instruct-150K should load as-is. For VideoChat-instruct-11K, you need to convert its format to the LLaVA-instruct-150K format.
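
A minimal sketch of such a conversion, assuming the layouts described earlier in the thread; the helper name is hypothetical and the exact conversation schema is an assumption:

```python
# Hedged sketch of the conversion; the helper name is hypothetical and
# the target schema assumes LLaVA-style conversation turns.
def videochat_to_llava(sample: dict) -> dict:
    """Map a flat {'q': ..., 'a': ...} record onto the LLaVA
    conversation layout; carry over any extra fields (id, video, ...)
    the same way if they are present."""
    return {
        "conversations": [
            {"from": "human", "value": sample["q"]},
            {"from": "gpt", "value": sample["a"]},
        ]
    }
```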

@RupertLuo
Owner

So may I know which commit was used to train the provided valley-2-7b? I just want to reproduce the performance of the provided checkpoints.

Thank you for your continued attention to this project. I will sync the repository to a version of the code that trains correctly as soon as possible.
