Code bugs #18

Open
itongggg opened this issue Nov 10, 2024 · 4 comments

Comments

@itongggg

In your relora.py I found that for every ReLoRA layer, the B matrix is initialized as a zero matrix, which is the same as the standard LoRA setting. However, I also found the following:

[Screenshot 2024-11-10 09:00:28]

When you wrap a model as a ReLoRA model, the matrix A is also initialized as a zero matrix. Is this a typo?
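For context, here is a minimal sketch of the two initialization schemes under discussion (not the authors' actual code; the `keep_original_weights` flag and the `lora_A` / `lora_B` names come from this thread, everything else is assumed):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchReLoRaLinear(nn.Module):
    """Illustrative sketch only: standard LoRA init vs. zeroing both factors."""

    def __init__(self, in_features, out_features, r, keep_original_weights=False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)

        # Standard LoRA setting: B = 0 so that B @ A = 0 at the start,
        # while A keeps a non-zero (Kaiming) init so B can receive gradients.
        nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_B.weight)

        if keep_original_weights:
            # The wrapping code in the screenshot appears to also zero A,
            # which is what this issue questions.
            nn.init.zeros_(self.lora_A.weight)

    def forward(self, x):
        return F.linear(x, self.weight) + self.lora_B(self.lora_A(x))
```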

@ShuDun23

It seems they want the wrapped model to be exactly the same as the original one when keep_original_weights is set; otherwise lora_A.weight is initialized with Kaiming in ReLoRaLinear. But even so, B times A is still zero, so it seems to me to be not a typo but a redundancy?

@itongggg
Author

> It seems they want the wrapped model to be exactly the same as the original one when keep_original_weights is set; otherwise lora_A.weight is initialized with Kaiming in ReLoRaLinear. But even so, B times A is still zero, so it seems to me to be not a typo but a redundancy?

But if A and B are both initialized with zero weights, isn't the training process stuck? Since the gradient of $A$ equals $B^T \frac{\partial L}{\partial W}$ and the gradient of $B$ equals $\frac{\partial L}{\partial W} A^T$, in this case the gradients for A and B would be zero all the time.
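A quick autograd check illustrates the concern; this is only a sketch, not code from the repository:

```python
import torch

torch.manual_seed(0)
in_f, out_f, r = 8, 8, 2

W = torch.randn(out_f, in_f)                   # frozen original weight
A = torch.zeros(r, in_f, requires_grad=True)   # both LoRA factors zero-initialized
B = torch.zeros(out_f, r, requires_grad=True)

x = torch.randn(4, in_f)
loss = (x @ (W + B @ A).T).pow(2).sum()
loss.backward()

# dL/dA = B^T G and dL/dB = G A^T, so with A = B = 0 both gradients vanish.
print(A.grad.abs().max().item(), B.grad.abs().max().item())  # 0.0 0.0
```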

@ShuDun23

ShuDun23 commented Nov 15, 2024

Oh, even though both A and B are zero-initialized, as you mentioned, the updates will be slow at first due to the small gradients. However, the gradients are not zero because of the presence of the original W, so they can still be updated gradually. I think the authors might have intended this?

@itongggg
Author

> Oh, even though both A and B are zero-initialized, as you mentioned, the updates will be slow at first due to the small gradients. However, the gradients are not zero because of the presence of the original W, so they can still be updated gradually. I think the authors might have intended this?

As I mentioned before, the gradient of $A$ is $B^T G$ and the gradient of $B$ is $G A^T$, where $G$ is the gradient of $W$. So if you initialize both A and B to zero, the parameters of A and B would never be updated.
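For comparison, under the standard LoRA initialization (A Kaiming, B zero), $\frac{\partial L}{\partial B} = G A^T$ is generally non-zero, so B is updated from the first step and A starts moving once B becomes non-zero. A small variation of the check above (again just a sketch) shows this:

```python
import torch

torch.manual_seed(0)
in_f, out_f, r = 8, 8, 2

W = torch.randn(out_f, in_f)
A = torch.empty(r, in_f, requires_grad=True)
torch.nn.init.kaiming_uniform_(A, a=5 ** 0.5)   # standard LoRA: A is non-zero
B = torch.zeros(out_f, r, requires_grad=True)   # B = 0, so B @ A is still 0

x = torch.randn(4, in_f)
loss = (x @ (W + B @ A).T).pow(2).sum()
loss.backward()

# A.grad is zero (because B = 0), but B.grad is non-zero, so training can start.
print(A.grad.abs().max().item(), B.grad.abs().max().item())
```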
