
MultiheadAttention out_projection #3

Closed
ghost opened this issue Nov 17, 2023 · 8 comments
@ghost commented Nov 17, 2023

Hello,

Thanks for this implementation - very useful.

I had a question regarding the `MultiheadAttention` class: it seems like `out_proj.weight` is not updated, or am I missing something?

Thanks!
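For reference, a minimal way to check whether a given weight actually changes is to snapshot it before an optimizer step and compare afterwards. The sketch below uses a plain `torch.nn.MultiheadAttention` rather than the LoRA-wrapped module from this repo:

```python
import torch
import torch.nn as nn

# Toy setup with a plain PyTorch MultiheadAttention (not the LoRA version).
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
opt = torch.optim.SGD(mha.parameters(), lr=0.1)

before = mha.out_proj.weight.detach().clone()

x = torch.randn(5, 2, 8)          # (seq_len, batch, embed_dim)
out, _ = mha(x, x, x)
out.sum().backward()
opt.step()

# False means out_proj.weight was updated; True means it stayed frozen.
print(torch.allclose(before, mha.out_proj.weight))
```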

@Baijiong-Lin (Owner)

Yep, I have also noticed this issue. It is because `out_proj` in `MultiheadAttention` is a `NonDynamicallyQuantizableLinear` rather than a simple `Linear` layer:

https://github.com/pytorch/pytorch/blob/dbb96ef30da4e50bdbecb56dfb9b2c43b8a39e9d/torch/nn/modules/activation.py#L1008
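For a quick sanity check of the above (a minimal sketch against a recent PyTorch; the class lives in `torch.nn.modules.linear`):

```python
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)

# out_proj is NonDynamicallyQuantizableLinear, a thin subclass of nn.Linear
# that mainly exists so quantizing an attention layer fails loudly.
print(type(mha.out_proj))                    # ...NonDynamicallyQuantizableLinear
print(isinstance(mha.out_proj, nn.Linear))   # True
```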

@ghost (Author) commented Nov 17, 2023

Yep, so `o_lora_A` and `o_lora_B` are currently not used in the package?
I think `NonDynamicallyQuantizableLinear` could be replaced with a plain linear layer, with a warning being thrown if quantization is used...
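One possible shape of that suggestion, purely as a sketch (the helper name and warning text below are made up, and, as the last comment in this thread notes, the repo's actual fix ended up being different):

```python
import warnings

import torch.nn as nn


def swap_out_proj_for_linear(mha: nn.MultiheadAttention) -> None:
    """Hypothetical helper: replace out_proj with a plain nn.Linear that
    shares the same parameters, and warn about the quantization trade-off."""
    old = mha.out_proj
    new = nn.Linear(old.in_features, old.out_features, bias=old.bias is not None)
    new.weight = old.weight          # reuse the existing nn.Parameter objects
    if old.bias is not None:
        new.bias = old.bias
    warnings.warn(
        "out_proj replaced by a plain nn.Linear; dynamic quantization of "
        "this MultiheadAttention is no longer guarded against."
    )
    mha.out_proj = new
```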

@Baijiong-Lin (Owner)

Sounds like a good idea. I will try to fix it (maybe after two weeks; I am busy with some deadlines currently).

@marcomistretta

Did you manage to fix it?

@marcomistretta

P.S. Thanks for this implementation!

@mounchiliu commented Nov 29, 2024

If I set `enable_lora: list = ['q', 'k', 'v', 'o']`, the problem mentioned in #7 still exists. This may be because the `with_nn` parameter needs to be passed along during the recursive calls, e.g.:

```python
if module_name == name:
    return set_param(mod, rest, param, mode=mode, with_nn=with_nn)
```
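As an illustration of why the flag has to be forwarded, here is a stripped-down, hypothetical version of such a recursive setter (not the actual loratorch code): without `with_nn=with_nn`, every level below the first silently falls back to the default value.

```python
import torch.nn as nn


def set_param(curr_mod, name, param=None, mode='get', with_nn=False):
    # Walk down a dotted module path ('0.0.weight'), then act on the leaf.
    module_name, _, rest = name.partition('.')
    if rest:
        for child_name, child in curr_mod.named_children():
            if child_name == module_name:
                # Forward with_nn explicitly; omitting it would reset the
                # flag to its default at every deeper level of the recursion.
                return set_param(child, rest, param, mode=mode, with_nn=with_nn)
        return None
    if mode == 'get':
        return getattr(curr_mod, name), with_nn


model = nn.Sequential(nn.Sequential(nn.Linear(2, 2)))
_, flag = set_param(model, '0.0.weight', mode='get', with_nn=True)
print(flag)   # True only because the flag is forwarded in the recursive call
```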

Baijiong-Lin added a commit that referenced this issue Nov 29, 2024
@Baijiong-Lin (Owner)

@mounchiliu Thanks for your suggestion. I have fixed this problem.

@Baijiong-Lin (Owner)

@marcomistretta @ghost Sorry for the late reply. The LoRA of `out_proj` was not updated because of a wrong initialization of the LoRA parameters, rather than because of the use of `NonDynamicallyQuantizableLinear`. I have fixed this problem.
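For readers wondering how a wrong initialization can freeze a LoRA branch completely, here is one general illustration (not necessarily the exact bug that was fixed here): if both low-rank factors start at zero, both of their gradients are identically zero, so neither factor ever moves. The usual convention is to initialize `A` randomly and only `B` to zero.

```python
import torch

x = torch.randn(4, 16)

# Wrong: both factors start at zero, so B @ A is zero, both gradients are
# zero, and the LoRA branch never trains.
A = torch.zeros(8, 16, requires_grad=True)    # should be randomly initialized
B = torch.zeros(16, 8, requires_grad=True)    # zero init for B is the usual choice

out = x @ (B @ A).T
out.sum().backward()
print(A.grad.abs().max().item(), B.grad.abs().max().item())  # 0.0 0.0
```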
