
MultiheadAttention out_projection #3

Closed
ghost opened this issue Nov 17, 2023 · 8 comments
@ghost commented Nov 17, 2023

Hello,

Thanks for this implementation - very useful.

I had a question regarding the `MultiheadAttention` class: it seems like `out_proj.weight` is not updated, or am I missing something?

Thanks!
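For reference, a minimal way to check whether a given weight actually changes is to snapshot it before an optimizer step and compare afterwards. The sketch below uses a plain `torch.nn.MultiheadAttention` rather than the LoRA-wrapped module from this repo:

```python
import torch
import torch.nn as nn

# Toy setup with a plain PyTorch MultiheadAttention (not the LoRA version).
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
opt = torch.optim.SGD(mha.parameters(), lr=0.1)

before = mha.out_proj.weight.detach().clone()

x = torch.randn(5, 2, 8)          # (seq_len, batch, embed_dim)
out, _ = mha(x, x, x)
out.sum().backward()
opt.step()

# False means out_proj.weight was updated; True means it stayed frozen.
print(torch.allclose(before, mha.out_proj.weight))
```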

@Baijiong-Lin (Owner)

Yep, I have also noticed this issue. It is because `out_proj` in `MultiheadAttention` is a `NonDynamicallyQuantizableLinear` rather than a simple `Linear` layer:

https://github.com/pytorch/pytorch/blob/dbb96ef30da4e50bdbecb56dfb9b2c43b8a39e9d/torch/nn/modules/activation.py#L1008
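For a quick sanity check of the above (a minimal sketch against a recent PyTorch; the class lives in `torch.nn.modules.linear`):

```python
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)

# out_proj is NonDynamicallyQuantizableLinear, a thin subclass of nn.Linear
# that mainly exists so quantizing an attention layer fails loudly.
print(type(mha.out_proj))                    # ...NonDynamicallyQuantizableLinear
print(isinstance(mha.out_proj, nn.Linear))   # True
```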

@ghost (Author) commented Nov 17, 2023

Yep, so `o_lora_A` and `o_lora_B` are currently not used in the package?
I think `NonDynamicallyQuantizableLinear` could be replaced with a plain linear layer, with a warning being thrown if quantization is used...
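One possible shape of that suggestion, purely as a sketch (the helper name and warning text below are made up, and, as the last comment in this thread notes, the repo's actual fix ended up being different):

```python
import warnings

import torch.nn as nn


def swap_out_proj_for_linear(mha: nn.MultiheadAttention) -> None:
    """Hypothetical helper: replace out_proj with a plain nn.Linear that
    shares the same parameters, and warn about the quantization trade-off."""
    old = mha.out_proj
    new = nn.Linear(old.in_features, old.out_features, bias=old.bias is not None)
    new.weight = old.weight          # reuse the existing nn.Parameter objects
    if old.bias is not None:
        new.bias = old.bias
    warnings.warn(
        "out_proj replaced by a plain nn.Linear; dynamic quantization of "
        "this MultiheadAttention is no longer guarded against."
    )
    mha.out_proj = new
```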

@Baijiong-Lin (Owner)

Sounds like a good idea. I will try to fix it (maybe after two weeks; I am busy with some deadlines currently).

@marcomistretta

Did you manage to fix it?

@marcomistretta

P.S. Thanks for this implementation!

@mounchiliu commented Nov 29, 2024

If I set `enable_lora: list = ['q', 'k', 'v', 'o']`, the problem mentioned in #7 still exists. This may be because the `with_nn` parameter needs to be passed along during the recursive calls, e.g.:

```python
if module_name == name:
    return set_param(mod, rest, param, mode=mode, with_nn=with_nn)
```
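As an illustration of why the flag has to be forwarded, here is a stripped-down, hypothetical version of such a recursive setter (not the actual loratorch code): without `with_nn=with_nn`, every level below the first silently falls back to the default value.

```python
import torch.nn as nn


def set_param(curr_mod, name, param=None, mode='get', with_nn=False):
    # Walk down a dotted module path ('0.0.weight'), then act on the leaf.
    module_name, _, rest = name.partition('.')
    if rest:
        for child_name, child in curr_mod.named_children():
            if child_name == module_name:
                # Forward with_nn explicitly; omitting it would reset the
                # flag to its default at every deeper level of the recursion.
                return set_param(child, rest, param, mode=mode, with_nn=with_nn)
        return None
    if mode == 'get':
        return getattr(curr_mod, name), with_nn


model = nn.Sequential(nn.Sequential(nn.Linear(2, 2)))
_, flag = set_param(model, '0.0.weight', mode='get', with_nn=True)
print(flag)   # True only because the flag is forwarded in the recursive call
```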

Baijiong-Lin added a commit that referenced this issue Nov 29, 2024
@Baijiong-Lin (Owner)

@mounchiliu Thanks for your suggestion. I have fixed this problem.

@Baijiong-Lin (Owner)

@marcomistretta @ghost Sorry for the late reply. The LoRA of `out_proj` was not updated because of a wrong initialization of the LoRA parameters, rather than because of the use of `NonDynamicallyQuantizableLinear`. I have fixed this problem.
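For readers wondering how a wrong initialization can freeze a LoRA branch completely, here is one general illustration (not necessarily the exact bug that was fixed here): if both low-rank factors start at zero, both of their gradients are identically zero, so neither factor ever moves. The usual convention is to initialize `A` randomly and only `B` to zero.

```python
import torch

x = torch.randn(4, 16)

# Wrong: both factors start at zero, so B @ A is zero, both gradients are
# zero, and the LoRA branch never trains.
A = torch.zeros(8, 16, requires_grad=True)    # should be randomly initialized
B = torch.zeros(16, 8, requires_grad=True)    # zero init for B is the usual choice

out = x @ (B @ A).T
out.sum().backward()
print(A.grad.abs().max().item(), B.grad.abs().max().item())  # 0.0 0.0
```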
