Support mixtral moe AWQ quantization. #2725
Conversation
@zhulinJulia24 please add mistralai/Mixtral-8x7B-Instruct-v0.1 awq quantization into test cases
lmdeploy/lite/quantization/awq.py (Outdated)
@@ -244,6 +254,9 @@ def quant_weights(model,
             if skip_if_contains and skip_if_contains in child_name:
                 q_linear = fc
                 pack_or_skip = 'skipped'
+            elif 'block_sparse_moe.gate' in name:  # moe
This is an additional skip branch, even though we already have the skip_if_contains mechanism. The concern is that, as more Mixture-of-Experts (MoE) models are integrated, these ad-hoc skipping branches may become increasingly difficult to maintain.
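For illustration only, a minimal sketch of how the gate layers could be folded into a pattern-based skip check instead of a model-specific elif. The helper name and the list-of-patterns form are assumptions, not the actual quant_weights implementation:

```python
# Hypothetical sketch: treat 'block_sparse_moe.gate' as just another skip
# pattern handled by the same mechanism as skip_if_contains, rather than a
# hard-coded elif branch per MoE architecture.
MOE_SKIP_PATTERNS = ['block_sparse_moe.gate']


def should_skip(child_name: str, skip_if_contains: str = None) -> bool:
    """Return True if this linear layer should be left unquantized."""
    patterns = list(MOE_SKIP_PATTERNS)
    if skip_if_contains:
        patterns.append(skip_if_contains)
    return any(p in child_name for p in patterns)
```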
An error occurs while quantizing the Plap-8x13B model. What might the cause be? Is it a bug in this PR, or an issue with the AWQ quantization algorithm itself? Any pointers would be appreciated.
Testing the mistralai/Mixtral-8x7B-Instruct-v0.1 model works fine. Note that the hidden_size of 8x13B is 5120, while that of 8x7B is 4096.
I cannot access that model. You can set a breakpoint at line 108 of observer.py; most likely a tensor has the wrong shape before or after one of the layers during the model's forward pass.
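As a rough illustration of where the `Layer:..., group:inputs, shape:...` lines in the transcript below come from, this is the kind of forward pre-hook instrumentation a calibration observer can use to record per-layer input shapes. It is a hypothetical sketch, not lmdeploy's actual observer.py:

```python
import torch
from torch import nn


def log_linear_input_shapes(model: nn.Module):
    """Attach hooks that print each nn.Linear's input and weight shape,
    in the same spirit as the 'Layer:..., group:inputs, ...' log lines."""
    handles = []
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear):
            def hook(module, inputs, name=name):
                x = inputs[0]
                print(f'Layer:{name}, group:inputs, '
                      f'shape:{x.shape}, weight:{module.weight.shape}')
            handles.append(mod.register_forward_pre_hook(hook))
    return handles  # call h.remove() on each handle to detach the hooks
```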
Layer:model.layers.0.self_attn.q_proj, group:inputs, shape:torch.Size([1, 2048, 5120]), weight:torch.Size([5120, 5120])
Layer:model.layers.0.self_attn.k_proj, group:inputs, shape:torch.Size([1, 2048, 5120]), weight:torch.Size([5120, 5120])
Layer:model.layers.0.self_attn.v_proj, group:inputs, shape:torch.Size([1, 2048, 5120]), weight:torch.Size([5120, 5120])
Layer:model.layers.0.self_attn.o_proj, group:inputs, shape:torch.Size([1, 2048, 5120]), weight:torch.Size([5120, 5120])
Layer:model.layers.0.block_sparse_moe.gate, group:inputs, shape:torch.Size([2048, 5120]), weight:torch.Size([8, 5120])
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(605)forward()
-> current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
(Pdb) ll
604 def forward(self, hidden_states):
605 B-> current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
606 current_hidden_states = self.w2(current_hidden_states)
607 return current_hidden_states
(Pdb) p hidden_states.shape
torch.Size([12, 5120])
(Pdb) n
Layer:model.layers.0.block_sparse_moe.experts.0.w1, group:inputs, shape:torch.Size([12, 5120]), weight:torch.Size([13824, 5120])
Layer:model.layers.0.block_sparse_moe.experts.0.w3, group:inputs, shape:torch.Size([12, 5120]), weight:torch.Size([13824, 5120])
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(606)forward()
-> current_hidden_states = self.w2(current_hidden_states)
(Pdb) p current_hidden_states.shape
torch.Size([12, 13824])
(Pdb) p self.w2.weight.shape
torch.Size([5120, 13824])
(Pdb) n
Layer:model.layers.0.block_sparse_moe.experts.0.w2, group:inputs, shape:torch.Size([12, 13824]), weight:torch.Size([5120, 13824])
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(607)forward()
-> return current_hidden_states
(Pdb) n
--Return--
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(607)forward()->tensor([[ 0.0...torch.float16)
-> return current_hidden_states
(Pdb) n
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(673)forward()
-> final_hidden_states.index_add_(0, top_x, current_hidden_states.to(hidden_states.dtype))
(Pdb) c
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(605)forward()
-> current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
(Pdb) p hidden_states.shape
torch.Size([1, 5120])
(Pdb) n
Layer:model.layers.0.block_sparse_moe.experts.1.w1, group:inputs, shape:torch.Size([1, 5120]), weight:torch.Size([13824, 5120])
Layer:model.layers.0.block_sparse_moe.experts.1.w3, group:inputs, shape:torch.Size([1, 5120]), weight:torch.Size([13824, 5120])
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(606)forward()
-> current_hidden_states = self.w2(current_hidden_states)
(Pdb) p current_hidden_states.shape
torch.Size([1, 13824])
(Pdb) n
Layer:model.layers.0.block_sparse_moe.experts.1.w2, group:inputs, shape:torch.Size([1, 13824]), weight:torch.Size([5120, 13824])
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(607)forward()
-> return current_hidden_states
(Pdb) c
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(605)forward()
-> current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
(Pdb) n
Layer:model.layers.0.block_sparse_moe.experts.2.w1, group:inputs, shape:torch.Size([0, 5120]), weight:torch.Size([13824, 5120])
IndexError: max(): Expected reduction dim 0 to have non-zero size.
> /opt/py3/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py(605)forward()
-> current_hidden_states = self.act_fn(self.w1(hidden_states)) * self.w3(hidden_states)
(Pdb) p hidden_states.shape
torch.Size([0, 5120])
This is probably caused by the model itself: when the expert_mask of one MoE layer is all zeros for a given expert, that expert receives an empty input, which triggers this error.
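In other words, when the router assigns no tokens to an expert, its w1/w3 layers see a [0, hidden_size] input, and reducing with max() over the empty token dimension raises the IndexError shown above. A minimal standalone reproduction plus a possible guard (hypothetical, not the actual observer code):

```python
import torch

hidden_size = 5120
empty_input = torch.empty(0, hidden_size)  # expert selected for zero tokens

# Reproduces the same class of failure seen in the transcript:
# IndexError: max(): Expected reduction dim 0 to have non-zero size.
try:
    empty_input.max(dim=0)
except IndexError as e:
    print(e)


def observe(x: torch.Tensor):
    """Collect per-channel absolute-max stats, skipping empty activations."""
    if x.numel() == 0:
        return None               # nothing to observe for this expert
    return x.abs().amax(dim=0)    # per-channel absolute maximum
```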
Error when AWQ quantizing the mistralai/Mixtral-8x7B-Instruct-v0.1 model @AllentDan
@anaivebird Remove
Thanks, it works. But remove
It was a bug.
@AllentDan