You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
{'torch_dtype': torch.float16, 'revision': 'main'}
YuanForCausalLM(
  (model): YuanModel(
    (embed_tokens): Embedding(135040, 2048, padding_idx=77185)
    (layers): ModuleList(
      (0-23): 24 x YuanDecoderLayer(
        (self_attn): YuanAttention(
          (v_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
          (lf_gate): LocalizedFiltering(
            (conv1): Conv2d(2048, 1024, kernel_size=(2, 1), stride=(1, 1), padding=(1, 0))
            (conv2): Conv2d(1024, 2048, kernel_size=(2, 1), stride=(1, 1), padding=(1, 0))
            (output_layernorm): LlamaRMSNorm()
          )
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): YuanMLP(
          (up_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (gate_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (down_proj): Linear(in_features=8192, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=2048, out_features=135040, bias=False)
)
user: yuan2.0是谁开发的?
assistant: Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/github/FastChat/fastchat/serve/cli.py", line 304, in<module>
main(args)
File "/github/FastChat/fastchat/serve/cli.py", line 227, in main
chat_loop(
File "/github/FastChat/fastchat/serve/inference.py", line 532, in chat_loop
outputs = chatio.stream_output(output_stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/github/FastChat/fastchat/serve/cli.py", line 63, in stream_output
for outputs in output_stream:
File "/opt/conda/envs/fc/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
response = gen.send(request)
^^^^^^^^^^^^^^^^^
File "/github/FastChat/fastchat/serve/inference.py", line 160, in generate_stream
out = model(
^^^^^^
File "/opt/conda/envs/fc/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 938, in forward
outputs = self.model(
^^^^^^^^^^^
File "/opt/conda/envs/fc/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 768, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/opt/conda/envs/fc/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 426, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/opt/conda/envs/fc/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 358, in forward
raise ValueError(
ValueError: Attention mask should be of size (1, 1, 1, 10), but is torch.Size([1, 1, 1, 1])
Is this related to how the relevant modules in the yuan_hf_model.py script handle things?
The inference script I'm using above is fairly common, so if possible, could this issue be fixed?
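For context on the shapes in the ValueError (a sketch only, not code from yuan_hf_model.py): Llama-style decoder attention masks are expanded to 4D with shape (batch, 1, query_len, key_len). During streaming generation with a KV cache only the newest token is fed as the query, but the mask still has to cover every cached position, which matches the (1, 1, 1, 10) vs (1, 1, 1, 1) gap the traceback reports.

# Shape sketch mirroring the numbers in the traceback above.
import torch

batch, query_len, key_len = 1, 1, 10                    # 1 new token attending over 10 cached positions
expected = torch.zeros(batch, 1, query_len, key_len)    # (1, 1, 1, 10): what the shape check expects
received = torch.zeros(batch, 1, query_len, 1)          # (1, 1, 1, 1): what the model actually got
print(expected.shape, received.shape)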
if self.training or self.reset_position_ids and attention_mask is not None:
    attention_mask, _ = self._prepare_decoder_attention_mask_training(input_ids1, inputs_embeds, self.eod_token, reset_mask_flag, self.reset_attention_mask, self.reset_position_ids)
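The snippet above appears to be the mask-preparation branch in yuan_hf_model.py (the method name _prepare_decoder_attention_mask_training suggests as much; this is an assumption). For reference, Python groups the condition with "and" binding tighter than "or", as the sketch below shows with stand-in booleans; this is only an observation about how the expression parses, not a claim that it is the root cause of the error.

# Sketch only: how Python parses the condition quoted above.
training = False              # model.eval() during inference
reset_position_ids = True     # stand-in value; the real one comes from the Yuan config
mask_is_set = True            # stand-in for `attention_mask is not None`
taken = training or (reset_position_ids and mask_is_set)
assert taken == (training or reset_position_ids and mask_is_set)  # same grouping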
When I run streaming inference with the code described above, following the generate_stream part of fastchat/serve/inference.py:
with use_flash_attention=True, inference works fine;
with use_flash_attention=False, it fails, and the error message is the one shown above.
Is this related to how the relevant modules in the yuan_hf_model.py script handle things?
The inference script I'm using above is fairly common, so if possible, could this issue be fixed?
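Below is a minimal sketch of the kind of incremental decoding FastChat's generate_stream performs, in case it helps with triage. The checkpoint name, the use_flash_attention config field, and the prompt are assumptions taken from this report, and the snippet is a sketch rather than the exact FastChat code path:

# Hypothetical reproduction sketch: prefill + one cached decode step.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_path = "IEITYuan/Yuan2-2B-hf"  # assumption: any Yuan2.0 HF checkpoint
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
config.use_flash_attention = False   # flag name taken from the report; assumed to live on the config
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, config=config, torch_dtype=torch.float16, trust_remote_code=True
).eval()

inputs = tokenizer("yuan2.0是谁开发的?", return_tensors="pt")
with torch.no_grad():
    # Prefill: run the whole prompt once and keep the KV cache.
    out = model(**inputs, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    # Decode step: feed only the new token. The attention mask must still
    # cover all cached positions (prompt length + 1), which is where a
    # (1, 1, 1, key_len) vs (1, 1, 1, 1) mismatch can surface.
    mask = torch.ones(1, inputs["input_ids"].shape[1] + 1, dtype=torch.long)
    out = model(input_ids=next_id, attention_mask=mask,
                past_key_values=past, use_cache=True)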