bash scripts/forward.sh #7

Open · Veluriyam opened this issue Sep 23, 2024 · 4 comments

@Veluriyam

Running `bash scripts/forward.sh` with Llama-2-7b-chat-hf:
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.10s/it]
[2024-09-23 17:22:48,125] [forward.py:111] Model name: Llama-2-7b-chat-hf
[2024-09-23 17:22:48,132] [forward.py:112] Model size: 13.543948288
[2024-09-23 17:22:48,133] [utils.py:94] GPU 0: 6.88 GB / 32.00 GB
[2024-09-23 17:22:48,133] [utils.py:94] GPU 1: 6.88 GB / 32.00 GB
[2024-09-23 17:22:48,273] [forward.py:173] Running
0%| | 0/100 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1,0,0], thread: [96,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1,0,0], thread: [97,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [31,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
0%| | 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
File "forward.py", line 212, in
main()
File "forward.py", line 177, in main
hidden_states = forward(model, toker, messages)
File "forward.py", line 52, in forward
outputs = model(
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
outputs = self.model(
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1070, in forward
layer_outputs = decoder_layer(
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 798, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 706, in forward
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
File "/opt/conda/envs/onprompt/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 232, in apply_rotary_pos_emb
cos = cos[position_ids].unsqueeze(unsqueeze_dim)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

What is the cause of this error? Thanks.

@chujiezheng (Owner)

Please try downgrading the transformers version to ~4.37.
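A quick way to confirm the downgrade took effect before rerunning the script; the exact 4.37.x patch release and the pip pin in the comment are illustrative, not a prescription from this thread:

```python
# Hypothetical sanity check: confirm the downgraded transformers version,
# e.g. after `pip install "transformers~=4.37.0"`, before rerunning forward.py.
import transformers

major, minor = transformers.__version__.split(".")[:2]
assert (major, minor) == ("4", "37"), (
    f"expected transformers 4.37.x, found {transformers.__version__}"
)
print(f"transformers {transformers.__version__} looks fine")
```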

@Veluriyam (Author)

Following the earlier issues, I reconfigured the environment (python==3.10, torch==2.1.1, transformers==4.37, cuda==11.6) and got a similar error to yesterday's, but in a different place:
Traceback (most recent call last):
File "/root/yp/LLM-Safeguard-main/code/forward.py", line 212, in
main()
File "/root/yp/LLM-Safeguard-main/code/forward.py", line 177, in main
hidden_states = forward(model, toker, messages)
File "/root/yp/LLM-Safeguard-main/code/forward.py", line 77, in forward
hidden_states = [e[0].detach().half().cpu() for e in outputs.hidden_states[1:]]
File "/root/yp/LLM-Safeguard-main/code/forward.py", line 77, in
hidden_states = [e[0].detach().half().cpu() for e in outputs.hidden_states[1:]]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
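As the trace itself suggests (CUDA_LAUNCH_BLOCKING=1), making kernel launches synchronous produces a stack trace that points at the actual failing op. A minimal way to do that from Python, shown here only as an illustration (the variable can equally be exported in the shell before running scripts/forward.sh):

```python
# Illustration only: CUDA_LAUNCH_BLOCKING must be set before the first CUDA call,
# so it belongs at the very top of forward.py (or exported in the shell instead).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the env var so launches run synchronously
```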

@Veluriyam (Author)

In fact, I checked the shape of every hidden_state:

```python
outputs = model(
    input_ids,
    attention_mask=input_ids.new_ones(input_ids.size(), dtype=model.dtype),
    return_dict=True,
    output_hidden_states=True,
)
# Check the shape of every hidden_state
for i, hidden_state in enumerate(outputs.hidden_states):
    print(f"Layer {i} shape: {hidden_state.shape}")
    print(f"Layer {i}: {hidden_state}")
```

When execution reaches this point, the first input prints Layer 0 through Layer 32 all with shape torch.Size([1, 22, 4096]), but from Layer 19 through Layer 32 the tensors become all zeros (see the screenshots below). For the next input, Layer 0 comes back with shape [1, 21, 4096] and none of the tensors are zero, and then the error appears. Is the shape change what causes the error?

[screenshots of the printed hidden states]
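Hidden states collapsing to zeros past a fixed layer can line up with the point where device_map="auto" splits the model across the two GPUs. One way to inspect that split is the sketch below; it is not code from the repo and relies on the hf_device_map attribute that transformers sets when a device_map is used:

```python
# Debugging sketch (not from the repo): `model` is the checkpoint already loaded
# in forward.py with device_map="auto"; transformers records which device each
# submodule was placed on.
from collections import Counter

print(model.hf_device_map)                    # e.g. {'model.layers.19': 1, ...}
print(Counter(model.hf_device_map.values()))  # how many modules landed on each GPU
```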

@Veluriyam (Author)

Thank you. Changing the device_map in the model-loading call from "auto" to "sequential" solved the problem.
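For reference, the fix described above amounts to something like the following when loading the checkpoint; the actual call and model path in forward.py may differ, so treat this as a sketch:

```python
# Sketch of the fix: load with device_map="sequential" instead of "auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # illustrative; use the local path from the script
toker = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="sequential",  # fill GPU 0 first instead of balancing layers across GPUs
)
```

With "sequential" placement the ~13.5 GB of fp16 weights should fit on the first 32 GB GPU, avoiding the cross-GPU layer split that "auto" produced here.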
