I tried test_llama.py, but... help... T^T #15
Comments
@MissQueen "RuntimeError: Triton requires CUDA 11.4+": this error means you need to update your CUDA version. We recommend CUDA 11.8 or higher. What is the name of your GPU?
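Before reinstalling anything, it can help to check which CUDA runtime your PyTorch build actually reports and compare it against Triton's minimum. A minimal sketch (the helper name `cuda_meets_requirement` is my own, not from lightllm or Triton):

```python
# Hedged sketch: compare a CUDA version string, such as the one
# reported by torch.version.cuda, against Triton's minimum of 11.4.
def cuda_meets_requirement(cuda_version, min_version=(11, 4)):
    """Return True if cuda_version (e.g. "11.8") is >= min_version."""
    if cuda_version is None:  # CPU-only PyTorch builds report None
        return False
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    return (major, minor) >= min_version

# Typical usage with PyTorch installed:
#   import torch
#   cuda_meets_requirement(torch.version.cuda)
print(cuda_meets_requirement("11.8"))  # True
print(cuda_meets_requirement("10.2"))  # False
```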
Do you mean this?
@MissQueen Hello, I suggest creating a clean Python environment with conda (python==3.9), then installing cuda==11.8 and PyTorch.
I encountered the same problem. I think it is caused by the Triton version; installing triton-nightly (2.1.0) worked for me:

pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly

or install it from source.
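To confirm which toolkit `nvcc --version` reports before deciding whether to upgrade, the release line of its banner can be parsed like this (a sketch; `parse_nvcc_version` is a hypothetical helper, and the sample string follows the usual nvcc banner format):

```python
import re

def parse_nvcc_version(banner):
    """Extract (major, minor) from the `nvcc --version` banner text."""
    m = re.search(r"release (\d+)\.(\d+)", banner)
    return (int(m.group(1)), int(m.group(2))) if m else None

sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_nvcc_version(sample))  # (11, 8)
```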
Process Process-8:
Process Process-7:
Traceback (most recent call last):
File "<string>", line 21, in _rms_norm_fwd_fused
KeyError: ('2-.-0-.-0-09caff3db89e80ddf0eb4f72675bc8f9-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, 'i32', 'i32', 'fp32'), (16384,), (True, True, True, (True, False), (True, False), (False,)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/stan/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/envs/stan/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/lcx/lightllm/test/model/model_infer.py", line 51, in tppart_model_infer
logics = model_part.forward(batch_size,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/model.py", line 103, in forward
predict_logics = self._context_forward(input_ids, infer_state)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/model.py", line 141, in _context_forward
input_embs = self.layers_infer[i].context_forward(input_embs, infer_state, self.trans_layers_weight[i])
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/transformer_layer_inference.py", line 103, in context_forward
self._context_flash_attention(input_embdings,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/utils/infer_utils.py", line 21, in time_func
ans = func(*args, **kwargs)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/transformer_layer_inference.py", line 49, in context_flash_attention
input1 = rmsnorm_forward(input_embding, weight=layer_weight.input_layernorm, eps=self.layer_norm_eps)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/triton_kernel/rmsnorm.py", line 59, in rmsnorm_forward
_rms_norm_fwd_fused[(M,)](x_arg, y, weight,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "<string>", line 41, in _rms_norm_fwd_fused
File "/opt/conda/envs/stan/lib/python3.10/site-packages/triton/compiler.py", line 1256, in compile
asm, shared, kernel_name = _compile(fn, signature, device, constants, configs[0], num_warps, num_stages,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/triton/compiler.py", line 901, in _compile
name, asm, shared_mem = _triton.code_gen.compile_ttir(backend, module, device, num_warps, num_stages, extern_libs, cc)
RuntimeError: Triton requires CUDA 11.4+
(The same traceback is repeated for the other worker processes.)