I tried test_llama.py, but... help... T^T #15
Comments
@MissQueen "RuntimeError: Triton requires CUDA 11.4+": this error means you need to update your CUDA version. We recommend CUDA 11.8 or higher. What is the name of your GPU?
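Before reinstalling anything, it can help to check which CUDA runtime your PyTorch build actually reports and compare it against Triton's minimum. A minimal sketch (the helper name `cuda_meets_requirement` is my own, not from lightllm or Triton):

```python
# Hedged sketch: compare a CUDA version string, such as the one
# reported by torch.version.cuda, against Triton's minimum of 11.4.
def cuda_meets_requirement(cuda_version, min_version=(11, 4)):
    """Return True if cuda_version (e.g. "11.8") is >= min_version."""
    if cuda_version is None:  # CPU-only PyTorch builds report None
        return False
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    return (major, minor) >= min_version

# Typical usage with PyTorch installed:
#   import torch
#   cuda_meets_requirement(torch.version.cuda)
print(cuda_meets_requirement("11.8"))  # True
print(cuda_meets_requirement("10.2"))  # False
```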
Do you mean this?
@MissQueen Hello, I suggest creating a clean Python environment with conda (python==3.9), then installing cuda==11.8 and PyTorch.
I encountered the same problem. I think it is caused by the Triton version; installing triton-nightly (2.1.0) worked for me:

pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly

or install it from source.
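To confirm which toolkit `nvcc --version` reports before deciding whether to upgrade, the release line of its banner can be parsed like this (a sketch; `parse_nvcc_version` is a hypothetical helper, and the sample string follows the usual nvcc banner format):

```python
import re

def parse_nvcc_version(banner):
    """Extract (major, minor) from the `nvcc --version` banner text."""
    m = re.search(r"release (\d+)\.(\d+)", banner)
    return (int(m.group(1)), int(m.group(2))) if m else None

sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_nvcc_version(sample))  # (11, 8)
```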
Process Process-8:
Process Process-7:
Traceback (most recent call last):
File "<string>", line 21, in _rms_norm_fwd_fused
KeyError: ('2-.-0-.-0-09caff3db89e80ddf0eb4f72675bc8f9-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, 'i32', 'i32', 'fp32'), (16384,), (True, True, True, (True, False), (True, False), (False,)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/stan/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/envs/stan/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/lcx/lightllm/test/model/model_infer.py", line 51, in tppart_model_infer
logics = model_part.forward(batch_size,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/model.py", line 103, in forward
predict_logics = self._context_forward(input_ids, infer_state)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/model.py", line 141, in _context_forward
input_embs = self.layers_infer[i].context_forward(input_embs, infer_state, self.trans_layers_weight[i])
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/transformer_layer_inference.py", line 103, in context_forward
self._context_flash_attention(input_embdings,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/utils/infer_utils.py", line 21, in time_func
ans = func(*args, **kwargs)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/transformer_layer_inference.py", line 49, in context_flash_attention
input1 = rmsnorm_forward(input_embding, weight=layer_weight.input_layernorm, eps=self.layer_norm_eps)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/triton_kernel/rmsnorm.py", line 59, in rmsnorm_forward
_rms_norm_fwd_fused[(M,)](x_arg, y, weight,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "<string>", line 41, in _rms_norm_fwd_fused
File "/opt/conda/envs/stan/lib/python3.10/site-packages/triton/compiler.py", line 1256, in compile
asm, shared, kernel_name = _compile(fn, signature, device, constants, configs[0], num_warps, num_stages,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/triton/compiler.py", line 901, in _compile
name, asm, shared_mem = _triton.code_gen.compile_ttir(backend, module, device, num_warps, num_stages, extern_libs, cc)
RuntimeError: Triton requires CUDA 11.4+
(The same traceback is repeated for the other worker processes.)