
[BUG]baichuan-13b-error #156

Open
kuangdao opened this issue Sep 28, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@kuangdao

1. Launch the server:

python -m lightllm.server.api_server --model_dir baichuan-13b --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 4096 --trust_remote_code

This succeeds, and the log shows:

INFO: Started server process [560]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080

2. Send a generate request:

curl http://127.0.0.1:8080/generate -X POST -d '{"inputs":"What is AI?","parameters":{"max_new_tokens":17, "frequency_penalty":1}}' -H 'Content-Type: application/json'
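For reference, the same request can be issued from Python instead of curl — a minimal sketch, assuming the server from step 1 is listening on 127.0.0.1:8080 (the payload fields mirror the curl call above; `generate` is just a hypothetical helper name):

```python
import json
import urllib.request

# Same payload as the curl command: the prompt plus sampling parameters.
payload = {
    "inputs": "What is AI?",
    "parameters": {"max_new_tokens": 17, "frequency_penalty": 1},
}

def generate(url="http://127.0.0.1:8080/generate"):
    """POST the payload to the /generate endpoint and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```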

3. The server crashes with the assertion below and hangs; the client hangs as well:

python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.

kuangdao added the bug label on Sep 28, 2023
@chatllm

chatllm commented Sep 30, 2023

The same problem occurs when using the chatglm2-6b model.

@hiworldwzj
Collaborator

@kuangdao @chatllm

The code has been tested on a range of GPUs including A100, A800, 4090, and H800. If you are running the code on A100, A800, etc., we recommend using triton==2.0.0.dev20221202 or triton==2.1.0. If you are running the code on H800, etc., it is necessary to compile and install the source code of [triton==2.1.0](https://github.com/openai/triton/tree/main) from the GitHub repository. If the code doesn't work on other GPUs, try modifying the triton kernel used in model inference.

Install Triton Package

To use triton==2.0.0.dev20221202:

pip install triton==2.0.0.dev20221202

To use triton==2.1.0 (better performance, but the code is under continuous development and may be unstable):

pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
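After installing, it can help to confirm that the importable Triton build matches one of the versions recommended above — a minimal sketch (the version strings are the ones named in this thread; the assumption that nightly 2.1.0 wheels report a `2.1.0`-prefixed version is mine, not from the source):

```python
# Versions recommended in this thread for A100/A800-class GPUs.
RECOMMENDED = {"2.0.0.dev20221202", "2.1.0"}

def is_recommended(version: str) -> bool:
    """Return True if `version` is one of the recommended Triton builds.
    Nightly builds of 2.1.0 are assumed to report a 2.1.0-prefixed
    version string, so a prefix match covers them."""
    return version in RECOMMENDED or version.startswith("2.1.0")

# Usage: compare against the live install when triton is importable.
try:
    import triton
    print(triton.__version__, is_recommended(triton.__version__))
except ImportError:
    pass
```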
