You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
INFO: Started server process [560]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (
2, curl http://127.0.0.1:8080/generate -X POST -d '{"inputs":"What is AI?","parameters":{"max_new_tokens":17, "frequency_penalty":1}}' -H 'Content-Type: application/json'
3, the server error and pending, the client also pending .
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
The text was updated successfully, but these errors were encountered:
The code has been tested on a range of GPUs including A100, A800, 4090, and H800. If you are running the code on A100, A800, etc., we recommend using triton==2.0.0.dev20221202 or triton==2.1.0. If you are running the code on H800, etc., it is necessary to compile and install the source code of [triton==2.1.0](https://github.com/openai/triton/tree/main) from the GitHub repository. If the code doesn't work on other GPUs, try modifying the triton kernel used in model inference.
Install Triton Package
use triton==2.0.0.dev20221202
pip install triton==2.0.0.dev20221202
use triton==2.1.0 (Better performance, but the code is under continuous development and may be unstable.)
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
1, python -m lightllm.server.api_server --model_dir baichuan-13b --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 4096 --trust_remote_code
success and can see log :
INFO: Started server process [560]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (
2, curl http://127.0.0.1:8080/generate -X POST -d '{"inputs":"What is AI?","parameters":{"max_new_tokens":17, "frequency_penalty":1}}' -H 'Content-Type: application/json'
3, the server error and pending, the client also pending .
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
The text was updated successfully, but these errors were encountered: