Performance comparison between lightllm and vllm #116

Open
Cydia2018 opened this issue Sep 4, 2023 · 11 comments

Comments

@Cydia2018

Here are my test results on an A100-SXM-80G:
vllm
python -m vllm.entrypoints.api_server --model /code/llama-65b-hf --swap-space 16 --disable-log-requests --tensor-parallel-size 8
python benchmarks/benchmark_serving.py --tokenizer /code/llama-65b-hf --dataset /code/ShareGPT_V3_unfiltered_cleaned_split.json
Total time: 312.02 s
Throughput: 3.20 requests/s
Average latency: 125.45 s
Average latency per token: 0.40 s
Average latency per output token: 2.10 s


lightllm
python -m lightllm.server.api_server --model_dir /code/llama-65b-hf --tp 8 --max_total_token_num 121060 --tokenizer_mode auto
python benchmark_serving.py --tokenizer /code/llama-65b-hf --dataset /code/ShareGPT_V3_unfiltered_cleaned_split.json
total tokens: 494250
Total time: 333.10 s
Throughput: 3.00 requests/s
Average latency: 113.86 s
Average latency per token: 0.33 s
Average latency per output token: 1.54 s


The lightllm results look quite far off from the reported performance. Could you tell me where my settings are wrong? Thanks.

@hiworldwzj
Collaborator

@Cydia2018 Could it be that no Fast Tokenizer is being used? When you started the server, did the printed warnings mention this?

@Cydia2018
Author

@hiworldwzj Yes, I did not use a Fast Tokenizer, but neither vllm nor lightllm used one, so I don't think that is the main cause of the performance gap.
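For anyone who wants to verify this locally, a quick sketch to check whether a fast (Rust-based) tokenizer can be loaded for this model directory, assuming the transformers package is installed (path as in the commands above):

# Prints the tokenizer class name and whether it is the fast implementation.
python -c "from transformers import AutoTokenizer; t = AutoTokenizer.from_pretrained('/code/llama-65b-hf', use_fast=True); print(type(t).__name__, t.is_fast)"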

@hiworldwzj
Collaborator

@Cydia2018 max_total_token_num needs to be recalculated to a reasonable value for your model configuration.

@Cydia2018
Author

@hiworldwzj Please tell me how to work out a reasonable range for max_total_token_num. If it's convenient, please just share the parameter settings you used when benchmarking the 65B model (same dataset). Thanks.
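As a rough back-of-the-envelope sketch (my own estimate, not an official formula): take the device memory left after loading the weights and divide by the KV-cache bytes per token. Assuming fp16 weights and KV cache, llama-65b's 80 layers and hidden size 8192, 8 x 80GB GPUs with tp=8, and ~10% headroom for activations:

# (total memory - fp16 weights) * headroom / (K+V bytes per token: 80 layers, hidden 8192, fp16)
python -c "print(int((8*80e9 - 65e9*2) * 0.9 / (2*80*8192*2)))"   # ~175000 tokens

Without the headroom factor this lands near the 193696 value mentioned further down in the thread.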

@hiworldwzj
Collaborator

@Cydia2018
Author

@hiworldwzj On an A100-SXM-80G with tp=8, I raised max_total_token_num as high as 193696, and throughput is still only 2.98 requests/s.

@hiworldwzj
Collaborator

@Cydia2018 OK, let me try this configuration of yours and check the performance. For the moment I really can't see where the problem is.

@wesissonb

A question for everyone: after I type python setup.py install into cmd, nothing happens at all. How can I solve this?

@Lvjinhong

@Cydia2018 OK, let me try this configuration of yours and check the performance. For the moment I really can't see where the problem is.

Hello, may I ask: as of now, for multi-GPU inference with llama2 70b, does lightllm achieve better latency than vllm? Is there any related benchmark? Many thanks.

@zzb610

zzb610 commented Feb 28, 2024

Hi everyone, when running the benchmark, is there a good way to install vllm so that vllm's dependencies don't interfere with lightllm?

@hiworldwzj
Collaborator

@zzb610 How about creating two separate virtual environments with conda?
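For example, something along these lines (environment names and Python version are arbitrary choices on my part):

conda create -n vllm-bench python=3.9 -y
conda activate vllm-bench
pip install vllm

conda create -n lightllm-bench python=3.9 -y
conda activate lightllm-bench
python setup.py install   # run inside the lightllm repo checkout, as mentioned above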
