
First-token inference speed is slower than the Python version #49

Open
ml2tao opened this issue Jul 11, 2023 · 1 comment

Comments


ml2tao commented Jul 11, 2023

Hello, and many thanks to the author for this work and generous contribution.
Comparing the two implementations, I found the following two problems:
1. With chatglm-6b, chatglm.cpp's first-token inference is several times slower than the Python version, especially when the input length exceeds 100.
2. When the input exceeds 1000 characters, chatglm.cpp's results are worse still: its output is more than 50% shorter than the Python version's.
Machine: CPU: Intel(R) Xeon(R) Platinum 8475B, 16 cores, 60 GiB RAM

| Precision | Runtime | Input length (chars) | Output length (tokens) | First-token latency | Non-streaming total | Total time | Avg. per remaining token |
|---|---|---|---|---|---|---|---|
| float16 | Python | 32 | 215 | 0.7445 s | 34.1369 s | 35.3059 s | 0.1615 s |
| float16 | Python | 257 | 306 | 1.3713 s | 50.179 s | 50.4559 s | 0.1654 s |
| float16 | Python | 512 | 269 | 2.8002 s | 46.7511 s | 48.0962 s | 0.169 s |
| float16 | Python | 1024 | 227 | 4.7105 s | 44.6898 s | 44.3965 s | 0.1756 s |
| float16 | Python | 24 | 282 | 0.5863 s | 46.4415 s | 45.34 s | 0.1593 s |
| float16 | chatglm.cpp | 32 | 217 | 0.5475 s | | 21.1821 s | 0.0955 s |
| float16 | chatglm.cpp | 257 | 308 | 4.0019 s | | 33.6029 s | 0.0964 s |
| float16 | chatglm.cpp | 512 | 271 | 9.6735 s | | 35.9047 s | 0.0972 s |
| float16 | chatglm.cpp | 1024 | 98 | 15.9248 s | | 25.3531 s | 0.0972 s |
| float16 | chatglm.cpp | 24 | 284 | 0.5491 s | | 27.6705 s | 0.0958 s |
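For anyone wanting to reproduce these numbers, the measurement can be framed as timing a token stream: the first-token latency is the time from the request until the first yielded token, and the remaining-token average is the time between the first and last tokens divided by the number of subsequent tokens. A minimal sketch of that timing helper is below; it works on any iterator of tokens, so it makes no assumption about the chatglm.cpp or Python model APIs (the generator you wrap is whatever streaming call your runtime exposes):

```python
import time
from typing import Iterable, Tuple


def measure_stream_latency(token_stream: Iterable[str]) -> Tuple[float, float]:
    """Time a token stream.

    Returns (first_token_latency, avg_latency_of_remaining_tokens) in seconds.
    Both are 0.0 if the stream is empty / has a single token respectively.
    """
    start = time.perf_counter()
    first_latency = 0.0
    remaining = 0          # tokens after the first one
    last = start
    seen_first = False
    for _ in token_stream:
        now = time.perf_counter()
        if not seen_first:
            first_latency = now - start   # time to first token
            seen_first = True
        else:
            remaining += 1
        last = now
    # spread the first-to-last interval over the remaining tokens
    rest_avg = (last - start - first_latency) / remaining if remaining else 0.0
    return first_latency, rest_avg
```

You would wrap the model's streaming generator with this, e.g. `measure_stream_latency(model.stream_chat(prompt))` (where `stream_chat` stands in for whichever streaming entry point your binding provides).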

linpan commented Aug 24, 2023

Could you share the test data and the benchmark script?
