
Sunny-bot1 (Collaborator) commented Sep 4, 2025

Adds WINT8 support to Machete, as a follow-up to PR #3561.

Performance tests

Model

Qwen3-30B-A3B WINT8: TPS improves by about 7% 🚀

| backend | decode speed | TPS |
| --- | --- | --- |
| paddle | 35.86 | 6886.35 |
| machete | 38.85 | 7362.53 |
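
As a quick sanity check on the headline figure, the relative gains can be recomputed from the table. This is only a sketch: the helper `relative_gain` and the dictionary layout are illustrative names introduced here, not part of FastDeploy, and the numbers are copied from the table as reported.

```python
# Figures copied from the model-level table in this PR description.
# The dictionary layout and helper name are illustrative, not FastDeploy APIs.
results = {
    "paddle": {"decode_speed": 35.86, "tps": 6886.35},
    "machete": {"decode_speed": 38.85, "tps": 7362.53},
}


def relative_gain(new: float, old: float) -> float:
    """Return the relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


tps_gain = relative_gain(results["machete"]["tps"], results["paddle"]["tps"])
decode_gain = relative_gain(
    results["machete"]["decode_speed"], results["paddle"]["decode_speed"]
)

print(f"TPS gain: {tps_gain:.1f}%")
print(f"Decode speed gain: {decode_gain:.1f}%")
```

The TPS gain works out to roughly 6.9%, consistent with the "about 7%" stated above; the decode speed gain is roughly 8.3%.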

kernel

k, n = 7168, 1536

| m | paddle | machete |
| --- | --- | --- |
| 32 | 23.999 | 19.654 |
| 64 | 24.262 | 23.046 |
| 128 | 24.601 | 31.570 |
| 256 | 32.015 | 26.281 |
| 512 | 51.989 | 28.038 |
| 1024 | 73.391 | 48.761 |
| 2048 | 137.550 | 76.463 |
| 4096 | 208.295 | 143.307 |

k, n = 2048, 5120

| m | paddle | machete |
| --- | --- | --- |
| 32 | 9.218 | 11.532 |
| 64 | 11.439 | 13.187 |
| 128 | 17.359 | 14.134 |
| 256 | 24.262 | 17.138 |
| 512 | 39.237 | 24.786 |
| 1024 | 65.093 | 39.154 |
| 2048 | 106.475 | 68.299 |
| 4096 | 205.088 | 118.632 |
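
To compare the two backends shape by shape, the paddle/machete ratio can be computed per value of m. The sketch below assumes the kernel figures are latencies (lower is better); the PR description does not state the unit or the metric, so treat that reading as an assumption. `kernel_results` and `ratios` are illustrative names, not FastDeploy APIs.

```python
# Kernel figures copied from the two tables in this PR description.
# Layout: (k, n) -> {m: (paddle, machete)}.
# ASSUMPTION: values are kernel latencies, so a smaller number is better.
kernel_results = {
    (7168, 1536): {
        32: (23.999, 19.654),
        64: (24.262, 23.046),
        128: (24.601, 31.570),
        256: (32.015, 26.281),
        512: (51.989, 28.038),
        1024: (73.391, 48.761),
        2048: (137.550, 76.463),
        4096: (208.295, 143.307),
    },
    (2048, 5120): {
        32: (9.218, 11.532),
        64: (11.439, 13.187),
        128: (17.359, 14.134),
        256: (24.262, 17.138),
        512: (39.237, 24.786),
        1024: (65.093, 39.154),
        2048: (106.475, 68.299),
        4096: (205.088, 118.632),
    },
}


def ratios(table: dict[int, tuple[float, float]]) -> dict[int, float]:
    """Return paddle/machete for each m; above 1.0 means machete's figure is smaller."""
    return {m: paddle / machete for m, (paddle, machete) in table.items()}


for (k, n), table in kernel_results.items():
    print(f"k, n = {k}, {n}")
    for m, ratio in ratios(table).items():
        print(f"  m={m:<5d} paddle/machete = {ratio:.2f}")
```

Under the latency assumption, machete's figure is larger than paddle's at m = 128 for k, n = 7168, 1536 and at m = 32 and 64 for k, n = 2048, 5120, and smaller in every other row.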


paddle-bot bot commented Sep 4, 2025

Thanks for your contribution!

zhoutianzi666 merged commit b1a5b75 into PaddlePaddle:develop on Sep 15, 2025
34 of 39 checks passed
Sunny-bot1 added a commit to Sunny-bot1/FastDeploy that referenced this pull request Sep 18, 2025
Jiang-Jia-Jun pushed a commit that referenced this pull request Sep 19, 2025
* support v1 loader for machete (#3999)

* [Optimize] Support WINT8 and group scale for Machete (#3905)

* [Optimize] Machete using group scale default (#4121)