
[Feature] W8A8 support for turbomind engine #2962

Open
binhtranmcs opened this issue Dec 27, 2024 · 1 comment

Comments

@binhtranmcs

Motivation

For now, W8A8 quantization is only supported by the pytorch engine, but the turbomind engine is much more performant. Do you have any plan to support W8A8 for the turbomind engine?

Also, only SmoothQuant is supported, as shown here. I think FP8 quantization is also a feature worth considering.
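For readers unfamiliar with the technique mentioned above: SmoothQuant makes W8A8 quantization viable by migrating quantization difficulty from activations (which have outlier channels) to weights via a per-channel scale. The toy sketch below is illustrative only and is not LMDeploy's actual implementation; the function names and the `alpha` migration-strength parameter follow the SmoothQuant paper's formulation.

```python
# Toy sketch of the SmoothQuant smoothing step (not LMDeploy's code):
# for each input channel j, pick a scale s_j = max|X[:,j]|^a / max|W[j,:]|^(1-a),
# then divide activations and multiply weights by s_j. The matrix product
# X @ W is mathematically unchanged, but activation outliers shrink, so
# both tensors quantize better to int8.

def smooth(X, W, alpha=0.5):
    """Return (X / s, s * W) with per-input-channel scales s."""
    c = len(W)  # number of input channels (rows of W, columns of X)
    s = []
    for j in range(c):
        act_max = max(abs(row[j]) for row in X)   # activation outlier magnitude
        w_max = max(abs(w) for w in W[j])         # weight magnitude for channel j
        s.append((act_max ** alpha) / (w_max ** (1 - alpha)))
    X_s = [[row[j] / s[j] for j in range(c)] for row in X]
    W_s = [[s[j] * w for w in W[j]] for j in range(c)]
    return X_s, W_s

def matmul(A, B):
    """Plain matrix product, to verify the smoothing is equivalence-preserving."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]
```

With an outlier-heavy channel (e.g. activations `[[6.0, 0.2], [-4.0, 0.1]]`), the smoothed activations have a much smaller dynamic range while `matmul(X_s, W_s)` matches `matmul(X, W)` up to floating-point error.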

Thanks!


@lvhan028
Collaborator

FP8 is being supported in the pytorch engine.
Because we are short-handed, we haven't started W8A8 inference (including FP8) in the turbomind engine.
We will probably start this work after Chinese New Year.
