
[Feature] W8A8 support for turbomind engine #2962

Open
binhtranmcs opened this issue Dec 27, 2024 · 1 comment

Comments

@binhtranmcs

Motivation

For now, W8A8 quantization is only supported by the pytorch engine, but the turbomind engine is much more performant. Do you have any plan to support W8A8 for the turbomind engine?

Also, only SmoothQuant is supported, as shown here. I think FP8 quantization is also a feature worth considering.
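For readers unfamiliar with the technique mentioned above: SmoothQuant makes W8A8 quantization viable by migrating quantization difficulty from activations (which have outlier channels) to weights via a per-channel scale. The toy sketch below is illustrative only and is not LMDeploy's actual implementation; the function names and the `alpha` migration-strength parameter follow the SmoothQuant paper's formulation.

```python
# Toy sketch of the SmoothQuant smoothing step (not LMDeploy's code):
# for each input channel j, pick a scale s_j = max|X[:,j]|^a / max|W[j,:]|^(1-a),
# then divide activations and multiply weights by s_j. The matrix product
# X @ W is mathematically unchanged, but activation outliers shrink, so
# both tensors quantize better to int8.

def smooth(X, W, alpha=0.5):
    """Return (X / s, s * W) with per-input-channel scales s."""
    c = len(W)  # number of input channels (rows of W, columns of X)
    s = []
    for j in range(c):
        act_max = max(abs(row[j]) for row in X)   # activation outlier magnitude
        w_max = max(abs(w) for w in W[j])         # weight magnitude for channel j
        s.append((act_max ** alpha) / (w_max ** (1 - alpha)))
    X_s = [[row[j] / s[j] for j in range(c)] for row in X]
    W_s = [[s[j] * w for w in W[j]] for j in range(c)]
    return X_s, W_s

def matmul(A, B):
    """Plain matrix product, to verify the smoothing is equivalence-preserving."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]
```

With an outlier-heavy channel (e.g. activations `[[6.0, 0.2], [-4.0, 0.1]]`), the smoothed activations have a much smaller dynamic range while `matmul(X_s, W_s)` matches `matmul(X, W)` up to floating-point error.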

Thanks!


@lvhan028
Collaborator

FP8 is being supported in the pytorch engine.
Because we are short-handed, we haven't started W8A8 inference (including FP8) in the turbomind engine.
We will probably start this work after Chinese New Year.
