Motivation

For now, W8A8 quantization is only supported by the pytorch engine, but the turbomind engine is much more performant. Do you have any plans to support W8A8 in the turbomind engine?

Also, only SmoothQuant is supported, as shown here. I think FP8 quantization is also a feature worth considering.
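For context on what SmoothQuant-style W8A8 actually does, here is a minimal NumPy sketch of its core trick: migrating per-channel activation outlier scale into the weights so both tensors become easier to quantize to int8. The function name `smooth` and the toy shapes are illustrative only, not lmdeploy's API:

```python
import numpy as np

# SmoothQuant's key observation: a few activation channels have large
# outliers, which wrecks per-tensor int8 activation quantization.
# Divide activations by a per-channel scale s and multiply the matching
# weight rows by s; the matmul result is unchanged:
#   (X / s) @ (diag(s) @ W) == X @ W
def smooth(X, W, alpha=0.5):
    act_scale = np.abs(X).max(axis=0)            # per input channel of X
    w_scale = np.abs(W).max(axis=1)              # per input channel of W
    s = act_scale**alpha / w_scale**(1 - alpha)  # migration strength alpha
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50                                    # one outlier channel
W = rng.normal(size=(8, 16))

Xs, Ws = smooth(X, W)
assert np.allclose(X @ W, Xs @ Ws)               # math is preserved exactly
```

After smoothing, the activation outlier is shrunk while the weights absorb a proportional scale, so both sides can then be quantized to int8 with much smaller rounding error.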
Thanks!
Related resources
No response
Additional context
No response
FP8 is being supported in the pytorch engine.
Since we are short-handed, we haven't started W8A8 inference (including FP8) in the turbomind engine.
We will probably start this work after Chinese New Year.