[Bugfix] Fix Quant Type Descriptor for Weights #32702
tjtanaa wants to merge 6 commits into vllm-project:main
…er than token Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Code Review
This pull request effectively addresses the bug related to the incorrect quantization type descriptor for weights. The introduction of GroupShape.PER_CHANNEL and its corresponding constant kFp8StaticChannelSym correctly aligns the weight descriptor with channel-wise quantization. The refactoring of the ScaleDesc.__str__ method also improves code readability and maintainability. The changes are well-contained and directly resolve the identified semantic issue.
This pull request has merge conflicts that must be resolved before it can be …
ProExpertProg left a comment
This looks good but maybe we should unify these into per-row?
```python
# Input shape is in (M, K)
# Descriptor for weights that are quantized per token
GroupShape.PER_TOKEN = GroupShape(1, -1)
GroupShape.PER_CHANNEL = GroupShape(-1, 1)
```
Maybe we need to just call this one per-row for both? Wdyt @LucasWilkinson @mgoin
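To make the naming question concrete, here is a minimal sketch of how the two descriptors in the diff map onto scale layouts. It is not vLLM's implementation: the `GroupShape` class below is a stand-in `NamedTuple`, `scale_grid` is a hypothetical helper, and the sketch assumes that `-1` means "span the full extent of that dimension".

```python
from typing import NamedTuple, Tuple


class GroupShape(NamedTuple):
    """Stand-in for the descriptor in the diff (not the vLLM class)."""
    row: int
    col: int


# Values copied from the diff under review.
PER_TOKEN = GroupShape(1, -1)
PER_CHANNEL = GroupShape(-1, 1)


def scale_grid(shape: Tuple[int, int], group: GroupShape) -> Tuple[int, int]:
    """Hypothetical helper: number of scale factors along each dimension.

    Assumes -1 in a group dimension means "cover the whole dimension".
    """
    rows, cols = shape
    g_row = rows if group.row == -1 else group.row
    g_col = cols if group.col == -1 else group.col
    if rows % g_row != 0 or cols % g_col != 0:
        raise ValueError("tensor shape is not divisible by the group shape")
    return (rows // g_row, cols // g_col)


# For a 4 x 8 tensor: one scale per row vs. one scale per column.
print(scale_grid((4, 8), PER_TOKEN))    # (4, 1)
print(scale_grid((4, 8), PER_CHANNEL))  # (1, 8)
```

Under these assumptions, `GroupShape(1, -1)` yields one scale per row of whatever tensor it is applied to, which is why a single "per-row" name could plausibly serve both activations and weights, depending on how the weight is laid out.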
Purpose
The weight descriptor assigned in PR #27814 is not semantically correct. This PR introduces the `GroupShape.PER_CHANNEL` and `kFp8StaticChannelSym` descriptors, following PR #32414.
Test Plan
Pass CI
Test Result