Skip to content

[Bugfix] Fix Quant Type Descriptor for Weights#32702

Open
tjtanaa wants to merge 6 commits intovllm-project:mainfrom
EmbeddedLLM:fix-weight-descriptor
Open

[Bugfix] Fix Quant Type Descriptor for Weights#32702
tjtanaa wants to merge 6 commits intovllm-project:mainfrom
EmbeddedLLM:fix-weight-descriptor

Conversation

@tjtanaa
Copy link
Collaborator

@tjtanaa tjtanaa commented Jan 20, 2026

Purpose

The weight descriptor assigned in this PR #27814 is not semantically correct. Introduce the GroupShape.CHANNEL and kFp8StaticChannelSym descriptor, following PR #32414 .

Test Plan

Pass CI

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…er than token

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@tjtanaa tjtanaa added rocm Related to AMD ROCm ready ONLY add when PR is ready to merge/full CI is needed labels Jan 20, 2026
@mergify mergify bot added the bug Something isn't working label Jan 20, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses the bug related to the incorrect quantization type descriptor for weights. The introduction of GroupShape.PER_CHANNEL and its corresponding constant kFp8StaticChannelSym correctly aligns the weight descriptor with channel-wise quantization. The refactoring of the ScaleDesc.__str__ method also improves code readability and maintainability. The changes are well-contained and directly resolve the identified semantic issue.

@mergify
Copy link

mergify bot commented Jan 21, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 21, 2026
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@mergify mergify bot removed the needs-rebase label Jan 21, 2026
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good but maybe we should unify these into per-row?

# Input shape is in (M, K)
# Descriptor for weights that are quantized per token
GroupShape.PER_TOKEN = GroupShape(1, -1)
GroupShape.PER_CHANNEL = GroupShape(-1, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to just call this one per-row for both? Wdyt @LucasWilkinson @mgoin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants