[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses #22537
Code Review
This pull request introduces a significant refactoring of the Mixture of Experts (MoE) quantization configuration by introducing a new FusedMoEQuantConfig structure. This is a positive change towards a more structured and extensible configuration. However, the refactoring appears to be incomplete, as there are several critical issues, including assert False statements, NotImplementedErrors, and usage of undefined variables in the new code paths. These issues will cause runtime failures and need to be addressed before this PR can be considered for merging. My review focuses on these critical issues.
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
This pull request has merge conflicts that must be resolved before it can be |
27a4513 to 688374b
a6b4b30 to d5b12e8
328fc4c to ad0e7ff
d1f132f to 417e037
Signed-off-by: Bill Nell <bnell@redhat.com>
5f60537 to 8d94b93
Is this tested with mxfp4? @bnellnm The test result section is still TBD.
@minosfuture I think there was one issue with mxfp4 which has been fixed. I've not tested every possible combination but afaik everything should work.
Purpose
- Delegate construction of FusedMoEQuantConfig objects to the subclass of FusedMoEMethodBase that will use that info.
- … FusedMoEQuantConfig and make it more uniform.
- … fused_experts with a FusedMoEQuantConfig. This eliminates the various use_ bool flags and quantization parameters _scales, _zp, _bias, _gscale, etc.
Test Plan
Test Result
(Optional) Documentation Update
cc @varun-sundar-rabindranath, @LucasWilkinson, @jeejeelee, @wenscarl, @nvpohanh, @mgoin
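As a rough illustration of the pattern described under Purpose, the sketch below shows a method subclass building a single quantization config object, which a consumer then reads instead of taking separate use_ flags and scale arguments. It is not vLLM's code: QuantConfigSketch, MoEMethodSketch, get_quant_config, Fp8Method, UnquantizedMethod and describe_experts_call are hypothetical stand-ins, and the field names are assumptions chosen only to mirror the _scales and _zp parameters mentioned above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QuantConfigSketch:
    """Hypothetical stand-in for FusedMoEQuantConfig; all fields are assumptions."""
    quant_dtype: Optional[str] = None  # e.g. "fp8"; None means unquantized
    w1_scale: Optional[Sequence[float]] = None
    w2_scale: Optional[Sequence[float]] = None
    w1_zp: Optional[Sequence[float]] = None
    w2_zp: Optional[Sequence[float]] = None

    @property
    def is_quantized(self) -> bool:
        return self.quant_dtype is not None


class MoEMethodSketch(ABC):
    """Hypothetical stand-in for FusedMoEMethodBase."""

    @abstractmethod
    def get_quant_config(self) -> QuantConfigSketch:
        """Each subclass builds the config from the parameters it owns."""


class UnquantizedMethod(MoEMethodSketch):
    def get_quant_config(self) -> QuantConfigSketch:
        # No quantization parameters to report.
        return QuantConfigSketch()


class Fp8Method(MoEMethodSketch):
    def __init__(self, w1_scale: Sequence[float], w2_scale: Sequence[float]):
        self.w1_scale = w1_scale
        self.w2_scale = w2_scale

    def get_quant_config(self) -> QuantConfigSketch:
        # The subclass that owns the scales is the one that packages them.
        return QuantConfigSketch(
            quant_dtype="fp8",
            w1_scale=self.w1_scale,
            w2_scale=self.w2_scale,
        )


def describe_experts_call(method: MoEMethodSketch) -> str:
    """Toy consumer: reads one config object instead of many flags and scales."""
    cfg = method.get_quant_config()
    if not cfg.is_quantized:
        return "unquantized"
    return f"{cfg.quant_dtype} with {len(cfg.w1_scale)} w1 scales"
```

With this shape, supporting a new quantization scheme would mean adding a subclass that fills in the config, rather than adding another flag to the consumer's signature.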