[BugFix] Fix TRT-LLM NVFP4 DP/EP#32349
[BugFix] Fix TRT-LLM NVFP4 DP/EP#32349robertgshaw2-redhat merged 7 commits intovllm-project:mainfrom
Conversation
|
@pavanimajety @mgoin can you help review? Thanks! |
There was a problem hiding this comment.
Code Review
This pull request aims to fix a crash that occurs when using FlashInfer with NVFP4 quantization for MoE layers, specifically with the TRT-LLM backend. The issue stems from an assertion failure where moe_quant_config was unexpectedly None. The proposed change correctly addresses the crash for the TRT-LLM backend by sourcing the a1_gscale from the layer object directly. However, my review indicates that this fix, while effective for the intended case, may introduce an AttributeError for other backends like FlashInfer-CUTLASS, where layer.a1_gscale is not set. I've provided a critical comment with a suggested code change to ensure the fix is robust across all supported backends.
| hidden_states_fp4, hidden_states_sf = flashinfer.fp4_quantize( | ||
| hidden_states, | ||
| a1_gscale, | ||
| layer.a1_gscale, |
There was a problem hiding this comment.
While this change correctly fixes the issue for the FLASHINFER_TRTLLM backend, it could introduce a bug for other backends like FLASHINFER_CUTLASS. For non-TRTLLM backends, self.moe_quant_config is created, but layer.a1_gscale is not set, which would lead to an AttributeError here.
A more robust approach would be to conditionally access a1_gscale from either self.moe_quant_config or layer.
| layer.a1_gscale, | |
| self.moe_quant_config.a1_gscale if self.moe_quant_config else layer.a1_gscale, |
|
thanks for the fix! sorry I did not catch this one in my testing. |
|
Can you post your deployment? |
|
add your launch command. I want to verify the result. |
|
Could you please add this case to the B200 Temporary MoE Refactor job? |
|
The bot is correct. This |
Added in the PR test plan |
May you elaborate more on this? Is it a github issue I shall add to ? |
|
94a8a39 to
fd2f9d1
Compare
|
@robertgshaw2-redhat updated PR per comment, may you help review? Thanks~ |
|
I made some improvements to the PR |
|
Thanks for the fix! I triggered the tests. PLease ping me on slack if you need any more help getting this landed. |
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
Hi @robertgshaw2-redhat , the only failing test seems to be due to container startup errors, can you rerun it and merge if it passes? Thanks! |
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>
Purpose
Flashinfer trtllm nvfp4 was broke by #31692 .
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.