[Bugfix][TPU] Return a Default fp8 MoE Backend #32908
robertgshaw2-redhat merged 4 commits into vllm-project:main
Conversation
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Code Review
This pull request addresses a regression on TPUs by changing the behavior of select_fp8_moe_backend when no suitable FP8 MoE backend is found. Instead of raising a NotImplementedError, the function now returns (Fp8MoeBackend.NONE, None). This allows the system to gracefully handle cases where no specialized backend is available, which is the expected scenario on platforms like TPUs. The change is a targeted fix that restores the previous, correct behavior.
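A minimal sketch of the changed control flow described above, assuming a simple backend table (the enum members other than `NONE` and the `available` mapping are illustrative, not vLLM's actual dispatch logic):

```python
from enum import Enum


class Fp8MoeBackend(Enum):
    # Illustrative subset of backend choices; NONE means "no specialized kernel".
    CUTLASS = "cutlass"
    TRITON = "triton"
    NONE = "none"


def select_fp8_moe_backend(available: dict) -> tuple:
    """Pick an FP8 MoE backend, falling back to NONE instead of raising.

    `available` maps backend -> implementation; on platforms like TPU it
    may be empty, which previously triggered a NotImplementedError.
    """
    for backend in (Fp8MoeBackend.CUTLASS, Fp8MoeBackend.TRITON):
        impl = available.get(backend)
        if impl is not None:
            return backend, impl
    # After this PR: gracefully report "no backend" so platform plugins
    # (e.g. tpu-inference) can supply their own implementation.
    return Fp8MoeBackend.NONE, None
```

The key point is the last line: an empty `available` table now yields `(Fp8MoeBackend.NONE, None)` rather than an exception.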
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Thanks for the fix! However, this changes the behavior on CUDA + ROCm, where we want to raise a clear error that no backend supports the model. How does the TPU backend use this function?
Thanks @robertgshaw2-redhat for the review. On TPU, it defines a VllmCompressedTensorsW8A8Fp8MoEMethod class that inherits vLLM's CompressedTensorsW8A8Fp8MoEMethod. When TPU creates an instance of it, I don't see TPU using this function directly. If the current fix is not ideal, do you have other suggestions? cc: @kyuyeunk
@robertgshaw2-redhat, previously we just inherited the class.
Before #32414, the constructor of the inherited class worked fine. As we've discussed offline, the proper way to resolve this issue is to allow plugins to register their own backends / kernels.
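The plugin-registration idea floated here could be sketched roughly as follows. Every name in this snippet is hypothetical; vLLM's actual plugin mechanism is not shown in this thread:

```python
# Hypothetical registry letting out-of-tree plugins (e.g. tpu-inference)
# contribute their own FP8 MoE backends without patching core selection logic.
_MOE_BACKENDS: dict[str, object] = {}


def register_fp8_moe_backend(name: str, impl: object) -> None:
    """Called by a platform plugin at import time to expose its kernel."""
    _MOE_BACKENDS[name] = impl


def select_backend(preferred: list[str]) -> tuple:
    """Return the first registered backend from `preferred`, else (None, None)."""
    for name in preferred:
        if name in _MOE_BACKENDS:
            return name, _MOE_BACKENDS[name]
    return None, None


# A TPU plugin would register its implementation on load, for example:
register_fp8_moe_backend("pallas", object())
```

With such a registry, core code never needs to hard-code platform-specific fallbacks; it simply finds nothing registered and defers to whatever the plugin provided.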
Thanks Kyuyeun for the input. I also tried to unblock myself (and tpu-inference) by doing something like vllm-project/tpu-inference#1512, another fix caused by the same issue. But the fix this time is not that straightforward, so I'm leaning towards getting it fixed in vLLM.
What I'm going to do is:
I'll put up the PR tomorrow morning.
Hi @robertgshaw2-redhat, is there any update? This is currently blocking tpu-inference.
Signed-off-by: Robert Shaw <robshaw@redhat.com>
I thought a bit more about it. I think it's better to have the check here rather than in the quant methods. Does the change I pushed to the branch look okay to you?
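One plausible reading of "the check here" is to keep the selection function itself permissive but raise the clear error only on platforms where a specialized kernel is expected (CUDA/ROCm). This is a hedged sketch, not the actual pushed diff; the `platform` string argument and all names are illustrative:

```python
def select_fp8_moe_backend(platform: str, available: dict) -> tuple:
    """Sketch: raise a clear error only where an FP8 MoE kernel is expected.

    On CUDA/ROCm an empty backend table is a real configuration error;
    elsewhere (e.g. TPU) we fall back to ("none", None) so a platform
    plugin can supply its own implementation.
    """
    for backend, impl in available.items():
        if impl is not None:
            return backend, impl
    if platform in ("cuda", "rocm"):
        raise NotImplementedError("No FP8 MoE backend supports this model.")
    return "none", None
```

This preserves the loud failure Robert wanted on CUDA/ROCm while restoring the graceful TPU fallback the PR set out to fix.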
Yes, it looks good to me. Thanks @robertgshaw2-redhat.
great! merging. |
@robertgshaw2-redhat, it looks like the merge is blocked.
My bad |
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Purpose
A recent commit caused TPU to fail at `select_fp8_moe_backend` (error). This PR intends to fix this error for TPU.
Test Plan
TPU CI:
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.