
[Bugfix][TPU] Return a Default fp8 MoE Backend #32908

Merged
robertgshaw2-redhat merged 4 commits into vllm-project:main from vanbasten23:xiowei/fix_select_fp8_moe_backend
Jan 26, 2026

Conversation

@vanbasten23
Collaborator

@vanbasten23 vanbasten23 commented Jan 23, 2026

Purpose

A recent commit caused TPU to fail inside select_fp8_moe_backend (error). This PR fixes that failure for TPU.

Test Plan

TPU CI:

USE_MOE_EP_KERNEL=1 MODEL_IMPL_TYPE=vllm vllm serve --seed=42 --model=BCCard/Qwen3-Coder-480B-A35B-Instruct-FP8-Dynamic --max-model-len=10240 --max-num-batched-tokens=8192 --max-num-seqs=512 --no-enable-prefix-caching --disable-log-requests --tensor-parallel-size=8 --kv-cache-dtype=fp8 --gpu-memory-utilization=0.95 --async-scheduling --enable-expert-parallel

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a regression on TPUs by changing the behavior of select_fp8_moe_backend when no suitable FP8 MoE backend is found. Instead of raising a NotImplementedError, the function now returns (Fp8MoeBackend.NONE, None). This allows the system to gracefully handle cases where no specialized backend is available, which is the expected scenario on platforms like TPUs. The change is a targeted fix that restores the previous, correct behavior.
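A minimal sketch of the described behavior change (the enum members, the selection predicate, and the experts-class placeholder are simplified stand-ins, not vLLM's actual implementation):

```python
from enum import Enum


class Fp8MoeBackend(Enum):
    # Simplified stand-in for vLLM's fp8 MoE backend enum.
    NONE = "none"
    CUTLASS = "cutlass"


def select_fp8_moe_backend(has_supported_backend: bool):
    """Sketch: fall back to (NONE, None) instead of raising."""
    if has_supported_backend:
        # The real code would pick a concrete experts class here.
        return Fp8MoeBackend.CUTLASS, object
    # Pre-fix behavior raised NotImplementedError at this point, which broke
    # TPU; the fix returns a harmless default so callers can proceed.
    return Fp8MoeBackend.NONE, None
```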


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@robertgshaw2-redhat
Collaborator

Thanks for the fix!

However, this changes the behavior on CUDA + ROCm, where we want to raise a clear error that no backend supports the model.

How does the TPU backend use this function?

@robertgshaw2-redhat robertgshaw2-redhat changed the title Return a default fp8 moe backend as before. [Bugfix][TPU] Return a Default fp8 MoE Backend Jan 23, 2026
@mergify mergify bot added the bug Something isn't working label Jan 23, 2026
@vanbasten23
Collaborator Author

vanbasten23 commented Jan 23, 2026

How does the TPU backend use this function?

Thanks @robertgshaw2-redhat for the review. On TPU, we define a VllmCompressedTensorsW8A8Fp8MoEMethod class that inherits vLLM's CompressedTensorsW8A8Fp8MoEMethod. When TPU creates an instance of VllmCompressedTensorsW8A8Fp8MoEMethod, it invokes CompressedTensorsW8A8Fp8MoEMethod.__init__ here. During that __init__, it fails at select_fp8_moe_backend.

I don't see TPU using the self.fp8_backend or self.experts_cls returned from select_fp8_moe_backend. It seems that as long as select_fp8_moe_backend doesn't raise an exception, TPU should work.
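A minimal sketch of the inheritance chain described above (class bodies are simplified placeholders; only the construction path matters here):

```python
class CompressedTensorsW8A8Fp8MoEMethod:
    """Stand-in for vLLM's base class, whose __init__ runs backend selection."""

    def __init__(self):
        # In vLLM, select_fp8_moe_backend() is called here. If it raises,
        # every subclass constructor fails before any TPU code can run.
        self.fp8_backend, self.experts_cls = None, None  # fallback values


class VllmCompressedTensorsW8A8Fp8MoEMethod(CompressedTensorsW8A8Fp8MoEMethod):
    """TPU-side subclass: reuses the base __init__ and overrides only hooks."""

    def apply(self, hidden_states):
        # A TPU-specific forward pass would go here.
        return hidden_states
```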

If the current fix is not ideal, do you have other suggestions?

cc: @kyuyeunk

@kyuyeunk
Contributor

How does the TPU backend use this function?

@robertgshaw2-redhat, previously we just inherited the class Fp8MoEMethod but overrode the following functions:

  • process_weights_after_loading: so we can apply TPU-specific weight transformations at weight-loading time
  • apply: so we can invoke our own forward function

Before #32414, the constructor of Fp8MoEMethod would work fine even if no backend was found, but now it errors out. For now, we are working around the issue by overriding the constructor as well, but ideally we want to reuse vLLM components as much as possible, which will also let us align with vLLM's overall direction on how things are designed. The alternative is overriding so many functions that the plugin reaches the point where it's not really vLLM anymore.

As we discussed offline, the proper way to resolve this issue is to allow plugins to register their own backends/kernels so that Fp8MoEMethod can correctly select that backend/kernel. Is there an ETA on when that feature will land?
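A hypothetical sketch of what such a plugin-registration hook might look like (no such API exists in vLLM at the time of this discussion; every name below is invented for illustration):

```python
# Invented registry; not a real vLLM API.
_PLUGIN_MOE_BACKENDS: dict[str, type] = {}


def register_fp8_moe_backend(name: str, experts_cls: type) -> None:
    """Hypothetical hook: out-of-tree plugins register their own kernels."""
    _PLUGIN_MOE_BACKENDS[name] = experts_cls


def select_fp8_moe_backend():
    """Selection would then consider plugin backends before failing."""
    for name, experts_cls in _PLUGIN_MOE_BACKENDS.items():
        return name, experts_cls
    raise NotImplementedError("no fp8 MoE backend supports this model")


class TpuFusedMoEExperts:
    """Invented placeholder for a TPU kernel class."""
```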

@vanbasten23
Collaborator Author

Thanks Kyuyeun for the input. I also tried to unblock myself (and tpu-inference) with something like vllm-project/tpu-inference#1512, another fix necessitated by the same issue. But the fix this time is not that straightforward, so I'm leaning towards getting it fixed in vLLM.

@robertgshaw2-redhat
Collaborator

Thanks Kyuyeun for the input. I also tried to unblock myself (and tpu-inference) by doing something like vllm-project/tpu-inference#1512, another fix caused by the same issue. But the fix this time is not that straightforward. So I'm leaning towards getting it fixed in vllm.

What I’m going to do is:

  • validate in ModelOpt, Fp8, and CT that the backend returned is not None (i.e., move the validation from this function into the quant methods). This will unblock TPU.
  • next week I’ll work on the register API we discussed
  • once TPU migrates to the register API, we can move the validation back into this function

I’ll put up the PR tomorrow morning.
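The first bullet of the plan above could look roughly like this (names and signatures simplified; a sketch of the stated plan, not the change that ultimately merged):

```python
def select_fp8_moe_backend(supported: bool):
    # Returns a default instead of raising, so TPU construction succeeds.
    return ("cutlass", object) if supported else (None, None)


class Fp8MoEMethod:
    """Sketch of a quant method that validates the selected backend itself."""

    def __init__(self, requires_backend: bool, supported: bool):
        self.fp8_backend, self.experts_cls = select_fp8_moe_backend(supported)
        # Validation moved here from select_fp8_moe_backend: CUDA/ROCm paths
        # still get a clear error, while TPU (requires_backend=False) proceeds.
        if requires_backend and self.fp8_backend is None:
            raise NotImplementedError(
                "no fp8 MoE backend supports this model")
```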

@vanbasten23
Collaborator Author

Hi @robertgshaw2-redhat, is there any update? This is currently blocking tpu-inference.

Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
@robertgshaw2-redhat
Collaborator

Hi @robertgshaw2-redhat , is there any update? This is currently blocking the TPU-inference.

I thought a bit more about it. I think it’s better to have the check here rather than in the quant methods.

Does the change I pushed to the branch look okay to you?

@robertgshaw2-redhat robertgshaw2-redhat added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 26, 2026
@vanbasten23
Collaborator Author

Hi @robertgshaw2-redhat , is there any update? This is currently blocking the TPU-inference.

I thought a bit more about it. I think it’s better to have the check here rather than in the quant methods.

Does the change I pushed to the branch look okay to you?

Yes. It looks good to me. Thanks @robertgshaw2-redhat .

@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) January 26, 2026 21:05
@robertgshaw2-redhat
Collaborator

Hi @robertgshaw2-redhat , is there any update? This is currently blocking the TPU-inference.

I thought a bit more about it. I think it’s better to have the check here rather than in the quant methods.
Does the change I pushed to the branch look okay to you?

Yes. It looks good to me. Thanks @robertgshaw2-redhat .

great! merging.

@vanbasten23
Collaborator Author

@robertgshaw2-redhat, it looks like the merge is blocked: "At least 1 approving review is required by reviewers with write access."

@robertgshaw2-redhat robertgshaw2-redhat merged commit 510ed1e into vllm-project:main Jan 26, 2026
54 checks passed
@robertgshaw2-redhat
Collaborator

@robertgshaw2-redhat, it looks like the merge is blocked: "At least 1 approving review is required by reviewers with write access."

My bad

apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed


3 participants