[RL] [FlashInfer] Integrate FlashInfer `trtllm_fp4_block_scale_routed_moe` by zianglih · Pull Request #22209 · sgl-project/sglang

zianglih · 2026-04-06T21:41:41Z

Motivation

@HumansAnd

This PR largely mirrors existing routed MoE integration:

[FlashInfer v0.6.6][RL] Support fp8-last-n-bf16 RL for flashinfer_trtllm_routed moe backend #20214
[FlashInfer v0.6.4] [RL] Integrate FlashInfer mxfp8 gemm, MoE, and routed MoE #19537

This PR also depends on #22204 for FlashInfer trtllm moe refactoring.

Modifications

Rename and expand test_update_weights_from_disk_blackwell.py; it now covers both mxfp8 and nvfp4
Expand test_flashinfer_trtllm_gen_moe_backend.py for nvfp4 coverage
Add integration for trtllm_fp4_block_scale_routed_moe

Accuracy Tests

gsm8k

python3 -m sglang.launch_server --kv-cache-dtype bf16 --model nvidia/Qwen3-30B-A3B-NVFP4
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1209 --parallel 1209 --platinum
Accuracy: 0.945
Invalid: 0.001
Latency: 8.517 s
Output throughput: 17180.418 token/s
curl -sS http://localhost:30000/update_weights_from_disk \
  -H 'Content-Type: application/json' \
  -d '{
    "model_path": "nvidia/Qwen3-30B-A3B-NVFP4",
    "flush_cache": true,
    "abort_all_requests": false
  }'
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1209 --parallel 1209 --platinum
Accuracy: 0.942
Invalid: 0.001
Latency: 7.789 s
Output throughput: 18788.124 token/s
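
The weight-reload step above can also be issued from Python. This is a minimal sketch, assuming the server from the launch command is listening on localhost:30000 as in the curl call. The helper names `build_update_weights_request` and `send`, and the timeout value, are hypothetical and not part of SGLang.

```python
import json
import urllib.request


def build_update_weights_request(
    base_url: str,
    model_path: str,
    flush_cache: bool = True,
    abort_all_requests: bool = False,
) -> urllib.request.Request:
    """Build a POST request mirroring the curl call (hypothetical helper)."""
    payload = {
        "model_path": model_path,
        "flush_cache": flush_cache,
        "abort_all_requests": abort_all_requests,
    }
    return urllib.request.Request(
        url=f"{base_url}/update_weights_from_disk",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request and return the raw response body as text.

    The timeout is an assumed value; the response schema is not shown in
    this PR, so the body is returned without parsing.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `send(build_update_weights_request("http://localhost:30000", "nvidia/Qwen3-30B-A3B-NVFP4"))` between the two benchmark runs would reproduce the sequence above.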

python3 -m pytest -s -q test/registered/backends/test_flashinfer_trtllm_gen_moe_backend.py -k NVFP4
============================================================================ warnings summary ============================================================================
../../usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1428
  /usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1428: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2 passed, 6 deselected, 3 warnings in 137.38s (0:02:17)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

python3 -m pytest -s -q test/registered/rl/test_update_weights_from_disk_blackwell.py -k NVFP4
============================================================================ warnings summary ============================================================================
../../usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1428
  /usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1428: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
1 passed, 1 deselected, 3 warnings, 3 subtests passed in 92.21s (0:01:32)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist

Code Review

This pull request introduces support for FP4 MoE using the FlashInfer/TRT-LLM backend, including a new routed MoE wrapper and integration with the model optimization quantization path. The implementation refactors weight handling to use standard parameter names and adds comprehensive tests for NVFP4 backends and weight updates. Feedback was provided to refactor the new FP4 MoE wrapper using a keyword argument dictionary to ensure consistency with existing wrappers in the codebase.

python/sglang/srt/layers/moe/flashinfer_trtllm_moe.py

zianglih · 2026-04-06T21:45:19Z

test/registered/backends/test_flashinfer_trtllm_gen_moe_backend.py

+        )
+        metrics = run_eval(args)
+        print(f"{metrics=}")
+        self.assertGreater(metrics["score"], 0.89)


Set to 0.89 according to #22136
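
For illustration, the threshold assertion can be combined with a before/after comparison around a weight reload. This is only a sketch: the 0.89 floor comes from the diff above and the 0.945 / 0.942 scores from the gsm8k runs in the PR description, while the `max_drop` tolerance and the function name `accuracy_ok` are assumptions, not part of the test suite.

```python
def accuracy_ok(
    before: float,
    after: float,
    floor: float = 0.89,    # threshold used in the test diff
    max_drop: float = 0.01,  # assumed tolerance, not taken from the PR
) -> bool:
    """Return True if both scores clear the floor and the reload did not
    reduce accuracy by more than max_drop."""
    if before <= floor or after <= floor:
        return False
    return (before - after) <= max_drop


# gsm8k scores reported before and after update_weights_from_disk
print(accuracy_ok(0.945, 0.942))
```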

nvpohanh · 2026-04-06T23:59:07Z

cc @trevor-m

trevor-m · 2026-04-07T00:01:56Z

We also have #21240

zianglih · 2026-04-07T00:04:21Z

@trevor-m do you plan to merge that PR? I can close this one since the implementation looks identical.

zianglih · 2026-04-07T00:17:25Z

I will strip this PR down to the weight update and test changes and hold it until #21240 merges.

zianglih · 2026-04-07T09:17:28Z

Closing this PR since the FlashInfer trtllm nvfp4 routed MoE implementation duplicates #21240.

Moving weight update refactoring and test file changes to:

[RL] Refactor NVFP4 shuffling/swizzling to in-place replacement #22204

zianglih added 4 commits April 6, 2026 12:46

Initial implementation

7fcbdd2

Rename test

e4395e4

Minor fix

f241d05

Initial implementation

8218af3

zianglih requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg, Fridge003, HaiShaw, Ying1123, b8zhong, ch-wan, ispobock and merrymercy as code owners April 6, 2026 21:41

github-actions bot added the quant (LLM Quantization) and blackwell (SM100/SM120) labels Apr 6, 2026

zianglih mentioned this pull request Apr 6, 2026

[Roadmap] Blackwell MXFP8 and NVFP4 RL training radixark/miles#615

Open

23 tasks

gemini-code-assist bot reviewed Apr 6, 2026

python/sglang/srt/layers/moe/flashinfer_trtllm_moe.py (resolved)

zianglih commented Apr 6, 2026

zianglih changed the title ~~[RL] [FlashInfer] Integrate FlashInfer trtllm_fp4_block_scale_routed_moe~~ [RL] [FlashInfer] Fix weight update and expand tests for FlashInfer nvfp4 moe Apr 7, 2026

Drop fp4 routed moe integration

7841e23

ziang-and force-pushed the nvfp4-routed branch from 239bbac to 7841e23 April 7, 2026 08:59

zianglih changed the title ~~[RL] [FlashInfer] Fix weight update and expand tests for FlashInfer nvfp4 moe~~ [RL] [FlashInfer] Refactor NVFP4 trtllm shuffling/swizzling to in-place replacement Apr 7, 2026

zianglih mentioned this pull request Apr 7, 2026

[RL] Refactor NVFP4 shuffling/swizzling to in-place replacement #22204

Open

5 tasks

Revert "Drop fp4 routed moe integration"

8196f3e

This reverts commit 7841e23.

zianglih changed the title ~~[RL] [FlashInfer] Refactor NVFP4 trtllm shuffling/swizzling to in-place replacement~~ [RL] [FlashInfer] Integrate FlashInfer trtllm_fp4_block_scale_routed_moe Apr 7, 2026

zianglih closed this Apr 7, 2026

zianglih wants to merge 6 commits into sgl-project:main from zianglih:nvfp4-routed

3 participants
