[RL] [FlashInfer] Integrate FlashInfer trtllm_fp4_block_scale_routed_moe#22209
zianglih wants to merge 6 commits into sgl-project:main
Code Review
This pull request introduces support for FP4 MoE using the FlashInfer/TRT-LLM backend, including a new routed MoE wrapper and integration with the model optimization quantization path. The implementation refactors weight handling to use standard parameter names and adds comprehensive tests for NVFP4 backends and weight updates. Feedback was provided to refactor the new FP4 MoE wrapper using a keyword argument dictionary to ensure consistency with existing wrappers in the codebase.
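The keyword-argument-dictionary pattern mentioned in the feedback can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `fp4_routed_moe_wrapper`, `_echo_kernel`, and every parameter name here are hypothetical, and the real FlashInfer kernel signature is not reproduced. Omitting unset optional arguments is likewise an assumption about how such a wrapper might behave.

```python
from typing import Any, Callable, Dict, Optional


def _echo_kernel(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the real kernel: returns the arguments it received."""
    return dict(kwargs)


def fp4_routed_moe_wrapper(
    hidden_states: Any,
    topk_ids: Any,
    *,
    top_k: int,
    routing_bias: Optional[Any] = None,
    kernel: Callable[..., Any] = _echo_kernel,
) -> Any:
    """Collect all kernel arguments in one dict, then forward them in a single call."""
    # Required arguments are gathered in one place so the call site stays short
    # and matches the shape of other wrappers built the same way.
    kwargs: Dict[str, Any] = {
        "hidden_states": hidden_states,
        "topk_ids": topk_ids,
        "top_k": top_k,
    }
    # Optional arguments are added only when provided (an assumption of this sketch).
    if routing_bias is not None:
        kwargs["routing_bias"] = routing_bias
    return kernel(**kwargs)


result = fp4_routed_moe_wrapper([1.0, 2.0], [0, 1], top_k=2)
```

Building the dictionary first keeps the argument list reviewable in one place and makes it easy to diff against sibling wrappers that follow the same convention.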
    )
    metrics = run_eval(args)
    print(f"{metrics=}")
    self.assertGreater(metrics["score"], 0.89)

cc @trevor-m

We also have #21240

@trevor-m do you plan on merging the PR? I can close this one since the implementations look identical.

I will strip this PR down to the weight update and test changes and hold until #21240 merges.
This reverts commit 7841e23.

Closing this PR since the FlashInfer TRT-LLM NVFP4 routed MoE implementation duplicates #21240. Moving the weight update refactoring and test file changes to:
Motivation
@HumansAnd
This PR largely mirrors the existing routed MoE integration:
- flashinfer_trtllm_routedmoe backend #20214

This PR also depends on #22204 for the FlashInfer TRT-LLM MoE refactoring.
Modifications
- test_update_weights_from_disk_blackwell.py, which now covers both mxfp8 and nvfp4
- test_flashinfer_trtllm_gen_moe_backend.py for nvfp4 coverage
- trtllm_fp4_block_scale_routed_moe

Accuracy Tests
gsm8k
Speed Tests and Profiling