
Tiny fix trtllm_fp8_per_tensor_scale_moe_wrapper router_logits dtype#22006

Merged
Qiaolin-Yu merged 2 commits into main from fix_per_tensor
Apr 6, 2026

Conversation

@Qiaolin-Yu (Collaborator)

Motivation

https://github.com/flashinfer-ai/flashinfer/blob/fe0539318dcc31c76a33a7ed2ab0ee3c94fe6bad/csrc/trtllm_fused_moe_kernel_launcher.cu#L1789

The dtype of router_logits should be float32 for the DeepSeek routing method.

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.


@Qiaolin-Yu (Collaborator, Author)

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Apr 3, 2026
@b8zhong (Collaborator)

b8zhong commented Apr 4, 2026

Btw, this bug has happened before (#14350; there may be one more instance, but I can't find it...)

@b8zhong b8zhong enabled auto-merge (squash) April 4, 2026 00:16
@Qiaolin-Yu Qiaolin-Yu disabled auto-merge April 6, 2026 04:11
@Qiaolin-Yu Qiaolin-Yu merged commit f407461 into main Apr 6, 2026
231 of 267 checks passed
@Qiaolin-Yu Qiaolin-Yu deleted the fix_per_tensor branch April 6, 2026 04:11
# during torch.compile for piecewise cuda graph.
# Use custom op wrapper for torch.compile compatibility.

# The DeepSeekV3 routing method requires float32 router logits.
Collaborator:
@leejnau @trevor-m is this true? If so, why didn't we run into issues before?

Collaborator:

Maybe will be fixed by flashinfer-ai/flashinfer#2993 ?

Collaborator:

The block-scale path already had this fix; I think we just never used per-tensor scaling before?

Collaborator:

Got it. We have never run DSV3/R1 with per-tensor FP8 before.
