[Bug] Fix V4-Pro NaN on Blackwell by converting fp8_einsum input scale to ue8m0 by yhyang201 · Pull Request #25733 · sgl-project/sglang

yhyang201 · 2026-05-19T02:28:46Z

Summary

Fix DeepSeek-V4-Pro producing garbage text during autoregressive decode on Blackwell GPUs (B300/B200).
Root cause: deep_gemm's transpose_and_pack_fp32_into_ue8m0 CUDA kernel has a packing bug. It does not mask the mantissa bits when extracting the fp32 exponent, so non-power-of-2 scale values are corrupted. As a result, fp8_einsum produces NaN at batch=1 (single-token decode).
Fix: apply ceil_to_ue8m0() to the activation scale before passing it to fp8_einsum, so that the mantissa bits are zero before packing.
The scale tensor is tiny (e.g. shape (2, 32)), so the overhead is negligible.
Detailed analysis: https://gist.github.com/yhyang201/2fce5750e44970af419f9669e10e994c
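To make the mantissa argument concrete, here is a minimal pure-Python sketch of what rounding a scale up to a power of two does to its fp32 bit pattern. The helper names `fp32_fields` and `ceil_to_ue8m0_scalar` are hypothetical scalar stand-ins written for illustration; they are not the deep_gemm or sglang code, and they assume strictly positive scales.

```python
import math
import struct


def fp32_fields(x: float) -> tuple[int, int, int]:
    """Split a value, rounded to fp32, into (sign, biased exponent, mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    return sign, exponent, mantissa


def ceil_to_ue8m0_scalar(x: float) -> float:
    """Hypothetical scalar analogue: round a positive scale up to a power of two."""
    return 2.0 ** math.ceil(math.log2(abs(x)))


raw = 0.3                           # a non-power-of-2 scale
rounded = ceil_to_ue8m0_scalar(raw)

print(fp32_fields(raw))             # mantissa field is non-zero
print(fp32_fields(rounded))         # mantissa field is zero after rounding
```

Once the mantissa field is zero, the biased exponent carries the whole value, so a packing step that fails to mask the mantissa has nothing left to leak.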

CI States

Latest PR Test (Base): 🚫 Run #26072487958
Latest PR Test (Extra): ⚠️ Not enabled -- add run-ci-extra label to opt in.

…e to ue8m0

deep_gemm's ue8m0 packing kernel has a bug where non-power-of-2 fp32 scale values get their mantissa bits leaked into packed exponent fields. This causes fp8_einsum to produce NaN during autoregressive decode (batch=1) for DeepSeek-V4-Pro on Blackwell GPUs. Adding ceil_to_ue8m0() on the activation scale ensures mantissa bits are zero before packing, which matches deep_gemm's expected input format and fixes the NaN.
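For readers unfamiliar with the format, the sketch below shows what a correctly masked UE8M0 pack might look like in plain Python. It is an illustration only, not the deep_gemm kernel: the helpers `ue8m0_byte`, `pack4`, and `unpack4` are hypothetical, and the layout (four exponent bytes per 32-bit word, lowest byte first) is an assumption made for the example.

```python
import struct


def ue8m0_byte(scale: float) -> int:
    """Biased fp32 exponent of a positive scale, with sign and mantissa masked off."""
    bits = struct.unpack("<I", struct.pack("<f", scale))[0]
    return (bits >> 23) & 0xFF


def pack4(scales: list[float]) -> int:
    """Pack four UE8M0 bytes into one 32-bit word (assumed lane order: low byte first)."""
    assert len(scales) == 4
    word = 0
    for lane, scale in enumerate(scales):
        word |= ue8m0_byte(scale) << (8 * lane)
    return word


def unpack4(word: int) -> list[float]:
    """Decode each byte e back to the scale 2 ** (e - 127)."""
    return [2.0 ** (((word >> (8 * lane)) & 0xFF) - 127) for lane in range(4)]


# Power-of-two scales survive the round trip exactly.
print(unpack4(pack4([1.0, 2.0, 0.5, 4.0])))

# A non-power-of-2 scale cannot be represented: only its exponent is kept.
print(unpack4(pack4([0.3, 0.3, 0.3, 0.3])))
```

Because UE8M0 stores only an exponent, a scale has to be a power of two to be represented exactly, which is consistent with the PR's choice to round the activation scale with ceil_to_ue8m0() before it reaches the packing step.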

gemini-code-assist · 2026-05-19T02:28:49Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

yhyang201 · 2026-05-19T02:36:33Z

/tag-and-rerun-ci

ch-wan · 2026-05-19T03:26:39Z

/rerun-test test_deepseek_v4_flash_fp4_b200.py

github-actions · 2026-05-19T03:27:02Z

🚀 4-gpu-b200 (1 test): ❌ View workflow run

cd test/ && python3 registered/dsv4/test_deepseek_v4_flash_fp4_b200.py

yhyang201 · 2026-05-19T03:27:16Z

/rerun-test test_deepseek_v4_flash_fp4_b200.py

github-actions · 2026-05-19T03:27:45Z

🚀 4-gpu-b200 (1 test): ✅ View workflow run

cd test/ && python3 registered/dsv4/test_deepseek_v4_flash_fp4_b200.py

yhyang201 · 2026-05-19T03:56:41Z

/rerun-test test_deepseek_v4_flash_fp4_megamoe_b200.py

github-actions · 2026-05-19T03:57:11Z

🚀 4-gpu-b200 (1 test): ✅ View workflow run

cd test/ && python3 registered/dsv4/test_deepseek_v4_flash_fp4_megamoe_b200.py

… converting fp8_einsum input scale to ue8m0 (#25733) (#26063) Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>

…e to ue8m0 (sgl-project#25733)

github-actions Bot added the deepseek label May 19, 2026

github-actions Bot added the run-ci label May 19, 2026

ch-wan approved these changes May 19, 2026

Fridge003 merged commit 79ea30d into main May 19, 2026
183 of 214 checks passed

Fridge003 deleted the fix/fp8-einsum-ue8m0-scale branch May 19, 2026 06:48

b8zhong mentioned this pull request May 19, 2026

[Bug] DeepSeek-V4-Pro NVFP4 on B200 single-node: non-speculative decode produces NaN (TP=8) or garbage tokens (DP+DeepEP); only EAGLE works #25704

Closed

5 tasks

Kangyan-Zhou mentioned this pull request May 22, 2026

Add sglang-cherrypick skill for batching bot-cherry-pick dispatches #26068

Merged

3 tasks

Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026

[Bug] Fix V4-Pro NaN on Blackwell by converting fp8_einsum input scal…

70a991a

…e to ue8m0 (sgl-project#25733)

alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026

[Bug] Fix V4-Pro NaN on Blackwell by converting fp8_einsum input scal…

4a5b2b2

…e to ue8m0 (sgl-project#25733)

Fridge003 merged 1 commit into main from fix/fp8-einsum-ue8m0-scale