
fix: restore FP4 deepgemm path for Blackwell broken by #23686 #23948

Merged
Fridge003 merged 2 commits into sgl-project:deepseek_v4 from yhyang201:fix/fp4-blackwell-deepgemm on Apr 28, 2026

@yhyang201
Collaborator

Summary

  • PR #23686 ("Deepseek_v4 support w4(mxfp4)a16 on hopper", which added Hopper w4a16 support) restructured FP4 expert weight processing and accidentally blocked the default deepgemm/auto backend path with a NotImplementedError on Blackwell GPUs.
  • This PR restores the original logic: the view(torch.int8) conversion of the expert weights runs unconditionally for all FP4 backends, marlin is a special-case pass-through, and the deepgemm/mega_moe path remains unchanged in the else branch.
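
The branch structure described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in fp8.py: the `Backend` enum, the function name, and the step labels are all hypothetical stand-ins, and the real code operates on tensors rather than returning strings.

```python
from enum import Enum


class Backend(Enum):
    # Hypothetical stand-in for the configured MoE runner backend.
    AUTO = "auto"
    DEEPGEMM = "deepgemm"
    MARLIN = "marlin"


def process_fp4_expert_weights(backend: Backend) -> list[str]:
    """Return the ordered processing steps for a backend (illustration only)."""
    # Shared step: runs for every FP4 backend, before any branching.
    steps = ["view_weights_as_int8"]

    if backend is Backend.MARLIN:
        # Marlin is a pass-through: nothing further happens here.
        return steps

    # Default deepgemm/auto path: must be reachable, not guarded by
    # a NotImplementedError.
    steps.append("deepgemm_mega_moe_processing")
    return steps


for backend in Backend:
    print(backend.value, "->", process_fp4_expert_weights(backend))
```

The point of the sketch is the ordering: the shared conversion comes first, marlin exits early, and every other backend falls through to the default path instead of hitting an error.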

Test plan

  • Verified on 8x NVIDIA B300 SXM6 AC (275GB) with DeepSeek-V4-Pro
  • Server starts successfully with and default runner backend
  • Benchmark (conc=32, ISL=8k, OSL=1k) passes without crash



The Hopper w4a16 PR (sgl-project#23686) restructured the FP4 expert weight
processing branches in a way that blocks the default deepgemm/auto
backend path with a NotImplementedError. This restores the original
logic and treats marlin as the special-case addition.
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the weight processing logic for FP4 experts in fp8.py by consolidating the conversion of weight data to torch.int8 and simplifying the conditional logic for MoE runner backends. I have no feedback to provide as there were no review comments.
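
For context on the conversion mentioned above: viewing a tensor as `torch.int8` reinterprets the existing bytes rather than copying or rescaling them. The snippet below is a standard-library analogy only, assuming one-byte packed elements; it uses `memoryview.cast` instead of torch, and the byte values are made up for illustration.

```python
# Made-up packed bytes standing in for FP4 weight storage.
packed = bytearray([0x12, 0xF0, 0x7F, 0x80])

# Reinterpret the same buffer as signed 8-bit integers (no copy).
as_int8 = memoryview(packed).cast("b")
print(as_int8.tolist())

# Writing through the view changes the original buffer, which shows
# that both names refer to the same memory.
as_int8[0] = -1
print(hex(packed[0]))
```

The bit patterns are unchanged by the reinterpretation; only the element type used to read them differs.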

layer.w13_weight.data = layer.w13_weight.data.view(torch.int8)
layer.w2_weight.data = layer.w2_weight.data.view(torch.int8)

if get_moe_runner_backend().is_marlin():
Collaborator


We don't need the pass branch here. Just add a comment if you want to call out marlin.

Collaborator


I think we also need to add comments clarifying the three MoE backends we support and which branch each one should take.

@yhyang201 yhyang201 force-pushed the fix/fp4-blackwell-deepgemm branch from 018b03b to 864a61a Compare April 28, 2026 17:42
@yhyang201 yhyang201 force-pushed the fix/fp4-blackwell-deepgemm branch from 864a61a to d3e63aa Compare April 28, 2026 18:06
@Fridge003 Fridge003 merged commit 79281e8 into sgl-project:deepseek_v4 Apr 28, 2026
1 check passed
3 participants