Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… by b8zhong · Pull Request #22047 · sgl-project/sglang

b8zhong · 2026-04-03T13:42:19Z

…) (#19652)"

This reverts commit 991f3aa.

First, it introduces merge conflicts in at least two places. I have only checked the Python code, not the kernel code, so there could be more, and I'm not sure whether it causes problems elsewhere.


gemini-code-assist

Code Review

This pull request removes the Marlin FP4 fallback support for non-Blackwell GPUs, raising the minimum GPU capability for NVFP4 quantization to SM100. The changes involve deleting fallback utility files, environment variables, and tests, while updating Marlin kernels to adjust scale strides and group ID calculations for native FP4 support. Review feedback pointed out potential division-by-zero or negative value issues in the kernel templates when group_blocks is -1 and recommended removing commented-out code to improve maintainability.
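To make the capability change concrete, here is a minimal sketch of a minimum-capability check, assuming the usual convention that an SM version is `major * 10 + minor` (so SM75 is compute capability 7.5 and SM100 is 10.0). The names `sm_version`, `kMinNvfp4Capability`, and `meets_nvfp4_minimum` are hypothetical and are not taken from the sglang code base.

```cpp
// Hypothetical helper: fold a CUDA compute capability (major.minor)
// into the single "SM" number used in the PR title, e.g. 7.5 -> 75.
constexpr int sm_version(int major, int minor) {
  return major * 10 + minor;
}

// Assumption for illustration: after the revert, NVFP4 requires SM100 or newer.
constexpr int kMinNvfp4Capability = 100;

// Returns true when the device meets the assumed NVFP4 minimum.
constexpr bool meets_nvfp4_minimum(int major, int minor) {
  return sm_version(major, minor) >= kMinNvfp4Capability;
}
```

Under this sketch, SM75 through SM90 devices that the reverted fallback targeted no longer qualify, while SM100 and SM120 do.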

gemini-code-assist · 2026-04-03T13:45:31Z

python/sglang/jit_kernel/csrc/gemm/marlin_moe/marlin_template.h


          int k_blocks = cur_k / 16;
-          int cur_group_id = k_blocks / group_blocks;
+          int cur_group_id = k_blocks / (group_blocks * (w_type == host::kFE2M1f ? 2 : 1));


Potential division by a negative value if group_blocks is -1. The surrounding if constexpr block allows entry if has_zp is true even if group_blocks is -1. In such a case, cur_group_id will be calculated using a negative divisor, leading to incorrect memory access. It's safer to guard this calculation.

int cur_group_id = (group_blocks > 0) ? k_blocks / (group_blocks * (w_type == host::kFE2M1f ? 2 : 1)) : 0;
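The difference between the two forms can be checked in isolation. The sketch below is a host-side illustration only: `WType` is a hypothetical stand-in for the kernel's `w_type` tag, and the function names are made up for this example. It assumes nothing about the rest of the kernel.

```cpp
// Hypothetical stand-in for the kernel's weight-type tag; the real code
// compares w_type against host::kFE2M1f.
enum class WType { kOther, kFE2M1f };

// Unguarded form from the diff. With group_blocks == -1 the divisor is
// negative, and C++ integer division then yields a negative group id.
constexpr int group_id_unguarded(int k_blocks, int group_blocks, WType w_type) {
  return k_blocks / (group_blocks * (w_type == WType::kFE2M1f ? 2 : 1));
}

// Guarded form from the review suggestion. Any non-positive group_blocks
// (including 0, which would otherwise divide by zero) maps to group 0.
constexpr int group_id_guarded(int k_blocks, int group_blocks, WType w_type) {
  return (group_blocks > 0)
             ? k_blocks / (group_blocks * (w_type == WType::kFE2M1f ? 2 : 1))
             : 0;
}
```

For example, `k_blocks = 8` with `group_blocks = -1` and the FP4 type gives `8 / -2 = -4` in the unguarded form, which would index before the start of the scale buffer, while the guarded form returns 0.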

gemini-code-assist · 2026-04-03T13:45:31Z

sgl-kernel/csrc/gemm/marlin/marlin_template.h


          int k_blocks = cur_k / 16;
-          int cur_group_id = k_blocks / group_blocks;
+          int cur_group_id = k_blocks / (group_blocks * (w_type == sglang::kFE2M1f ? 2 : 1));


Potential division by a negative value if group_blocks is -1. The surrounding if constexpr block allows entry if has_zp is true even if group_blocks is -1. In such a case, cur_group_id will be calculated using a negative divisor, leading to incorrect memory access. It's safer to guard this calculation.

int cur_group_id = (group_blocks > 0) ? k_blocks / (group_blocks * (w_type == sglang::kFE2M1f ? 2 : 1)) : 0;

gemini-code-assist · 2026-04-03T13:45:31Z

python/sglang/jit_kernel/csrc/gemm/marlin_moe/marlin_template.h

+    // if constexpr (w_type == host::kFE2M1f) {
+    //   int s_quant_0 = reinterpret_cast<int*>(frag_s[k2])[0];
+    //   int s_quant_1 = reinterpret_cast<int*>(frag_s[k2])[1];
+    //
+    //   dequant_fp8_scales<scalar_t2, s_type_id>(
+    //       s_quant_0, reinterpret_cast<scalar_t2*>(&frag_s[k2]));
+    //   dequant_fp8_scales<scalar_t2, s_type_id>(
+    //       s_quant_1, reinterpret_cast<scalar_t2*>(&frag_s[k2]) + 2);
+    // }


Commented-out code should be removed to maintain code cleanliness. If the intention is to disable kFE2M1f kernel generation to save compilation time, it should also be removed from the registration in moe_wna16_marlin.cuh. Otherwise, if the kernel is still used, this logic is required for numerical correctness.
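To illustrate why skipping the scale dequantization would affect numerical correctness, here is a rough host-side sketch of decoding a single FP8 scale byte. It assumes the scales use the common E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits), which the review comment does not state. `decode_e4m3` is a hypothetical helper, not the kernel's `dequant_fp8_scales`.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode one byte, assumed to be FP8 E4M3
// (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits).
inline float decode_e4m3(std::uint8_t bits) {
  const int sign = (bits >> 7) & 0x1;
  const int exponent = (bits >> 3) & 0xF;
  const int mantissa = bits & 0x7;

  // In this layout the all-ones pattern encodes NaN; there is no infinity.
  if (exponent == 0xF && mantissa == 0x7) {
    return std::numeric_limits<float>::quiet_NaN();
  }

  float magnitude;
  if (exponent == 0) {
    // Subnormal: no implicit leading 1, fixed exponent of -6.
    magnitude = std::ldexp(static_cast<float>(mantissa) / 8.0f, -6);
  } else {
    // Normal: implicit leading 1, exponent stored with a bias of 7.
    magnitude = std::ldexp(1.0f + static_cast<float>(mantissa) / 8.0f,
                           exponent - 7);
  }
  return sign ? -magnitude : magnitude;
}
```

Under this assumption the byte `0x38` decodes to 1.0, whereas reading the same byte as a plain integer gives 56. A kernel that consumes the raw bits without an equivalent conversion would therefore scale its outputs incorrectly.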

Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652)"

16b8864

This reverts commit 991f3aa.

b8zhong requested review from AniZpZ, BBuf, DarkSharpness, Edwardf0t1, FlamingoPg, Fridge003, HaiShaw, HydraQYH, Ying1123, celve, ch-wan, ispobock, merrymercy, yizhang2077 and yuan-luo as code owners April 3, 2026 13:42

github-actions bot added the documentation (Improvements or additions to documentation), quant (LLM Quantization), sgl-kernel, blackwell (SM100/SM120), and jit-kernel labels Apr 3, 2026

gemini-code-assist bot reviewed Apr 3, 2026


b8zhong requested a review from Kangyan-Zhou April 3, 2026 18:00

b8zhong assigned Kangyan-Zhou Apr 3, 2026

Fridge003 merged commit 6aafe75 into main Apr 3, 2026
59 of 67 checks passed

Fridge003 deleted the revert-19652-nvfp4-marlin-fallback branch April 3, 2026 20:12

adityavaid pushed a commit to adityavaid/sglang that referenced this pull request Apr 3, 2026

Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (

f361604

sgl-project#22047)
