Skip to content

Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+…#22047

Merged
Fridge003 merged 1 commit intomainfrom
revert-19652-nvfp4-marlin-fallback
Apr 3, 2026
Merged

Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+…#22047
Fridge003 merged 1 commit intomainfrom
revert-19652-nvfp4-marlin-fallback

Conversation

@b8zhong
Copy link
Copy Markdown
Collaborator

@b8zhong b8zhong commented Apr 3, 2026

…) (#19652)"

This reverts commit 991f3aa.

Firstly it introduces merge conflicts through at least 2 places (I haven't checked the kernel code, only python. thus there could be more), so I'm uncertain if it has problem in other places...

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the Marlin FP4 fallback support for non-Blackwell GPUs, raising the minimum GPU capability for NVFP4 quantization to SM100. The changes involve deleting fallback utility files, environment variables, and tests, while updating Marlin kernels to adjust scale strides and group ID calculations for native FP4 support. Review feedback pointed out potential division-by-zero or negative value issues in the kernel templates when group_blocks is -1 and recommended removing commented-out code to improve maintainability.


int k_blocks = cur_k / 16;
int cur_group_id = k_blocks / group_blocks;
int cur_group_id = k_blocks / (group_blocks * (w_type == host::kFE2M1f ? 2 : 1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Potential division by a negative value if group_blocks is -1. The surrounding if constexpr block allows entry if has_zp is true even if group_blocks is -1. In such a case, cur_group_id will be calculated using a negative divisor, leading to incorrect memory access. It's safer to guard this calculation.

          int cur_group_id = (group_blocks > 0) ? k_blocks / (group_blocks * (w_type == host::kFE2M1f ? 2 : 1)) : 0;


int k_blocks = cur_k / 16;
int cur_group_id = k_blocks / group_blocks;
int cur_group_id = k_blocks / (group_blocks * (w_type == sglang::kFE2M1f ? 2 : 1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Potential division by a negative value if group_blocks is -1. The surrounding if constexpr block allows entry if has_zp is true even if group_blocks is -1. In such a case, cur_group_id will be calculated using a negative divisor, leading to incorrect memory access. It's safer to guard this calculation.

          int cur_group_id = (group_blocks > 0) ? k_blocks / (group_blocks * (w_type == sglang::kFE2M1f ? 2 : 1)) : 0;

Comment on lines +1248 to +1256
// if constexpr (w_type == host::kFE2M1f) {
// int s_quant_0 = reinterpret_cast<int*>(frag_s[k2])[0];
// int s_quant_1 = reinterpret_cast<int*>(frag_s[k2])[1];
//
// dequant_fp8_scales<scalar_t2, s_type_id>(
// s_quant_0, reinterpret_cast<scalar_t2*>(&frag_s[k2]));
// dequant_fp8_scales<scalar_t2, s_type_id>(
// s_quant_1, reinterpret_cast<scalar_t2*>(&frag_s[k2]) + 2);
// }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Commented-out code should be removed to maintain code cleanliness. If the intention is to disable kFE2M1f kernel generation to save compilation time, it should also be removed from the registration in moe_wna16_marlin.cuh. Otherwise, if the kernel is still used, this logic is required for numerical correctness.

@b8zhong b8zhong requested a review from Kangyan-Zhou April 3, 2026 18:00
@Fridge003 Fridge003 merged commit 6aafe75 into main Apr 3, 2026
59 of 67 checks passed
@Fridge003 Fridge003 deleted the revert-19652-nvfp4-marlin-fallback branch April 3, 2026 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blackwell SM100/SM120 documentation Improvements or additions to documentation jit-kernel quant LLM Quantization sgl-kernel

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants