Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+…#22047
Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+…#22047
Conversation
There was a problem hiding this comment.
Code Review
This pull request removes the Marlin FP4 fallback support for non-Blackwell GPUs, raising the minimum GPU capability for NVFP4 quantization to SM100. The changes involve deleting fallback utility files, environment variables, and tests, while updating Marlin kernels to adjust scale strides and group ID calculations for native FP4 support. Review feedback pointed out potential division-by-zero or negative value issues in the kernel templates when group_blocks is -1 and recommended removing commented-out code to improve maintainability.
|
|
||
| int k_blocks = cur_k / 16; | ||
| int cur_group_id = k_blocks / group_blocks; | ||
| int cur_group_id = k_blocks / (group_blocks * (w_type == host::kFE2M1f ? 2 : 1)); |
There was a problem hiding this comment.
Potential division by a negative value if group_blocks is -1. The surrounding if constexpr block allows entry if has_zp is true even if group_blocks is -1. In such a case, cur_group_id will be calculated using a negative divisor, leading to incorrect memory access. It's safer to guard this calculation.
int cur_group_id = (group_blocks > 0) ? k_blocks / (group_blocks * (w_type == host::kFE2M1f ? 2 : 1)) : 0;|
|
||
| int k_blocks = cur_k / 16; | ||
| int cur_group_id = k_blocks / group_blocks; | ||
| int cur_group_id = k_blocks / (group_blocks * (w_type == sglang::kFE2M1f ? 2 : 1)); |
There was a problem hiding this comment.
Potential division by a negative value if group_blocks is -1. The surrounding if constexpr block allows entry if has_zp is true even if group_blocks is -1. In such a case, cur_group_id will be calculated using a negative divisor, leading to incorrect memory access. It's safer to guard this calculation.
int cur_group_id = (group_blocks > 0) ? k_blocks / (group_blocks * (w_type == sglang::kFE2M1f ? 2 : 1)) : 0;| // if constexpr (w_type == host::kFE2M1f) { | ||
| // int s_quant_0 = reinterpret_cast<int*>(frag_s[k2])[0]; | ||
| // int s_quant_1 = reinterpret_cast<int*>(frag_s[k2])[1]; | ||
| // | ||
| // dequant_fp8_scales<scalar_t2, s_type_id>( | ||
| // s_quant_0, reinterpret_cast<scalar_t2*>(&frag_s[k2])); | ||
| // dequant_fp8_scales<scalar_t2, s_type_id>( | ||
| // s_quant_1, reinterpret_cast<scalar_t2*>(&frag_s[k2]) + 2); | ||
| // } |
There was a problem hiding this comment.
Commented-out code should be removed to maintain code cleanliness. If the intention is to disable kFE2M1f kernel generation to save compilation time, it should also be removed from the registration in moe_wna16_marlin.cuh. Otherwise, if the kernel is still used, this logic is required for numerical correctness.
…) (#19652)"
This reverts commit 991f3aa.
Firstly it introduces merge conflicts through at least 2 places (I haven't checked the kernel code, only python. thus there could be more), so I'm uncertain if it has problem in other places...