[Bugfix] Fix int32 overflow in DeepGEMM SiLU/mul FP8 Triton kernel#42201
Code Review
This pull request updates the Triton kernels in fp8_utils.py to use int64 for offset calculations to prevent potential integer overflows. The review feedback correctly points out that casting to int64 after the multiplication is insufficient, as the intermediate 32-bit product could still overflow. The reviewer suggests casting the program IDs to int64 before the multiplication to ensure robust overflow protection, consistent with other kernels in the codebase.
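To make the reviewer's point concrete, here is a minimal pure-Python model of the two orderings. It is a sketch, not the vLLM kernel. `wrap` is a hypothetical helper that emulates two's-complement wraparound at a given bit width, and the `pid`/`stride` values are illustrative numbers chosen so that their product exceeds the int32 range.

```python
def wrap(x: int, bits: int) -> int:
    """Return the value a signed two's-complement integer of width `bits` would hold for x."""
    half = 1 << (bits - 1)
    return (x + half) % (1 << bits) - half


def offset_cast_after(pid: int, stride: int) -> int:
    # Models `(pid * stride).to(int64)`: the multiply runs in int32 and wraps;
    # widening afterwards only sign-extends the already-wrong value.
    return wrap(wrap(pid, 32) * wrap(stride, 32), 32)


def offset_cast_before(pid: int, stride: int) -> int:
    # Models `pid.to(int64) * stride`: the multiply itself runs in int64.
    return wrap(pid * stride, 64)


if __name__ == "__main__":
    pid, stride = 100_000, 32_768  # illustrative values; true product is 3,276,800,000
    print(offset_cast_after(pid, stride))   # -1018167296 (negative: out of bounds)
    print(offset_cast_before(pid, stride))  # 3276800000
```

For small inputs both orderings agree, which is why this class of bug only appears once the product crosses 2**31 - 1.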
/gemini review
Code Review
This pull request updates the Triton kernels in fp8_utils.py to cast program IDs to int64 before calculating memory offsets. This change prevents potential integer overflow issues during offset computation in large-scale operations. As there were no review comments provided, I have no feedback to provide.
Hi @Flink-ddd, the pre-commit checks have failed. Please run:
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
Then commit the changes and push to your branch. For future commits, …
yewentao256 left a comment:
LGTM, thanks for the work! Also CC @ivanium
LGTM too. Thanks for the fix! cc @zyongye as well
…_group_quant_fp8_colmajor to fix int32 overflow for large DeepGEMM MoE warmup shapes Signed-off-by: vensen <vensenmu@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Vensen <vensenmu@gmail.com> Signed-off-by: vensen <vensenmu@gmail.com>
Hi @yewentao256 @ivanium @zyongye, all 69 CI checks have passed and this is ready to merge. Thanks!
…llm-project#42201) Signed-off-by: vensen <vensenmu@gmail.com> Signed-off-by: Vensen <vensenmu@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…llm-project#42201) Signed-off-by: vensen <vensenmu@gmail.com> Signed-off-by: Vensen <vensenmu@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Purpose
Fixes #42173
_silu_mul_per_token_group_quant_fp8_colmajor computes row/column offsets using int32 arithmetic:
With large DeepGEMM MoE warmup/workspace shapes (e.g., DPEP=16 with 36k max tokens per rank), the maximum element offset M * N - 1 = 18,882,756,607 far exceeds the int32 limit of 2,147,483,647. The offsets therefore overflow, and the Triton kernel accesses illegal memory addresses.
This PR promotes m_offset and n_offset to tl.int64 before the pointer arithmetic, so all offsets are computed with 64-bit addressing.
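The figures quoted above can be checked in plain Python. Only 18,882,756,607 and the int32 limit come from the description. `wrap_int32` is a hypothetical helper that models what a signed 32-bit value would hold after overflow; it is an illustration, not how Triton reports the error.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647
INT64_MAX = 2**63 - 1
MAX_OFFSET = 18_882_756_607  # M * N - 1 for the warmup shape in the PR description


def wrap_int32(x: int) -> int:
    """Value a signed 32-bit two's-complement integer would hold for x."""
    return (x + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    print(MAX_OFFSET > INT32_MAX)            # True
    print(round(MAX_OFFSET / INT32_MAX, 2))  # 8.79
    print(wrap_int32(MAX_OFFSET))            # 1702887423: in range, but the wrong element
    print(wrap_int32(INT32_MAX + 1))         # -2147483648: the first overflow goes negative
    print(MAX_OFFSET <= INT64_MAX)           # True: fits in int64
```

The wrapped value is not always negative: in this model the largest offset aliases to a different in-range element, while offsets just past INT32_MAX turn negative. Both outcomes are consistent with the illegal-memory-access symptom.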
Test Plan
Test Result
Before fix:
After fix (testing in progress on H100 PCIe):