[Bugfix] Fix int32 overflow in DeepGEMM SiLU/mul FP8 Triton kernel by Flink-ddd · Pull Request #42201 · vllm-project/vllm

Flink-ddd · 2026-05-10T03:49:00Z

Purpose

_silu_mul_per_token_group_quant_fp8_colmajor computes row/column offsets using int32 arithmetic:

m_offset = pid_m * BLOCK_M
n_offset = pid_n * BLOCK_N

With large DeepGEMM MoE warmup/workspace shapes (e.g. DPEP=16, 36k max tokens per rank), the maximum element offset M * N - 1 = 18,882,756,607 far exceeds the int32 limit of 2,147,483,647, causing the Triton kernel to access illegal memory addresses.

This PR promotes m_offset and n_offset to tl.int64 before pointer arithmetic to ensure correct 64-bit memory addressing.
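To make the failure mode concrete, the following is a minimal pure-Python sketch of what happens when a flat element offset is held in a 32-bit signed integer. It is not the Triton kernel itself: `wrap_int32` and `last_element_offset` are hypothetical helpers introduced only for illustration, and the shape is the single-GPU reproducer shape from the Test Plan.

```python
# Emulate 32-bit signed wraparound to show why a flat offset of M * N - 1
# cannot be represented once it exceeds the int32 maximum.
# NOTE: wrap_int32 and last_element_offset are illustrative helpers, not vLLM APIs.

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return the value a two's-complement int32 would hold after overflow."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def last_element_offset(rows: int, cols: int) -> int:
    """Flat offset of the last element of a contiguous rows x cols tensor."""
    return rows * cols - 1


M, N = 524_416, 4096  # reproducer shape from the Test Plan
true_offset = last_element_offset(M, N)
wrapped_offset = wrap_int32(true_offset)

print(f"true offset    = {true_offset}")
print(f"int32 max      = {INT32_MAX}")
print(f"wrapped offset = {wrapped_offset}")
```

Under these assumptions the wrapped value is negative, so a pointer computed from it lands outside the tensor, which is consistent with the illegal memory access reported in this PR.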

Test Plan

Verified on NVIDIA H100 PCIe (80GB) using a minimal single-GPU reproducer with the first aligned overflow shape:

M = 524_416, N = 4096
max_offset = M * N - 1 = 2,148,007,935 (exceeds int32 max)
Environment: vllm 0.19.0, torch 2.10.0+cu128, triton 3.6.0, cuda 12.8
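The "first aligned overflow shape" can be derived rather than guessed. The sketch below assumes a row alignment of 128, which is inferred from the value 524_416 and is not stated in the PR; `first_overflowing_rows` is a hypothetical helper.

```python
# Find the smallest row count M (rounded up to an alignment) for which the
# last flat offset M * N - 1 no longer fits in a signed 32-bit integer.
# NOTE: the alignment of 128 is an assumption, not a value taken from the PR.

INT32_MAX = 2**31 - 1


def first_overflowing_rows(cols: int, alignment: int) -> int:
    """Smallest multiple of `alignment` such that rows * cols - 1 > INT32_MAX."""
    # rows * cols - 1 > INT32_MAX  <=>  rows * cols > 2**31
    min_rows = (2**31) // cols + 1
    # Round up to the next multiple of the alignment.
    return -(-min_rows // alignment) * alignment


N = 4096
M = first_overflowing_rows(N, alignment=128)
print(f"M = {M}, max_offset = {M * N - 1}, overflow = {M * N - 1 > INT32_MAX}")
```

With N = 4096, M = 524_288 gives a maximum offset of exactly 2,147,483,647, which still fits in int32, so the next 128-aligned row count is the first one that overflows.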

Reproduction script:

import torch
from vllm.model_executor.layers.quantization.utils.fp8_utils import (
    silu_mul_per_token_group_quant_fp8_colmajor,
)

# Single-card minimum overflow shape
M = 524_416
N = 4096

print(f"M={M}, N={N}")
print(f"max_offset = {M*N-1}")
print(f"int32_max  = {2**31-1}")
print(f"overflow?  = {M*N-1 > 2**31-1}")
print()

x = torch.empty((M, N), device="cuda", dtype=torch.bfloat16)
torch.cuda.synchronize()

print("Calling silu_mul_per_token_group_quant_fp8_colmajor ...")
y, scales = silu_mul_per_token_group_quant_fp8_colmajor(x, use_ue8m0=False)
torch.cuda.synchronize()

print(f"successful！output shape={y.shape}, scales shape={scales.shape}")

Test Result

Before fix:

M=524416, N=4096
max_offset = 2148007935
int32_max  = 2147483647
overflow?  = True
Calling silu_mul_per_token_group_quant_fp8_colmajor ...
File ".../fp8_utils.py", line 785, in silu_mul_per_token_group_quant_fp8_colmajor
    _silu_mul_per_token_group_quant_fp8_colmajor[grid](...)
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered

After fix (testing in progress on H100 PCIe):

M=524416, N=4096
max_offset = 2148007935
int32_max  = 2147483647
overflow?  = True
Calling silu_mul_per_token_group_quant_fp8_colmajor ...
successful！output shape=torch.Size([524416, 2048]), scales shape=torch.Size([524416, 16])

gemini-code-assist

Code Review

This pull request updates the Triton kernels in fp8_utils.py to use int64 for offset calculations to prevent potential integer overflows. The review feedback correctly points out that casting to int64 after the multiplication is insufficient, as the intermediate 32-bit product could still overflow. The reviewer suggests casting the program IDs to int64 before the multiplication to ensure robust overflow protection, consistent with other kernels in the codebase.
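The point about cast ordering can be illustrated without a GPU. The sketch below emulates int32 arithmetic in plain Python; the index and stride values are hypothetical and chosen only so that the product exceeds the int32 range, and the function names are not from the vLLM codebase.

```python
# Compare "multiply in int32, then widen" against "widen, then multiply".
# NOTE: all names and values here are illustrative assumptions.

INT32_MIN = -(2**31)


def wrap_int32(value: int) -> int:
    """Return the value a two's-complement int32 would hold after overflow."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def multiply_then_widen(index: int, stride: int) -> int:
    """The product is formed in 32 bits first, so widening afterwards is too late."""
    product_32 = wrap_int32(index * stride)
    return product_32  # widening an already-wrapped value preserves the error


def widen_then_multiply(index: int, stride: int) -> int:
    """Operands are widened first; Python ints stand in for int64 here."""
    return index * stride


row_index, row_stride = 524_415, 4096  # hypothetical last row and row stride
print("multiply then widen:", multiply_then_widen(row_index, row_stride))
print("widen then multiply:", widen_then_multiply(row_index, row_stride))
```

Only the second ordering yields the true offset, which is why the review asks for the program IDs to be cast before the multiplication rather than casting the result.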

Flink-ddd · 2026-05-10T04:36:13Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the Triton kernels in fp8_utils.py to cast program IDs to int64 before calculating memory offsets. This change prevents potential integer overflow issues during offset computation in large-scale operations. As there were no review comments provided, I have no feedback to provide.

Flink-ddd · 2026-05-10T04:41:36Z

The pre-commit failures appear to be pre-existing issues on the main branch, unrelated to this PR. All checks pass for the modified file when running pre-commit run --files vllm/model_executor/layers/quantization/utils/fp8_utils.py.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify · 2026-05-10T04:42:33Z

Hi @Flink-ddd, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

yewentao256

LGTM, thanks for the work! Also CC @ivanium

ivanium · 2026-05-10T21:25:04Z

LGTM too. Thanks for the fix! cc @zyongye as well

zyongye

LGTM

Flink-ddd · 2026-05-11T17:50:52Z

Hi @yewentao256 @ivanium @zyongye, all 69 CI checks have passed and this is ready to merge. Thanks!

…llm-project#42201) Signed-off-by: vensen <vensenmu@gmail.com> Signed-off-by: Vensen <vensenmu@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…llm-project#42201) Signed-off-by: vensen <vensenmu@gmail.com> Signed-off-by: Vensen <vensenmu@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

mergify Bot added the "bug" label May 10, 2026

gemini-code-assist Bot reviewed May 10, 2026

Comment thread vllm/model_executor/layers/quantization/utils/fp8_utils.py Outdated

Comment thread vllm/model_executor/layers/quantization/utils/fp8_utils.py Outdated

Flink-ddd changed the title ~~[Bugfix] Fix int32 overflow in DeepGEMM SiLU/mul FP8 colmajor Triton kernel for large MoE warmup shapes~~ [Bugfix] Fix int32 overflow in DeepGEMM SiLU/mul FP8 Triton kernel May 10, 2026

Flink-ddd force-pushed the fix/deepgemm-silu-fp8-int32-overflow branch from 23fc557 to ac90f94 Compare May 10, 2026 04:26

gemini-code-assist Bot reviewed May 10, 2026

Flink-ddd marked this pull request as ready for review May 10, 2026 04:41

Flink-ddd requested review from mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners May 10, 2026 04:41

claude Bot reviewed May 10, 2026

yewentao256 approved these changes May 10, 2026

Flink-ddd and others added 3 commits May 11, 2026 10:57

fix(fp8): upcast m_offset/n_offset to tl.int64 in _silu_mul_per_token…

c4ad401

…_group_quant_fp8_colmajor to fix int32 overflow for large DeepGEMM MoE warmup shapes Signed-off-by: vensen <vensenmu@gmail.com>

Update vllm/model_executor/layers/quantization/utils/fp8_utils.py

4030b06

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Vensen <vensenmu@gmail.com> Signed-off-by: vensen <vensenmu@gmail.com>

Update vllm/model_executor/layers/quantization/utils/fp8_utils.py

d0729a2

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Vensen <vensenmu@gmail.com> Signed-off-by: vensen <vensenmu@gmail.com>

Flink-ddd force-pushed the fix/deepgemm-silu-fp8-int32-overflow branch from ac90f94 to d0729a2 Compare May 11, 2026 03:37

Flink-ddd requested a review from zyongye as a code owner May 11, 2026 03:37

zyongye added the "ready" label May 11, 2026

zyongye approved these changes May 11, 2026

yewentao256 approved these changes May 11, 2026

yewentao256 merged commit 6fdb493 into vllm-project:main May 11, 2026
71 checks passed

