
[Fix] Cast weight dtype in sgl-kernel norm wrappers for flashinfer 0.6.7#21645

Closed
Fridge003 wants to merge 1 commit into main from fix/flashinfer-rmsnorm-dtype-cast

Conversation

@Fridge003
Collaborator

Summary

  • Flashinfer 0.6.7 switched its rmsnorm implementation to CuTe-based kernels via TVM FFI, which enforce strict dtype matching between input and weight tensors
  • When RMSNorm weight is fp32 but input is bf16/fp16 (common in diffusion models), the new kernels raise ValueError: Mismatched Tensor on argument #1
  • This fix casts weight to match input dtype before calling flashinfer norm functions in all 4 affected wrappers: rmsnorm, fused_add_rmsnorm, gemma_rmsnorm, gemma_fused_add_rmsnorm

Fixes the CI failure in PR #21422 (bench_fused_norm_scale_shift.py).
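
The fix described above amounts to a defensive cast at the top of each wrapper. A minimal sketch of the pattern, with a plain fp32 reference computation standing in for the actual flashinfer kernel call (the real wrappers forward to flashinfer's norm functions):

```python
import torch

def rmsnorm(input: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Sketch of the wrapper fix: cast weight to the input dtype before
    handing both tensors to the strict-dtype kernel."""
    if weight.dtype != input.dtype:
        weight = weight.to(input.dtype)
    # Stand-in for the flashinfer rmsnorm kernel: compute in fp32 for
    # stability, then cast back to the input dtype.
    x = input.float()
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    out = x * torch.rsqrt(variance + eps) * weight.float()
    return out.to(input.dtype)

# fp32 weight with bf16 input no longer trips the strict dtype check.
x = torch.randn(2, 8, dtype=torch.bfloat16)
w = torch.ones(8, dtype=torch.float32)
y = rmsnorm(x, w)
```

The same two-line cast is applied in `fused_add_rmsnorm`, `gemma_rmsnorm`, and `gemma_fused_add_rmsnorm`.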

Test plan

  • Reproduced the bug on H200 with flashinfer 0.6.7
  • Verified fix resolves the dtype mismatch error
  • Verified numerical correctness (exact match when using same-dtype reference)
  • CI should pass stage-b-kernel-benchmark-1-gpu-large suite

🤖 Generated with Claude Code

…6.7 compatibility

Flashinfer 0.6.7 switched to CuTe-based kernels with stricter dtype
validation for rmsnorm. When weight (fp32) and input (bf16/fp16) dtypes
mismatch, the new kernels raise ValueError. Cast weight to input dtype
before calling flashinfer norm functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces automatic dtype casting for the weight tensor in the rmsnorm, fused_add_rmsnorm, gemma_rmsnorm, and gemma_fused_add_rmsnorm functions to ensure it matches the input tensor's dtype before calling flashinfer kernels. A review comment suggests refactoring this repeated casting logic into a helper function to reduce code duplication and improve maintainability.

Comment on lines +123 to +124
if weight.dtype != input.dtype:
weight = weight.to(input.dtype)

Severity: medium

This dtype casting logic is repeated in fused_add_rmsnorm, gemma_rmsnorm, and gemma_fused_add_rmsnorm. To improve maintainability and reduce code duplication, you could extract this logic into a helper function.

First, define the helper function, for example at the top of the file:

def _maybe_cast_weight(weight: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
    """Casts weight to match input's dtype if they differ."""
    if weight.dtype != input_tensor.dtype:
        return weight.to(input_tensor.dtype)
    return weight

Then, you can use this helper in the four norm functions. The suggestion below shows how to apply it for rmsnorm. The same change can be applied to the other three functions.

        weight = _maybe_cast_weight(weight, input)

@Fridge003
Collaborator Author

Cherry-picked by #21422

@Fridge003 Fridge003 closed this Mar 30, 2026
