add contiguous inside rmsnorm kernel #95
Conversation
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Pull request overview
This PR adds input tensor contiguity checks and enforcement directly within the RMSNorm kernel implementation, aligning with a similar change made in vLLM. The change moves contiguity handling from the Python level to the C++ level to prevent potential accuracy issues.
Key Changes:
- Adds contiguity validation for output and weight tensors
- Implements automatic conversion to a contiguous layout for the input tensor when its innermost stride is not 1
- Adds runtime checks to ensure proper tensor layouts before kernel execution
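
For orientation, a hedged sketch of how these pieces fit together in the updated entry point; the function name and signature are assumptions reconstructed from the diff hunks quoted below, not a verbatim copy of the PR:

```cpp
// Sketch only: name and signature assumed from the diff hunks below.
void rms_norm(torch::Tensor& out,     // [..., hidden_size]
              torch::Tensor& input,   // [..., hidden_size], any layout
              torch::Tensor& weight,  // [hidden_size]
              double epsilon) {
  // Layouts the kernel relies on.
  TORCH_CHECK(out.is_contiguous());
  TORCH_CHECK(weight.is_contiguous());

  // Enforce a unit innermost stride, copying only when necessary.
  if (input.stride(-1) != 1) {
    input = input.contiguous();
  }
  TORCH_CHECK(input.stride(-1) == 1);

  VLLM_DISPATCH_FLOATING_TYPES(
      input.scalar_type(), "call_rms_norm_kernel", [&] {
        vllm::call_rms_norm_kernel<scalar_t>(out, input, weight, epsilon);
      });
}
```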
```cpp
if (input.stride(-1) != 1) {
  input = input.contiguous();
}
TORCH_CHECK(input.stride(-1) == 1);
```
The check at line 211 is redundant. After the contiguous() call on line 209, the stride(-1) is guaranteed to be 1. This assertion will never fail and can be removed.
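
For illustration only (not part of the PR), a standalone libtorch snippet demonstrating why the assertion can never fire after the conversion:

```cpp
#include <torch/torch.h>

int main() {
  // transpose() returns a non-contiguous view: the strides are swapped.
  auto t = torch::rand({4, 8}).transpose(0, 1);
  TORCH_CHECK(t.stride(-1) != 1);  // innermost stride is 8 here, not 1

  // contiguous() materializes a row-major copy, so the innermost stride
  // is 1 by construction and the check below cannot fail.
  t = t.contiguous();
  TORCH_CHECK(t.stride(-1) == 1);
  return 0;
}
```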
Suggested change:
```diff
-TORCH_CHECK(input.stride(-1) == 1);
```
```cpp
if (input.stride(-1) != 1) {
  input = input.contiguous();
}
TORCH_CHECK(input.stride(-1) == 1);
TORCH_CHECK(weight.is_contiguous());
VLLM_DISPATCH_FLOATING_TYPES(
    input.scalar_type(), "call_rms_norm_kernel", [&] {
      vllm::call_rms_norm_kernel<scalar_t>(out, input, weight, epsilon);
```
Silently modifying the `input` parameter may lead to unexpected behavior for callers. Consider making `input` non-const to signal that it may be modified, documenting the behavior clearly, or making a local copy to avoid mutating the caller's reference.
Suggested change:
```diff
-if (input.stride(-1) != 1) {
-  input = input.contiguous();
-}
-TORCH_CHECK(input.stride(-1) == 1);
-TORCH_CHECK(weight.is_contiguous());
-VLLM_DISPATCH_FLOATING_TYPES(
-    input.scalar_type(), "call_rms_norm_kernel", [&] {
-      vllm::call_rms_norm_kernel<scalar_t>(out, input, weight, epsilon);
+auto input_ = input;
+if (input_.stride(-1) != 1) {
+  input_ = input_.contiguous();
+}
+TORCH_CHECK(input_.stride(-1) == 1);
+TORCH_CHECK(weight.is_contiguous());
+VLLM_DISPATCH_FLOATING_TYPES(
+    input_.scalar_type(), "call_rms_norm_kernel", [&] {
+      vllm::call_rms_norm_kernel<scalar_t>(out, input_, weight, epsilon);
```
CUDA may hit the same problem.
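
If the CUDA path does share the issue, the same local-copy pattern should carry over. A minimal sketch, assuming a CUDA entry point mirroring the one above (signature not verified against this repo, kernel launch elided):

```cpp
// Hedged sketch for a hypothetical CUDA-side rms_norm entry point.
void rms_norm(torch::Tensor& out,     // [..., hidden_size]
              torch::Tensor& input,   // [..., hidden_size]
              torch::Tensor& weight,  // [hidden_size]
              double epsilon) {
  TORCH_CHECK(out.is_contiguous());
  TORCH_CHECK(weight.is_contiguous());

  // torch::Tensor is a reference-counted handle, so this copies no data,
  // and reassigning input_ never mutates the caller's argument.
  auto input_ = input;
  if (input_.stride(-1) != 1) {
    input_ = input_.contiguous();  // allocates only when actually strided
  }
  TORCH_CHECK(input_.stride(-1) == 1);

  // ... dispatch on dtype and launch the CUDA kernel with input_ ...
}
```

The fast path (input already contiguous) then costs only the handle copy.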
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.
Purpose
Add a contiguous conversion inside the RMSNorm kernel. vLLM made the same in-kernel change in vllm-project/vllm#28103 and removed the contiguous call at the Python level; without the in-kernel handling, this may cause accuracy issues.
Test Plan
Test Result
(Optional) Documentation Update