
[Spyre-Next] Reworked rms_norm#873

Merged
tjohnson31415 merged 2 commits into torch-spyre:main from bohnstingl:rms_rework
Mar 27, 2026
Conversation

Collaborator

@bohnstingl bohnstingl commented Mar 27, 2026

Description

This PR removes the transpose + .contiguous() operation from rms_norm, bringing it closer to the native upstream implementation. However, I currently observe small numerical differences between the old and the new approach, and I prepared a repro script to demonstrate them.

import torch

x = torch.randn(1024, 4096, device="cpu", dtype=torch.float16)
hidden_size = 4096
eps = 1e-05
weight = torch.randn(4096, device="cpu", dtype=torch.float16)

def rms_old(x, hidden_size, variance_epsilon, weight):
    if x.shape[-1] != hidden_size:
        raise ValueError(f"Expected hidden_size to be {hidden_size}, but found: {x.shape[-1]}")

    x = x.transpose(-1, -2).contiguous()

    variance_epsilon = torch.full(
        x.shape, variance_epsilon, dtype=torch.float16, device=x.device
    )

    x_var = x

    # After transpose, hidden dim is now dim=0
    variance = x_var.pow(2).mean(dim=0, keepdim=True)
    # variance = x_var.pow(2).mean(dim=-1, keepdim=True)

    x = x * torch.rsqrt(variance + variance_epsilon)
    x = x.transpose(-1, -2).contiguous()

    if weight is not None:
        x = x * weight
    return x

def rms_new(x, hidden_size, variance_epsilon, weight):
    if x.shape[-1] != hidden_size:
        raise ValueError(f"Expected hidden_size to be {hidden_size}, but found: {x.shape[-1]}")


    variance_epsilon = torch.full(
        x.shape, variance_epsilon, dtype=torch.float16, device=x.device
    )

    x_var = x

    variance = x_var.pow(2).mean(dim=-1, keepdim=True)

    x = x * torch.rsqrt(variance + variance_epsilon)

    if weight is not None:
        x = x * weight
    return x

x = x.to("spyre")
weight = weight.to("spyre")

out1 = torch.compile(rms_old, dynamic=False)(x, hidden_size, eps, weight).cpu()
out2 = torch.compile(rms_new, dynamic=False)(x, hidden_size, eps, weight).cpu()

torch.testing.assert_close(out1, out2, atol=0.001, rtol=0.001)

print('Tensors are close')
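As a side note, the float16 arithmetic itself already limits how closely either variant can match an exact RMSNorm. A minimal CPU-only sketch (using NumPy instead of torch so no Spyre device is needed; the `rms_norm_ref` helper is illustrative, not part of this PR) estimates that floor against a float64 reference:

```python
import numpy as np

def rms_norm_ref(x, weight, eps=1e-5):
    # Hypothetical float64 reference: normalize over the last (hidden) dim,
    # matching the upstream RMSNorm definition.
    x64 = x.astype(np.float64)
    variance = np.mean(x64 ** 2, axis=-1, keepdims=True)
    return x64 / np.sqrt(variance + eps) * weight.astype(np.float64)

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 4096)).astype(np.float16)
w = rng.standard_normal(4096).astype(np.float16)

# float16 computation, reducing over the last dim as in rms_new.
# NumPy accumulates the mean in float16 here, so this shows the
# precision floor of the dtype, not any device-specific effect.
var16 = np.mean(x ** 2, axis=-1, keepdims=True)
out16 = x / np.sqrt(var16 + np.float16(1e-5)) * w

err = np.max(np.abs(out16.astype(np.float64) - rms_norm_ref(x, w)))
print(f"max abs deviation from float64 reference: {err:.2e}")
```

Deviations of this order from low-precision accumulation are expected regardless of which kernel layout is used.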

cc @romitjain

Related Issues

Corresponding change in torch-spyre:
torch-spyre/torch-spyre#1236.

Test Plan

The change is not user-facing and all existing tests should pass.

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
@bohnstingl bohnstingl self-assigned this Mar 27, 2026
@bohnstingl bohnstingl marked this pull request as ready for review March 27, 2026 10:52
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@github-actions github-actions bot changed the title Reworked rms_norm [Spyre-Next] Reworked rms_norm Mar 27, 2026
@rafvasq
Collaborator

rafvasq commented Mar 27, 2026

bot:next-test

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
@bohnstingl bohnstingl requested a review from yannicks1 March 27, 2026 16:29
@tjohnson31415
Collaborator

tjohnson31415 commented Mar 27, 2026

How significant are the numerical differences observed, and which method is closer to a baseline non-spyre implementation?

The simplification seems good but maybe the transpose().contiguous() serves a purpose; I've seen another case where a double transpose was needed for Spyre (REF)

Collaborator

@yannicks1 yannicks1 left a comment


lgtm code wise - fair point of @tjohnson31415 above ^^

@bohnstingl
Collaborator Author

@tjohnson31415 yes, indeed. It is a fair point and maybe @romitjain can comment also here.

The case that you reference is a bit specific to attention, I believe. There, a view operation was done first and the tensor then needed to be re-stickified via this trick. We also currently use it in our attention implementation, see https://github.com/jvlunteren/vllm-spyre/blob/80c7cc9e0c261059375780fc24cb3cd9861d3030/vllm_spyre_next/vllm_spyre_next/v1/attention/backends/spyre_attn.py#L698-L701.
For our case, this problem shouldn't exist and torch-spyre has also removed the transposition + contiguous() operation, see torch-spyre/torch-spyre#1236.

However, one does observe some numerical difference when comparing the old and the new approach. In particular, I created this small repro script, which fails at the 0.001 tolerance:

import torch

x = torch.randn(1024, 4096, device="cpu", dtype=torch.float16)
hidden_size = 4096
eps = 1e-05
weight = torch.randn(4096, device="cpu", dtype=torch.float16)

def rms_old(x, hidden_size, variance_epsilon, weight):
    x = x.transpose(-1, -2).contiguous()

    variance_epsilon = torch.full(
        x.shape, variance_epsilon, dtype=torch.float16, device=x.device
    )

    # After transpose, hidden dim is now dim=0
    variance = x.pow(2).mean(dim=0, keepdim=True)

    x = x * torch.rsqrt(variance + variance_epsilon)
    x = x.transpose(-1, -2).contiguous()

    if weight is not None:
        x = x * weight
    return x

def rms_new(x, hidden_size, variance_epsilon, weight):
    variance_epsilon = torch.full(
        x.shape, variance_epsilon, dtype=torch.float16, device=x.device
    )

    variance = x.pow(2).mean(dim=-1, keepdim=True)

    x = x * torch.rsqrt(variance + variance_epsilon)

    if weight is not None:
        x = x * weight
    return x

x = x.to("spyre")
weight = weight.to("spyre")

out1 = torch.compile(rms_old, dynamic=False)(x, hidden_size, eps, weight).cpu()
out2 = torch.compile(rms_new, dynamic=False)(x, hidden_size, eps, weight).cpu()

torch.testing.assert_close(out1, out2, atol=0.1, rtol=0.1, msg="FAILED with atol/rtol 0.1")
torch.testing.assert_close(out1, out2, atol=0.01, rtol=0.01, msg="FAILED with atol/rtol 0.01")
torch.testing.assert_close(out1, out2, atol=0.001, rtol=0.001, msg="FAILED with atol/rtol 0.001")

I had an offline communication with @romitjain and I think this was verified.
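For completeness: reducing over dim 0 of the transposed tensor and reducing over the last dim of the original tensor are the same reduction in exact arithmetic, so the deviation above presumably comes from how the compiled Spyre kernels order the accumulation rather than from the math. A minimal CPU sketch (NumPy as a stand-in, since it only demonstrates the mathematical equivalence, not the device behavior) makes this explicit:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal((1024, 4096)).astype(np.float32)

# Old-style reduction: transpose so the hidden dim becomes dim 0, reduce there.
var_old = np.mean(x.T ** 2, axis=0, keepdims=True).T   # shape (1024, 1)

# New-style reduction: reduce directly over the last (hidden) dim.
var_new = np.mean(x ** 2, axis=-1, keepdims=True)      # shape (1024, 1)

# Each output element sums exactly the same values in both paths, so on CPU
# they agree to floating-point rounding.
print(np.max(np.abs(var_old - var_new)))
```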

@tjohnson31415
Collaborator

Ah, if we are aligning with changes in torch-spyre then I think a little numerical change is ok. (When initially running in my dev env, I was getting big changes, but something was misconfigured.)

I added a link to torch-spyre/torch-spyre#1236 to the PR description.

Collaborator

@tjohnson31415 tjohnson31415 left a comment


LGTM

@tjohnson31415 tjohnson31415 merged commit 3a0460d into torch-spyre:main Mar 27, 2026
14 checks passed
yannicks1 added a commit that referenced this pull request Apr 9, 2026

## Description

Fix docstring inaccuracies, typos and typing.

Changes: 
- cleans up docstrings after #873 
- cleans up comment after #754 
- typos and typing 


## Test Plan
Documentation-only changes, no functional impact.

## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>