
[CI/Build] Updated rmsnorm test to improve OOT device coverage#36246

Open
romitjain wants to merge 7 commits into vllm-project:main from romitjain:romit/rmsnorm-test


@romitjain

@romitjain romitjain commented Mar 6, 2026

Purpose

The purpose of this PR is to make the RMSNorm test (test_rms_norm) more generic across devices. Specifically, I have changed the test's device parameterization so that CPU is the default, which lets out-of-tree (OOT) hardware plugins run the same test. The PR also uses forward_static instead of forward_native as the reference implementation: forward_static is a staticmethod that platforms are not expected to override, so it should serve as the gold-standard reference.
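For context, this is the quantity the reference path and the kernels under test are expected to agree on, assuming the standard RMSNorm definition with an optional fused residual add (a sketch; the exact vLLM code is not part of this description). Here $d$ is the hidden size, $w$ the learned weight, $r$ the residual and $\epsilon$ a small constant:

$$
h = x + r, \qquad
y_i = \frac{h_i}{\sqrt{\frac{1}{d}\sum_{j=1}^{d} h_j^2 + \epsilon}}\; w_i, \qquad i = 1, \dots, d
$$

Without a residual, $h = x$; with one, the layer is assumed to return both $y$ and the updated residual $h$.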

Test Plan

This PR updates an existing test, so no new tests are required.

  • pytest tests/kernels/core/test_layernorm.py::test_rms_norm on CPU installation
  • pytest tests/kernels/core/test_layernorm.py::test_rms_norm on CUDA installation

Test Result

The updated test passes on both CUDA and CPU devices.


Edit: I have opened an RFC proposing similar test changes for other ops.

@github-actions

github-actions bot commented Mar 6, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to improve test coverage for rmsnorm across different devices, which is a valuable enhancement. However, I've identified two critical issues in the current implementation. Firstly, the logic for setting up test devices incorrectly prevents tests from running on the CPU when a CUDA device is available and also causes a NameError that breaks another test. Secondly, tensors are not correctly moved to the target device within test_rms_norm, which will result in runtime errors. I have provided detailed comments and code suggestions to address these critical bugs.

Comment thread tests/kernels/core/test_layernorm.py Outdated
Comment on lines +17 to +19
```python
DEVICES = ["cpu"]
if torch.cuda.is_available():
    DEVICES = [f"cuda:{i}" for i in range(1 if torch.cuda.device_count() == 1 else 2)]
```
Contributor


critical

This change introduces two issues:

  1. When CUDA is available, the DEVICES list is overwritten to only contain CUDA devices, losing the 'cpu' device. This means the test will not run on CPU if a CUDA device is present, which seems contrary to the goal of improving OOT device coverage.
  2. The variable CUDA_DEVICES is removed, but it is still used by test_fused_rms_norm_quant on line 88, which will cause a NameError and break the test suite.

A better approach would be to extend the DEVICES list and re-introduce CUDA_DEVICES for the other test.

Suggested change
```diff
-DEVICES = ["cpu"]
-if torch.cuda.is_available():
-    DEVICES = [f"cuda:{i}" for i in range(1 if torch.cuda.device_count() == 1 else 2)]
+DEVICES = ["cpu"]
+CUDA_DEVICES = []
+if torch.cuda.is_available():
+    CUDA_DEVICES = [f"cuda:{i}" for i in range(1 if torch.cuda.device_count() == 1 else 2)]
+    DEVICES.extend(CUDA_DEVICES)
```

Author


Previously, this test ran only on CUDA devices. My update stays in line with that implementation: it runs on CUDA when available and falls back to CPU otherwise.
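To make the trade-off in this thread concrete, here is a plain-Python mirror of both DEVICES variants. The function names are hypothetical, and `torch.cuda.is_available()` / `torch.cuda.device_count()` are passed in as plain arguments so the logic can be checked without a GPU.

```python
# Plain-Python mirror of the two DEVICES variants discussed in this thread.
# cuda_available / cuda_count stand in for torch.cuda.is_available() and
# torch.cuda.device_count().


def devices_as_in_pr(cuda_available: bool, cuda_count: int) -> list:
    """PR version: CPU only when CUDA is absent, otherwise CUDA devices only."""
    devices = ["cpu"]
    if cuda_available:
        devices = [f"cuda:{i}" for i in range(1 if cuda_count == 1 else 2)]
    return devices


def devices_as_suggested(cuda_available: bool, cuda_count: int) -> tuple:
    """Review suggestion: always keep CPU, append CUDA devices, and keep
    CUDA_DEVICES as a separate list for CUDA-only tests."""
    devices = ["cpu"]
    cuda_devices = []
    if cuda_available:
        cuda_devices = [f"cuda:{i}" for i in range(1 if cuda_count == 1 else 2)]
        devices.extend(cuda_devices)
    return devices, cuda_devices


print(devices_as_in_pr(True, 4))      # ['cuda:0', 'cuda:1']
print(devices_as_suggested(True, 4))  # (['cpu', 'cuda:0', 'cuda:1'], ['cuda:0', 'cuda:1'])
```

With CUDA present, the PR's variant drops CPU entirely, while the suggestion runs both CPU and CUDA and keeps `CUDA_DEVICES` available for `test_fused_rms_norm_quant`.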

Comment thread tests/kernels/core/test_layernorm.py Outdated
@mergify
Contributor

mergify bot commented Mar 6, 2026

Hi @romitjain, the pre-commit checks have failed. Please run:

```bash
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
```bash
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint
```

Signed-off-by: romit <romit@ibm.com>
Signed-off-by: romit <romit@ibm.com>
Signed-off-by: romit <romit@ibm.com>
@romitjain romitjain force-pushed the romit/rmsnorm-test branch from f2e4201 to a655e21 Compare March 6, 2026 10:25
Signed-off-by: romit <romit@ibm.com>
@romitjain
Author

Hi @mgoin @tlrmchlsmth
Hoping to get a review on this from you.

Thanks

```diff
 # NOTE(woosuk): The reference implementation should be executed first
 # because the custom kernel is in-place.
-ref_out = layer.forward_native(x, residual)
+ref_out = layer.forward_static(
```
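The NOTE in this hunk is about mutation order. Below is a toy, list-based illustration of why it matters. The helpers are hypothetical stand-ins for a fused add + RMSNorm kernel that overwrites both of its inputs; they are not vLLM's actual ops. If the in-place kernel runs first, the "reference" sees already-overwritten inputs.

```python
import math


def add_rms_norm_ref(x, residual, weight, eps=1e-6):
    """Out-of-place reference: returns (normalized, new_residual); inputs untouched."""
    h = [a + b for a, b in zip(x, residual)]
    scale = 1.0 / math.sqrt(sum(v * v for v in h) / len(h) + eps)
    return [v * scale * w for v, w in zip(h, weight)], h


def add_rms_norm_inplace(x, residual, weight, eps=1e-6):
    """Toy stand-in for an in-place kernel: overwrites both x and residual."""
    out, h = add_rms_norm_ref(x, residual, weight, eps)
    residual[:] = h
    x[:] = out


weight = [1.0, 1.0, 1.0, 1.0]

# Correct order: reference first, on pristine inputs.
x, res = [1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]
ref_out, ref_res = add_rms_norm_ref(x, res, weight)
add_rms_norm_inplace(x, res, weight)
print(x == ref_out and res == ref_res)  # True

# Wrong order: the kernel has already overwritten the inputs, so the
# "reference" adds the residual a second time.
x2, res2 = [1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]
add_rms_norm_inplace(x2, res2, weight)
bad_out, bad_res = add_rms_norm_ref(x2, res2, weight)
print(bad_res == res2)  # False
```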
Collaborator


Why can't we keep forward_native here?

Author


@jikunshang We can; this was just an opinionated choice.
forward_native directly calls forward_static.

From the naming, I inferred that:

  • forward_native may be native to an OOT platform
  • forward_static can remain the golden reference implementation, not to be inherited or overridden
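A plain-Python sketch of the reasoning above. `RMSNormSketch` and `BuggyPluginRMSNorm` are hypothetical classes, not vLLM's RMSNorm or CustomOp; having forward_native delegate to forward_static follows the description in this thread. If a platform overrides forward_native, a test that uses `layer.forward_native` as its reference ends up comparing the override with itself. One caveat: a staticmethod can still be shadowed by a subclass (`layer.forward_static` would resolve to the override), so "not overridden" is a convention; calling it through the base class pins the reference.

```python
import math


class RMSNormSketch:
    """Hypothetical stand-in for a CustomOp-style layer; not vLLM's RMSNorm."""

    def __init__(self, weight, eps=1e-6):
        self.weight = list(weight)
        self.eps = eps

    @staticmethod
    def forward_static(x, weight, eps):
        # Pure reference math: every input is an explicit argument.
        scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
        return [v * scale * w for v, w in zip(x, weight)]

    def forward_native(self, x):
        # forward_native simply delegates to forward_static.
        return self.forward_static(x, self.weight, self.eps)


class BuggyPluginRMSNorm(RMSNormSketch):
    """Hypothetical OOT plugin that overrides forward_native and forgets the weight."""

    def forward_native(self, x):
        return self.forward_static(x, [1.0] * len(x), self.eps)


layer = BuggyPluginRMSNorm(weight=[2.0, 2.0, 2.0])
x = [1.0, 2.0, 3.0]

# Reference pinned to the base-class staticmethod vs. the overridden method.
ref = RMSNormSketch.forward_static(x, layer.weight, layer.eps)
out = layer.forward_native(x)
print(ref == out)  # False: the bug is caught
# Using layer.forward_native as the "reference" compares the override with itself.
print(layer.forward_native(x) == out)  # True
```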

@ProExpertProg
Collaborator

I think this work is somewhat redundant with the vLLM IR work; can you take a look at #33825? Also, @gmagogsfm will be working on some test infrastructure for IR ops that should hopefully cover this. After that, these tests will either be redundant or should focus only on testing layer logic.

@romitjain
Author

@ProExpertProg IIUC, your PR #33825 adds vLLM IR, which improves how kernels are lowered and dispatched. We would still need to test the layers' forward pass, regardless of which kernel gets dispatched.
My PR only reworks the layer logic test and removes its CUDA dependency.
Let me know if I have understood this correctly.

@mergify
Contributor

mergify bot commented Mar 12, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @romitjain.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 12, 2026
Signed-off-by: r0 <11757603+romitjain@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Mar 13, 2026
@mergify
Contributor

mergify bot commented Mar 13, 2026

Hi @romitjain, the pre-commit checks have failed. Please run:

```bash
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
```bash
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
```

Signed-off-by: romit <romit@ibm.com>
@romitjain
Author

Hi @mgoin @ProExpertProg
Hoping to get a review on this.

Thanks

joerunde added a commit to torch-spyre/sendnn-inference that referenced this pull request Mar 19, 2026
## Description

This PR does 2 things:

1. **Adds tests for SpyreRMSNorm that run from
`vllm-spyre/vllm_spyre_next`**

There are 2 tests added: a unit test verifying the correctness of the
layer on CPU/Spyre, and an integration test ensuring that `forward_oot` is
called when the package is installed as a vLLM plugin.
While writing these tests, I found a couple of issues in the
SpyreRMSNorm implementation, which I have attempted to fix; please
correct me if I am wrong.

2. **Adds a framework for running upstream vLLM tests and runs RMSNorm
upstream tests**

Building on vllm-project/vllm#36246, this PR
also adds a framework for filtering and updating upstream
tests and running them from the `vllm` repo.

1. We clone vllm separately and run tests from the vllm repo (copied
over from #800; not a contribution of this PR)
2. We manage the whitelist/filtering logic via a declarative YAML file
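The YAML-driven filtering could look roughly like the sketch below. The schema (files mapped to allowed tests, each with optional per-parameter blocklists) is a hypothetical guess at what `upstream_tests.yaml` might contain, shown as an already-parsed dict so the sketch has no dependencies. The skip-reason strings are copied from the test logs in this description, but which rule produces which reason is an assumption. In a real pytest plugin, the same decision could be applied from a `pytest_collection_modifyitems` hook, reading parameters from `item.callspec.params` and calling `item.add_marker(pytest.mark.skip(reason=...))`.

```python
from typing import Optional

# Hypothetical, already-parsed contents of upstream_tests.yaml.
UPSTREAM_TESTS = {
    "tests/kernels/core/test_layernorm.py": {
        "tests": {
            # Run test_rms_norm, but skip parameter values the device can't handle
            # (the "device" blocklist here is purely illustrative).
            "test_rms_norm": {"skip_params": {"device": ["cuda:0", "cuda:1"]}},
        },
    },
}


def skip_reason(path: str, test_name: str, params: dict, config: dict) -> Optional[str]:
    """Return why a collected upstream test should be skipped, or None to run it."""
    file_cfg = config.get(path)
    if file_cfg is None:
        return "not in upstream_tests.yaml"
    test_cfg = file_cfg.get("tests", {}).get(test_name)
    if test_cfg is None:
        return "blocked by upstream_tests.yaml"
    for name, blocked in test_cfg.get("skip_params", {}).items():
        if params.get(name) in blocked:
            return "param skipped"
    return None


LN = "tests/kernels/core/test_layernorm.py"
# Any test in a file absent from the YAML is skipped.
print(skip_reason("tests/kernels/core/test_activation.py", "any_test", {}, UPSTREAM_TESTS))
print(skip_reason(LN, "test_fused_rms_norm_quant", {}, UPSTREAM_TESTS))
print(skip_reason(LN, "test_rms_norm", {"device": "cuda:0"}, UPSTREAM_TESTS))
print(skip_reason(LN, "test_rms_norm", {"device": "cpu"}, UPSTREAM_TESTS))
```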

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

#805

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

To test both features of this PR:

1. **Tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`**

```bash
cd vllm-spyre/vllm_spyre_next
# Installs the pytest plugin
uv pip install -e .

VLLM_PLUGINS=spyre_next_ops pytest -rA tests/test_rms_norm.py -m spyre
```
This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-512-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-512-1]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[False]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[True]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-129-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-129-1] - <redacted>
========================================================== 8 failed, 10 passed, 1 skipped, 5836 deselected, 9 warnings in 24.42s ==========================================================
```

The failures occur at hidden sizes that are not multiples of 64 (63, 65,
127, 129). This is expected for now, since the hidden dimension is not yet
padded to a multiple of 64. (I can raise a separate PR to fix that, but I
did not want to overload this PR.)
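One way to handle those boundary sizes, sketched under the assumption that the device needs the hidden dimension padded to a multiple of 64 (the actual Spyre constraint and kernel are not shown here): zero-pad, normalize, then slice back. The subtle part is that the mean of squares must still divide by the original hidden size; dividing by the padded size silently changes the result. All helper names are hypothetical.

```python
import math


def pad_to_multiple(x, multiple=64):
    """Zero-pad a 1-D list so its length is a multiple of `multiple`."""
    pad = (-len(x)) % multiple
    return x + [0.0] * pad


def rms_norm(x, weight, eps=1e-6):
    """Unpadded reference RMSNorm over a 1-D list."""
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def rms_norm_padded(x, weight, eps=1e-6, multiple=64):
    """Normalize on a padded buffer, then slice back to the real size.

    Zeros add nothing to the sum of squares, but the mean must still be
    taken over the original hidden size, not the padded one.
    """
    hidden = len(x)
    xp = pad_to_multiple(x, multiple)
    wp = pad_to_multiple(weight, multiple)
    mean_sq = sum(v * v for v in xp) / hidden  # divide by the real size
    scale = 1.0 / math.sqrt(mean_sq + eps)
    out = [v * scale * w for v, w in zip(xp, wp)]
    return out[:hidden]


x = [float(i % 7) for i in range(63)]
w = [1.0] * 63
print(len(rms_norm_padded(x, w)))  # 63
```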

2. **Upstream tests that run from vLLM**

This makes use of my PR on vLLM,
vllm-project/vllm#36246, which enables the
RMSNorm test to run on OOT devices.

```bash
# For demonstration purposes, I am testing on my fork and commit
export VLLM_COMMIT=a3b591a09545403114885ac7fbd94b63fbac1696
export VLLM_REPO_URL=https://github.com/romitjain/vllm

VLLM_PLUGINS=spyre_next_ops,spyre_next_test python -m pytest -rA -m upstream
```

This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-16]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
SKIPPED [126] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:34: not in upstream_tests.yaml
SKIPPED [54] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:119: not in upstream_tests.yaml
SKIPPED [4] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_apply_rotary_emb.py:188: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_qk_norm_rope.py:48: fused_qk_norm_rope custom op requires cuda and rocm platform
SKIPPED [4256] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_quant_layernorm.py:156: not in upstream_tests.yaml
SKIPPED [16] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:21: not in upstream_tests.yaml
SKIPPED [8] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:61: not in upstream_tests.yaml
SKIPPED [12] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:24: param skipped
SKIPPED [648] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:85: blocked by upstream_tests.yaml
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:59: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:129: Skipping CUDA/ROCm only tests.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_opcheck.py: not in upstream_tests.yaml
SKIPPED [384] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py:54: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py: not in upstream_tests.yaml
SKIPPED [96] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding.py:30: not in upstream_tests.yaml
SKIPPED [144] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding_mla_cache_fused.py:20: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:14: UVA is not available.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:36: UVA is not available.
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-16] - AssertionError: Tensor-likes are not close!
========================================================= 6 failed, 6 passed, 5825 skipped, 19 deselected, 13 warnings in 23.26s ==========================================================
```

This shows that our YAML is being respected:
1. Most of the upstream tests are skipped because they are not in our YAML.
2. 12 tests are skipped because their parameters are not supported
(`param skipped`).
3. 12 tests are selected to run, of which 6 pass and 6 fail.

## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [x] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: romit <romit@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
sducouedic pushed a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026
sducouedic pushed a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026
## Description

This PR does 2 things:

1. **Adds tests for SpyreRMSNorm that run from
`vllm-spyre/vllm_spyre_next`**

There are 2 tests added - a unit test verifying the correctness of the
layer on CPU/Spyre and an integration test to ensure `forward_oot` gets
called when it is installed as a vLLM plugin.
While writing down these tests, I saw a couple of issues in the
SpyreRMSNorm implementation - which I have attempted to fix, but please
correct me if I am wrong.

2. **Adds a framework for running upstream vLLM tests and runs RMSNorm
upstream tests**

Building on: vllm-project/vllm#36246, this PR
also adds a framework that can be used to filter and update upstream
tests and run them from the `vllm` repo.

1. We clone vllm separately and run tests from the vllm repo (copied
over from #800, not the contribution of this PR)
2. We manage the whitelist/filtering logic via a declarative YAML

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

torch-spyre/sendnn-inference#805

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

To test both the features of this PR:

1. **Tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`**

```bash
cd vllm-spyre/vllm_spyre_next
# Installs the pytest plugin
uv pip install -e .

VLLM_PLUGINS=spyre_next_ops pytest -rA tests/test_rms_norm.py -m spyre
```
This is expected to produce,

```bash
================================================================================= short test summary info =================================================================================
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-512-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-512-1]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[False]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[True]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-129-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-129-1] - <redacted>
========================================================== 8 failed, 10 passed, 1 skipped, 5836 deselected, 9 warnings in 24.42s ==========================================================
```

The tests are failing at boundaries of hidden dim, which is expected as
of now, since hidden dim is not being padded to 64. (I can raise a
separate PR to fix that, but I did not want to overload this PR)

2. **Upstream tests that run from vLLM**

This relies on my vLLM PR, vllm-project/vllm#36246, which enables the
upstream RMSNorm test to run on OOT devices.

```bash
# For demonstration purposes, I am testing on my fork and commit
export VLLM_COMMIT=a3b591a09545403114885ac7fbd94b63fbac1696
export VLLM_REPO_URL=https://github.com/romitjain/vllm

VLLM_PLUGINS=spyre_next_ops,spyre_next_test python -m pytest -rA -m upstream
```

This is expected to produce

```bash
================================================================================= short test summary info =================================================================================
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-16]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
SKIPPED [126] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:34: not in upstream_tests.yaml
SKIPPED [54] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:119: not in upstream_tests.yaml
SKIPPED [4] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_apply_rotary_emb.py:188: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_qk_norm_rope.py:48: fused_qk_norm_rope custom op requires cuda and rocm platform
SKIPPED [4256] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_quant_layernorm.py:156: not in upstream_tests.yaml
SKIPPED [16] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:21: not in upstream_tests.yaml
SKIPPED [8] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:61: not in upstream_tests.yaml
SKIPPED [12] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:24: param skipped
SKIPPED [648] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:85: blocked by upstream_tests.yaml
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:59: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:129: Skipping CUDA/ROCm only tests.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_opcheck.py: not in upstream_tests.yaml
SKIPPED [384] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py:54: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py: not in upstream_tests.yaml
SKIPPED [96] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding.py:30: not in upstream_tests.yaml
SKIPPED [144] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding_mla_cache_fused.py:20: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:14: UVA is not available.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:36: UVA is not available.
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-16] - AssertionError: Tensor-likes are not close!
========================================================= 6 failed, 6 passed, 5825 skipped, 19 deselected, 13 warnings in 23.26s ==========================================================
```
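The skip locations in the output point into `../../.cache/vllm-upstream-tests/worktree-a3b591a09545/`, and `a3b591a09545` is the first 12 characters of `VLLM_COMMIT`, which suggests one cached checkout per pinned commit. The sketch below is only a guess at that layout, not the harness's actual code: the function `upstream_checkout_plan`, the `mirror.git` bare clone, and the use of `git worktree` are all assumptions. It builds the paths and git commands without running them.

```python
from pathlib import Path


def upstream_checkout_plan(env, cache_root):
    """Hypothetical: map VLLM_REPO_URL / VLLM_COMMIT to a cached worktree.

    Assumes worktrees live under <cache_root>/vllm-upstream-tests/ and are named
    after the first 12 characters of the commit SHA, as the log paths suggest.
    """
    repo_url = env["VLLM_REPO_URL"]
    commit = env["VLLM_COMMIT"]
    if len(commit) < 12:
        raise ValueError("VLLM_COMMIT should be a full commit SHA")
    base = Path(cache_root) / "vllm-upstream-tests"
    mirror = base / "mirror.git"  # hypothetical bare clone shared by all worktrees
    worktree = base / f"worktree-{commit[:12]}"
    commands = [
        # One bare clone of the (possibly forked) vLLM repository...
        ["git", "clone", "--bare", repo_url, str(mirror)],
        # ...and one detached worktree per pinned commit.
        ["git", "-C", str(mirror), "worktree", "add", "--detach", str(worktree), commit],
    ]
    return {"worktree": worktree, "commands": commands}


plan = upstream_checkout_plan(
    {
        "VLLM_REPO_URL": "https://github.com/romitjain/vllm",
        "VLLM_COMMIT": "a3b591a09545403114885ac7fbd94b63fbac1696",
    },
    Path.home() / ".cache",
)
print(plan["worktree"].name)  # worktree-a3b591a09545
```

Keying the directory on the commit prefix means switching `VLLM_COMMIT` gives a fresh checkout, while rerunning with the same commit can reuse the cached one.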

We can see that our YAML is being respected:
1. Most of the upstream tests are skipped because they are not listed in
our YAML (`not in upstream_tests.yaml`) or are explicitly blocked by it
(`blocked by upstream_tests.yaml`)
2. 12 tests are skipped because their parameters are not supported
(`param skipped`)
3. 12 tests are selected to run, of which 6 pass and 6 fail
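The three skip reasons in the output (`not in upstream_tests.yaml`, `blocked by upstream_tests.yaml`, `param skipped`) can be modelled as a lookup over the parsed YAML. This is a minimal sketch under assumed conventions: the dictionary layout, the `skip_params` key, the `hidden_size` parameter, and `skip_reason()` are hypothetical and may not match the real `upstream_tests.yaml` schema or conftest. The YAML is shown already loaded into a Python dict so the example needs no third-party parser.

```python
# Hypothetical in-memory form of upstream_tests.yaml (the real schema may differ):
# file path -> allowed test functions -> unsupported parameter values.
UPSTREAM_TESTS = {
    "tests/kernels/core/test_layernorm.py": {
        "test_rms_norm": {"skip_params": {"hidden_size": [63, 65, 127, 129]}},
    },
}


def skip_reason(config, rel_path, test_name, params):
    """Return the skip reason for one collected test, or None if it should run."""
    file_entry = config.get(rel_path)
    if file_entry is None:
        return "not in upstream_tests.yaml"
    test_entry = file_entry.get(test_name)
    if test_entry is None:
        # The file is listed, but this particular test function is not allowed.
        return "blocked by upstream_tests.yaml"
    for name, unsupported in test_entry.get("skip_params", {}).items():
        if name in params and params[name] in unsupported:
            return "param skipped"
    return None


def pytest_collection_modifyitems(config, items):
    """Possible conftest.py wiring; assumes test files sit under pytest's rootdir."""
    import pytest  # imported lazily so skip_reason() stays usable without pytest

    for item in items:
        rel_path = item.path.relative_to(config.rootpath).as_posix()
        params = getattr(getattr(item, "callspec", None), "params", {})
        test_name = getattr(item, "originalname", item.name)
        reason = skip_reason(UPSTREAM_TESTS, rel_path, test_name, params)
        if reason is not None:
            item.add_marker(pytest.mark.skip(reason=reason))
```

Keeping the decision in a pure function makes the filtering rules testable without collecting the whole upstream suite; the hook only turns each collected item into a path, a function name and its parameters.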

## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [x] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: romit <romit@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
sducouedic pushed a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026
sducouedic added a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026