[CI/Build] Updated rmsnorm test to improve OOT device coverage #36246
romitjain wants to merge 7 commits into vllm-project:main
Code Review
This pull request aims to improve test coverage for rmsnorm across different devices, which is a valuable enhancement. However, I've identified two critical issues in the current implementation. Firstly, the logic for setting up test devices incorrectly prevents tests from running on the CPU when a CUDA device is available and also causes a NameError that breaks another test. Secondly, tensors are not correctly moved to the target device within test_rms_norm, which will result in runtime errors. I have provided detailed comments and code suggestions to address these critical bugs.
DEVICES = ["cpu"]
if torch.cuda.is_available():
    DEVICES = [f"cuda:{i}" for i in range(1 if torch.cuda.device_count() == 1 else 2)]
This change introduces two issues:
- When CUDA is available, the `DEVICES` list is overwritten to only contain CUDA devices, losing the `'cpu'` device. This means the test will not run on CPU if a CUDA device is present, which seems contrary to the goal of improving OOT device coverage.
- The variable `CUDA_DEVICES` is removed, but it is still used by `test_fused_rms_norm_quant` on line 88, which will cause a `NameError` and break the test suite.

A better approach would be to extend the `DEVICES` list and re-introduce `CUDA_DEVICES` for the other test.
Suggested change:
DEVICES = ["cpu"]
if torch.cuda.is_available():
    DEVICES = [f"cuda:{i}" for i in range(1 if torch.cuda.device_count() == 1 else 2)]
becomes:
DEVICES = ["cpu"]
CUDA_DEVICES = []
if torch.cuda.is_available():
    CUDA_DEVICES = [f"cuda:{i}" for i in range(1 if torch.cuda.device_count() == 1 else 2)]
    DEVICES.extend(CUDA_DEVICES)
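The device-list logic in the suggestion can be checked without a GPU if it is pulled into a small pure function. The following is only a sketch: `build_device_lists` is a hypothetical helper name that does not appear in the PR, and the CUDA availability and device count are passed in as plain arguments rather than queried from `torch`.

```python
def build_device_lists(cuda_available: bool, cuda_device_count: int):
    """Return (devices, cuda_devices) for test parametrization.

    Hypothetical helper mirroring the suggested change: "cpu" is always
    present, and at most two CUDA devices are appended when CUDA is available.
    """
    devices = ["cpu"]
    cuda_devices = []
    if cuda_available:
        # One device on single-GPU hosts, otherwise the first two.
        num = 1 if cuda_device_count == 1 else 2
        cuda_devices = [f"cuda:{i}" for i in range(num)]
        devices.extend(cuda_devices)
    return devices, cuda_devices


# In a test module this could be wired up as (assumes torch is installed):
#   DEVICES, CUDA_DEVICES = build_device_lists(
#       torch.cuda.is_available(), torch.cuda.device_count())
```

Keeping `CUDA_DEVICES` as a separate return value means tests that are CUDA-only can keep parametrizing over it, while device-agnostic tests use `DEVICES`.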
Earlier, this test was only running on CUDA devices. I have made the update to be in line with the earlier implementation.
Hi @romitjain, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: romit <romit@ibm.com>
Signed-off-by: romit <romit@ibm.com>
Signed-off-by: romit <romit@ibm.com>
f2e4201 to a655e21
Signed-off-by: romit <romit@ibm.com>
Hi @mgoin @tlrmchlsmth Thanks
# NOTE(woosuk): The reference implementation should be executed first
# because the custom kernel is in-place.
ref_out = layer.forward_native(x, residual)
ref_out = layer.forward_static(
Why can't we keep `forward_native` here?
@jikunshang We can; this was just an opinionated approach.
`forward_native` directly calls `forward_static`.
Just from the nomenclature, I inferred:
- `forward_native` can be native to the OOT platform
- `forward_static` can remain the golden reference implementation, not to be inherited or overridden
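The distinction drawn in this reply can be illustrated with a toy layer. This is a sketch under stated assumptions, not vLLM's `RMSNorm`: the classes `RMSNormSketch` and `OOTRMSNormSketch` and the helper `matches_reference` are hypothetical, tensors are replaced by plain Python lists, and the convention of returning `(output, updated_residual)` when a residual is given is assumed for illustration.

```python
import math


class RMSNormSketch:
    """Toy RMSNorm over Python lists; illustrative only."""

    def __init__(self, weight, eps=1e-6):
        self.weight = list(weight)
        self.eps = eps

    @staticmethod
    def forward_static(x, weight, eps, residual=None):
        # Golden reference: optionally add the residual, then normalize
        # by the root mean square and scale by the weight.
        if residual is not None:
            x = [a + b for a, b in zip(x, residual)]
        variance = sum(v * v for v in x) / len(x)
        scale = 1.0 / math.sqrt(variance + eps)
        out = [v * scale * w for v, w in zip(x, weight)]
        return out if residual is None else (out, x)

    def forward_native(self, x, residual=None):
        # Default path simply defers to the static reference.
        return self.forward_static(x, self.weight, self.eps, residual)


class OOTRMSNormSketch(RMSNormSketch):
    """Hypothetical platform subclass that overrides forward_native."""

    def forward_native(self, x, residual=None):
        if residual is not None:
            x = [a + b for a, b in zip(x, residual)]
        rms = math.sqrt(math.fsum(v * v for v in x) / len(x) + self.eps)
        out = [w * (v / rms) for v, w in zip(x, self.weight)]
        return out if residual is None else (out, x)


def matches_reference(layer, x, residual=None, tol=1e-9):
    """Compare the layer's forward_native against the static reference."""
    ref = RMSNormSketch.forward_static(x, layer.weight, layer.eps, residual)
    got = layer.forward_native(x, residual)
    if residual is None:
        ref, got = (ref,), (got,)
    return all(
        math.isclose(a, b, rel_tol=tol, abs_tol=tol)
        for r, g in zip(ref, got)
        for a, b in zip(r, g)
    )
```

Because the comparison always goes through the static method on the base class, a platform override of `forward_native` cannot silently change what the test treats as the reference.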
I think this work is somewhat redundant with the vLLM IR work; can you take a look at #33825? Also, @gmagogsfm will be working on some test infra for IR ops that should hopefully cover this. After that, these tests will either be redundant or should just focus on testing layer logic.
@ProExpertProg IIUC, your PR #33825 adds vLLM IR, which improves the lowering and dispatch of the kernels. But we would still need to test the forward of the layers, irrespective of which kernel gets dispatched.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: r0 <11757603+romitjain@users.noreply.github.com>
Signed-off-by: romit <romit@ibm.com>
Hi @mgoin @ProExpertProg Thanks
## Description

This PR does 2 things:

1. **Adds tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`**
   There are 2 tests added: a unit test verifying the correctness of the layer on CPU/Spyre and an integration test to ensure `forward_oot` gets called when it is installed as a vLLM plugin. While writing these tests, I saw a couple of issues in the SpyreRMSNorm implementation, which I have attempted to fix, but please correct me if I am wrong.
2. **Adds a framework for running upstream vLLM tests and runs RMSNorm upstream tests**
   Building on vllm-project/vllm#36246, this PR also adds a framework that can be used to filter and update upstream tests and run them from the `vllm` repo.
   1. We clone vllm separately and run tests from the vllm repo (copied over from #800, not the contribution of this PR)
   2. We manage the whitelist/filtering logic via a declarative YAML

## Related Issues

#805

## Test Plan

To test both the features of this PR:

1. **Tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`**

```bash
cd vllm-spyre/vllm_spyre_next
# Installs the pytest plugin
uv pip install -e .
VLLM_PLUGINS=spyre_next_ops pytest -rA tests/test_rms_norm.py -m spyre
```

This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-512-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-512-1]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[False]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[True]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-129-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-129-1] - <redacted>
========================================================== 8 failed, 10 passed, 1 skipped, 5836 deselected, 9 warnings in 24.42s ==========================================================
```

The tests are failing at boundaries of hidden dim, which is expected as of now, since hidden dim is not being padded to 64. (I can raise a separate PR to fix that, but I did not want to overload this PR)

2. **Upstream tests that run from vLLM**

This makes use of my PR on vLLM: vllm-project/vllm#36246, which enables the RMSNorm test to run for OOT devices

```bash
# For demonstration purposes, I am testing on my fork and commit
export VLLM_COMMIT=a3b591a09545403114885ac7fbd94b63fbac1696
export VLLM_REPO_URL=https://github.com/romitjain/vllm
VLLM_PLUGINS=spyre_next_ops,spyre_next_test python -m pytest -rA -m upstream
```

This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-16]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
SKIPPED [126] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:34: not in upstream_tests.yaml
SKIPPED [54] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:119: not in upstream_tests.yaml
SKIPPED [4] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_apply_rotary_emb.py:188: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_qk_norm_rope.py:48: fused_qk_norm_rope custom op requires cuda and rocm platform
SKIPPED [4256] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_quant_layernorm.py:156: not in upstream_tests.yaml
SKIPPED [16] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:21: not in upstream_tests.yaml
SKIPPED [8] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:61: not in upstream_tests.yaml
SKIPPED [12] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:24: param skipped
SKIPPED [648] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:85: blocked by upstream_tests.yaml
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:59: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:129: Skipping CUDA/ROCm only tests.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_opcheck.py: not in upstream_tests.yaml
SKIPPED [384] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py:54: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py: not in upstream_tests.yaml
SKIPPED [96] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding.py:30: not in upstream_tests.yaml
SKIPPED [144] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding_mla_cache_fused.py:20: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:14: UVA is not available.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:36: UVA is not available.
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-16] - AssertionError: Tensor-likes are not close!
========================================================= 6 failed, 6 passed, 5825 skipped, 19 deselected, 13 warnings in 23.26s ==========================================================
```

We can see that our YAML is being respected and:

1. Most of the upstream tests are skipped due to not being in our YAML
2. 12 tests are getting skipped for parameters not being supported (`param skipped`)
3. 12 tests are selected for running, out of which 6 pass/6 fail

## Checklist

- [x] I have read the [contributing guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [x] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: romit <romit@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
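The three skip reasons visible in the upstream run (`not in upstream_tests.yaml`, `blocked by upstream_tests.yaml`, `param skipped`) suggest a simple decision procedure over an allow-list. The sketch below shows one way such filtering could work. Everything in it is an assumption: the schema of `ALLOW_LIST`, the function `skip_reason`, and the listed test names and parameter keys are hypothetical and are not taken from the actual `upstream_tests.yaml` or plugin code.

```python
from typing import Optional

# Hypothetical allow-list, written as the dict a YAML loader might return.
# The real schema of upstream_tests.yaml is not shown in this PR.
ALLOW_LIST = {
    "tests/kernels/core/test_layernorm.py": {
        # Tests to run, with the parameter values that are supported.
        "allow": {
            "test_rms_norm": {"hidden_size": [64], "device": ["cpu"]},
        },
        # Tests in an allowed file that must never run.
        "block": ["test_fused_rms_norm_quant"],
    },
}


def skip_reason(path: str, name: str, params: dict,
                allow_list: dict) -> Optional[str]:
    """Return a skip reason for one collected test, or None to run it."""
    file_rules = allow_list.get(path)
    if file_rules is None:
        return "not in upstream_tests.yaml"
    if name in file_rules.get("block", []):
        return "blocked by upstream_tests.yaml"
    allowed = file_rules.get("allow", {})
    if name not in allowed:
        return "not in upstream_tests.yaml"
    # Every constrained parameter must take one of the permitted values.
    for key, permitted in allowed[name].items():
        if key in params and params[key] not in permitted:
            return "param skipped"
    return None
```

In a pytest plugin, a function like this could be called from the `pytest_collection_modifyitems` hook, attaching `pytest.mark.skip(reason=...)` to each item for which it returns a reason.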
## Description This PR does 2 things: 1. **Adds tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`** There are 2 tests added - a unit test verifying the correctness of the layer on CPU/Spyre and an integration test to ensure `forward_oot` gets called when it is installed as a vLLM plugin. While writing down these tests, I saw a couple of issues in the SpyreRMSNorm implementation - which I have attempted to fix, but please correct me if I am wrong. 2. **Adds a framework for running upstream vLLM tests and runs RMSNorm upstream tests** Building on: vllm-project/vllm#36246, this PR also adds a framework that can be used to filter and update upstream tests and run them from the `vllm` repo. 1. We clone vllm separately and run tests from the vllm repo (copied over from #800, not the contribution of this PR) 2. We manage the whitelist/filtering logic via a declarative YAML ## Related Issues <!-- Link related issues, e.g., `Fixes #` or `Relates to #456` --> torch-spyre/sendnn-inference#805 ## Test Plan <!-- Describe how you tested your changes. Include commands or steps to reproduce. --> To test both the features of this PR: 1. **Tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`** ```bash cd vllm-spyre/vllm_spyre_next # Installs the pytest plugin uv pip install -e . 
VLLM_PLUGINS=spyre_next_ops pytest -rA tests/test_rms_norm.py -m spyre ``` This is expected to produce, ```bash ================================================================================= short test summary info ================================================================================= PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-64-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-128-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-256-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-512-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-64-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-128-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-256-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-512-1] PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[False] PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[True] SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-63-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-65-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-127-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-129-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-63-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-65-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-127-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-129-1] - <redacted> 
========================================================== 8 failed, 10 passed, 1 skipped, 5836 deselected, 9 warnings in 24.42s ========================================================== ``` The tests are failing at boundaries of hidden dim, which is expected as of now, since hidden dim is not being padded to 64. (I can raise a separate PR to fix that, but I did not want to overload this PR) 2. **Upstream tests that run from vLLM** This makes use of my PR on vLLM: vllm-project/vllm#36246, which enables the RMSNorm test to run for OOT devices ```bash # For demonstration purposes, I am testing on my fork and commit export VLLM_COMMIT=a3b591a09545403114885ac7fbd94b63fbac1696 export VLLM_REPO_URL=https://github.com/romitjain/vllm VLLM_PLUGINS=spyre_next_ops,spyre_next_test python -m pytest -rA -m upstream ``` This is expected to produce ```bash ================================================================================= short test summary info ================================================================================= PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-1] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-16] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-1] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-16] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-1] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-16] SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm SKIPPED [126] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:34: not in upstream_tests.yaml SKIPPED [54] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:119: not in upstream_tests.yaml SKIPPED [4] 
../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_apply_rotary_emb.py:188: Skipping CUDA/ROCm only tests. SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_qk_norm_rope.py:48: fused_qk_norm_rope custom op requires cuda and rocm platform SKIPPED [4256] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_quant_layernorm.py:156: not in upstream_tests.yaml SKIPPED [16] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:21: not in upstream_tests.yaml SKIPPED [8] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:61: not in upstream_tests.yaml SKIPPED [12] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:24: param skipped SKIPPED [648] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:85: blocked by upstream_tests.yaml SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:59: Skipping CUDA/ROCm only tests. SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:129: Skipping CUDA/ROCm only tests. 
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_opcheck.py: not in upstream_tests.yaml SKIPPED [384] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py:54: not in upstream_tests.yaml SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py: not in upstream_tests.yaml SKIPPED [96] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding.py:30: not in upstream_tests.yaml SKIPPED [144] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding_mla_cache_fused.py:20: not in upstream_tests.yaml SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:14: UVA is not available. SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:36: UVA is not available. FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-1] - AssertionError: Tensor-likes are not close! FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-16] - AssertionError: Tensor-likes are not close! FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-1] - AssertionError: Tensor-likes are not close! FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-16] - AssertionError: Tensor-likes are not close! FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-1] - AssertionError: Tensor-likes are not close! FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-16] - AssertionError: Tensor-likes are not close! ========================================================= 6 failed, 6 passed, 5825 skipped, 19 deselected, 13 warnings in 23.26s ========================================================== ``` We can see that our YAML is being respected and: 1. Most of the upstream tests are skipped due to not being in our YAML 2. 
12 tests are getting skipped for parameters not being supported (`param skipped`) 3. 12 tests are selected for running, out of whcih 6 pass/6 fail ## Checklist - [x] I have read the [contributing guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing) - [x] My code follows the project's code style (run `bash format.sh`) - [x] I have added tests for my changes (if applicable) - [ ] I have updated the documentation (if applicable) - [x] My commits include a `Signed-off-by:` line (DCO compliance) --------- Signed-off-by: romit <romit@ibm.com> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Joe Runde <joe@joerun.de> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
## Description This PR does 2 things: 1. **Adds tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`** There are 2 tests added - a unit test verifying the correctness of the layer on CPU/Spyre and an integration test to ensure `forward_oot` gets called when it is installed as a vLLM plugin. While writing down these tests, I saw a couple of issues in the SpyreRMSNorm implementation - which I have attempted to fix, but please correct me if I am wrong. 2. **Adds a framework for running upstream vLLM tests and runs RMSNorm upstream tests** Building on: vllm-project/vllm#36246, this PR also adds a framework that can be used to filter and update upstream tests and run them from the `vllm` repo. 1. We clone vllm separately and run tests from the vllm repo (copied over from #800, not the contribution of this PR) 2. We manage the whitelist/filtering logic via a declarative YAML ## Related Issues <!-- Link related issues, e.g., `Fixes #` or `Relates to #456` --> torch-spyre/sendnn-inference#805 ## Test Plan <!-- Describe how you tested your changes. Include commands or steps to reproduce. --> To test both the features of this PR: 1. **Tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`** ```bash cd vllm-spyre/vllm_spyre_next # Installs the pytest plugin uv pip install -e . 
VLLM_PLUGINS=spyre_next_ops pytest -rA tests/test_rms_norm.py -m spyre ``` This is expected to produce, ```bash ================================================================================= short test summary info ================================================================================= PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-64-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-128-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-256-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-512-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-64-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-128-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-256-1] PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-512-1] PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[False] PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[True] SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-63-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-65-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-127-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-129-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-63-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-65-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-127-1] - <redacted> FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-129-1] - <redacted> 
========================================================== 8 failed, 10 passed, 1 skipped, 5836 deselected, 9 warnings in 24.42s ========================================================== ``` The tests are failing at boundaries of hidden dim, which is expected as of now, since hidden dim is not being padded to 64. (I can raise a separate PR to fix that, but I did not want to overload this PR) 2. **Upstream tests that run from vLLM** This makes use of my PR on vLLM: vllm-project/vllm#36246, which enables the RMSNorm test to run for OOT devices ```bash # For demonstration purposes, I am testing on my fork and commit export VLLM_COMMIT=a3b591a09545403114885ac7fbd94b63fbac1696 export VLLM_REPO_URL=https://github.com/romitjain/vllm VLLM_PLUGINS=spyre_next_ops,spyre_next_test python -m pytest -rA -m upstream ``` This is expected to produce ```bash ================================================================================= short test summary info ================================================================================= PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-1] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-16] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-1] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-16] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-1] PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-16] SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm SKIPPED [126] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:34: not in upstream_tests.yaml SKIPPED [54] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:119: not in upstream_tests.yaml SKIPPED [4] 
../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_apply_rotary_emb.py:188: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_qk_norm_rope.py:48: fused_qk_norm_rope custom op requires cuda and rocm platform
SKIPPED [4256] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_quant_layernorm.py:156: not in upstream_tests.yaml
SKIPPED [16] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:21: not in upstream_tests.yaml
SKIPPED [8] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:61: not in upstream_tests.yaml
SKIPPED [12] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:24: param skipped
SKIPPED [648] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:85: blocked by upstream_tests.yaml
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:59: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:129: Skipping CUDA/ROCm only tests.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_opcheck.py: not in upstream_tests.yaml
SKIPPED [384] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py:54: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py: not in upstream_tests.yaml
SKIPPED [96] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding.py:30: not in upstream_tests.yaml
SKIPPED [144] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding_mla_cache_fused.py:20: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:14: UVA is not available.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:36: UVA is not available.
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-16] - AssertionError: Tensor-likes are not close!
========================================================= 6 failed, 6 passed, 5825 skipped, 19 deselected, 13 warnings in 23.26s ==========================================================
```

We can see that our YAML is being respected and:

1. Most of the upstream tests are skipped due to not being in our YAML
2. 12 tests are getting skipped for parameters not being supported (`param skipped`)
3. 12 tests are selected for running, of which 6 pass and 6 fail

## Checklist

- [x] I have read the [contributing guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [x] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: romit <romit@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
## Description

This PR does 2 things:

1. **Adds tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`**
   There are 2 tests added: a unit test verifying the correctness of the layer on CPU/Spyre, and an integration test to ensure `forward_oot` gets called when it is installed as a vLLM plugin. While writing these tests, I saw a couple of issues in the SpyreRMSNorm implementation, which I have attempted to fix, but please correct me if I am wrong.
2. **Adds a framework for running upstream vLLM tests and runs RMSNorm upstream tests**
   Building on vllm-project/vllm#36246, this PR also adds a framework that can be used to filter and update upstream tests and run them from the `vllm` repo.
   1. We clone vllm separately and run tests from the vllm repo (copied over from #800, not the contribution of this PR)
   2. We manage the whitelist/filtering logic via a declarative YAML

## Related Issues

torch-spyre/sendnn-inference#805

## Test Plan

To test both the features of this PR:

1. **Tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`**

```bash
cd vllm-spyre/vllm_spyre_next
# Installs the pytest plugin
uv pip install -e .
VLLM_PLUGINS=spyre_next_ops pytest -rA tests/test_rms_norm.py -m spyre
```

This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-512-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-512-1]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[False]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[True]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-129-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-129-1] - <redacted>
========================================================== 8 failed, 10 passed, 1 skipped, 5836 deselected, 9 warnings in 24.42s ==========================================================
```

The tests are failing at the boundaries of the hidden dim, which is expected for now, since the hidden dim is not being padded to 64. (I can raise a separate PR to fix that, but I did not want to overload this PR.)

2. **Upstream tests that run from vLLM**

This makes use of my PR on vLLM, vllm-project/vllm#36246, which enables the RMSNorm test to run for OOT devices.

```bash
# For demonstration purposes, I am testing on my fork and commit
export VLLM_COMMIT=a3b591a09545403114885ac7fbd94b63fbac1696
export VLLM_REPO_URL=https://github.com/romitjain/vllm
VLLM_PLUGINS=spyre_next_ops,spyre_next_test python -m pytest -rA -m upstream
```

This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-16]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
SKIPPED [126] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:34: not in upstream_tests.yaml
SKIPPED [54] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:119: not in upstream_tests.yaml
SKIPPED [4] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_apply_rotary_emb.py:188: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_qk_norm_rope.py:48: fused_qk_norm_rope custom op requires cuda and rocm platform
SKIPPED [4256] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_quant_layernorm.py:156: not in upstream_tests.yaml
SKIPPED [16] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:21: not in upstream_tests.yaml
SKIPPED [8] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:61: not in upstream_tests.yaml
SKIPPED [12] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:24: param skipped
SKIPPED [648] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:85: blocked by upstream_tests.yaml
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:59: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:129: Skipping CUDA/ROCm only tests.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_opcheck.py: not in upstream_tests.yaml
SKIPPED [384] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py:54: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py: not in upstream_tests.yaml
SKIPPED [96] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding.py:30: not in upstream_tests.yaml
SKIPPED [144] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding_mla_cache_fused.py:20: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:14: UVA is not available.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:36: UVA is not available.
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-16] - AssertionError: Tensor-likes are not close!
========================================================= 6 failed, 6 passed, 5825 skipped, 19 deselected, 13 warnings in 23.26s ==========================================================
```

We can see that our YAML is being respected and:

1. Most of the upstream tests are skipped due to not being in our YAML
2. 12 tests are getting skipped for parameters not being supported (`param skipped`)
3. 12 tests are selected for running, of which 6 pass and 6 fail

## Checklist

- [x] I have read the [contributing guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [x] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: romit <romit@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
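
The declarative filtering described in the commit message above can be pictured with a small sketch. This is only an illustration under stated assumptions: the config layout, the `skip_reason` helper, and the parameter names (`hidden_size`, `num_tokens`) are hypothetical and are not taken from the actual `upstream_tests.yaml` or plugin code. Only the three skip-reason strings come from the test output shown above.

```python
from typing import Optional

# Hypothetical structure, shaped like what a YAML allow-list might parse into.
# The real upstream_tests.yaml schema is not shown in this PR description.
CONFIG = {
    "allow": {
        # test name -> supported values per parameter (assumed names)
        "test_rms_norm": {"hidden_size": [64], "num_tokens": [1, 16]},
    },
    "block": ["test_blocked_example"],  # hypothetical blocked test
}


def skip_reason(test_name: str, params: dict, config: dict) -> Optional[str]:
    """Return a skip reason for a collected test, or None if it should run."""
    if test_name in config.get("block", []):
        return "blocked by upstream_tests.yaml"
    allowed = config.get("allow", {})
    if test_name not in allowed:
        return "not in upstream_tests.yaml"
    # Skip parametrizations whose values fall outside the supported set.
    for key, supported in allowed[test_name].items():
        if key in params and params[key] not in supported:
            return "param skipped"
    return None


print(skip_reason("test_rms_norm", {"hidden_size": 64, "num_tokens": 1}, CONFIG))
print(skip_reason("test_rms_norm", {"hidden_size": 128, "num_tokens": 1}, CONFIG))
```

In a pytest plugin, a function like this would plausibly be called from a collection hook to attach skip markers, but that wiring is left out here because the plugin's actual hooks are not described in the source.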
Purpose
The purpose of this PR is to update the RMSNorm test (`test_rms_norm`) to make it more generic across devices. Specifically, the device parameterization for the test now defaults to CPU, which enables OOT hardware plugins to run the same test. The PR uses `forward_static` as the reference implementation instead of `forward_native`. `forward_static` is a `staticmethod`; hence, it should be used as the gold-standard reference.
Test Plan
This is an updated test, so no new tests are required.
- `pytest tests/kernels/core/test_layernorm.py::test_rms_norm` on a CPU installation
- `pytest tests/kernels/core/test_layernorm.py::test_rms_norm` on a CUDA installation
Test Result
The same test runs fine on CUDA devices as well as on CPU devices.
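
As a rough sketch of what "CPU as the default" device parameterization could look like, the helper below builds the device list from a CUDA device count. `select_test_devices` and its `max_cuda_devices` argument are hypothetical names chosen for illustration; this is not the code in the PR.

```python
def select_test_devices(cuda_device_count: int, max_cuda_devices: int = 2) -> list[str]:
    """Return the device strings a test would be parametrized over.

    CPU is the fallback, so the list is never empty and the test can still
    run on CPU-only or out-of-tree (OOT) installations.
    """
    if cuda_device_count <= 0:
        return ["cpu"]
    # Cap the number of CUDA devices to keep the test matrix small.
    return [f"cuda:{i}" for i in range(min(cuda_device_count, max_cuda_devices))]


# In a real test module the count would likely come from
# torch.cuda.device_count() when torch.cuda.is_available() is true.
DEVICES = select_test_devices(cuda_device_count=0)
print(DEVICES)  # ['cpu']
```

The resulting list would then be passed to `pytest.mark.parametrize("device", DEVICES)`. Keeping the selection in a pure function makes the CPU fallback easy to check without any accelerator present.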
Edit: I have added an RFC covering broader changes to the tests for similar ops.
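
For readers unfamiliar with the computation that a gold-standard reference has to reproduce, here is a minimal pure-Python sketch of RMSNorm on a single vector, following the standard definition y_i = x_i / sqrt(mean(x^2) + eps) * w_i. It is not vLLM's `forward_static`; the function name is hypothetical, and the residual handling (add first, return the sum as the new residual) is an assumption about the fused-add variant.

```python
import math


def rms_norm_reference(x, weight, eps=1e-6, residual=None):
    """Reference RMSNorm over one vector of floats.

    If `residual` is given, it is added to `x` before normalization and the
    sum is returned alongside the output (assumed fused-add behaviour).
    """
    if residual is not None:
        x = [a + b for a, b in zip(x, residual)]
    variance = sum(v * v for v in x) / len(x)   # mean of squares
    scale = 1.0 / math.sqrt(variance + eps)     # reciprocal RMS
    out = [v * scale * w for v, w in zip(x, weight)]
    return out if residual is None else (out, x)


out = rms_norm_reference([3.0, 4.0], [1.0, 1.0], eps=0.0)
# With unit weights and eps=0, the output has a mean square of 1.
print(sum(v * v for v in out) / len(out))
```

A device-specific implementation would then be compared against such a reference within a tolerance, which is what the `Tensor-likes are not close!` assertion failures in test output of this kind report.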