⚗️ Support model revision in tests#456

Merged
joerunde merged 10 commits into main from cache-revision
Sep 11, 2025

Conversation

@joerunde
Collaborator

Description

Supports revisions for models in unit tests, and adds default revisions for existing models

Signed-off-by: Joe Runde <joe@joerun.de>
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run `format.sh` and commit the changes. This can be done with uv directly:

```bash
uv sync --frozen --group lint --active --inexact
```

Or with pip:

```bash
uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh
```

Now you are good to go 🚀

Collaborator

@rafvasq rafvasq left a comment


lgtm once tests pass 👍 (and inference script stuff is figured out)

@joerunde
Collaborator Author

oooof, handling all the new failures for the latest compiler release is getting hairy 😬

@joerunde joerunde changed the title ⚗️ try to support model revision in tests ⚗️ Support model revision in tests Sep 11, 2025
@joerunde
Collaborator Author

All continuous batching tests are now passing on the pinned model revisions on Spyre hardware. The graph comparison tests have been updated to pull the specific revision for inference.py to use, and are now passing 🎉!

Gonna go ahead and merge and get these changes in

@joerunde joerunde merged commit 0a098f4 into main Sep 11, 2025
16 checks passed
@joerunde joerunde deleted the cache-revision branch September 11, 2025 23:44
Collaborator

@maxdebayser maxdebayser left a comment


LGTM

joerunde added a commit that referenced this pull request Mar 11, 2026
<!-- markdownlint-disable -->

## Description

Updates the model configuration for Granite 4 dense models, including using the
`granite` variant (instead of `granitemoehybrid`).

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

## Checklist

- [x] I have read the [contributing
guidelines](https://blog.vllm.ai/vllm-spyre/contributing/)
- [ ] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Joe Runde <joe@joerun.de>
rafvasq pushed a commit to rafvasq/sendnn-inference that referenced this pull request Mar 11, 2026
* update description

Signed-off-by: waleedqk <waleedqk@ibm.com>

* fix typo

Signed-off-by: waleedqk <waleedqk@ibm.com>

* new lines

Signed-off-by: waleedqk <waleedqk@ibm.com>

* new lines

Signed-off-by: waleedqk <waleedqk@ibm.com>

---------

Signed-off-by: waleedqk <waleedqk@ibm.com>
Co-authored-by: Dhruval Patel <Dhruval.Patel@ibm.com>
joerunde pushed a commit that referenced this pull request Mar 13, 2026
## Description

Bump minimum vllm to v0.17.1 now that it is released.

I recently noticed a warning log about `VLLM_USE_V1` being an invalid
env var, so cleaned up that too (it was removed upstream a while ago).

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

## Test Plan

All current tests should pass

## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
coderfornow pushed a commit to coderfornow/vllm-spyre that referenced this pull request Mar 16, 2026
## Description

Bump minimum vllm to v0.17.1 now that it is released.

I recently noticed a warning log about `VLLM_USE_V1` being an invalid
env var, so cleaned up that too (it was removed upstream a while ago).

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to torch-spyre#456` -->

## Test Plan

All current tests should pass

## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: coderfornow <ritikdhiranan@icloud.com>
joerunde added a commit that referenced this pull request Mar 19, 2026
## Description

This PR does 2 things:

1. **Adds tests for SpyreRMSNorm that run from
`vllm-spyre/vllm_spyre_next`**

Two tests are added: a unit test verifying the correctness of the layer
on CPU/Spyre, and an integration test ensuring `forward_oot` gets called
when it is installed as a vLLM plugin.
While writing these tests, I saw a couple of issues in the SpyreRMSNorm
implementation, which I have attempted to fix; please correct me if I am
wrong.

2. **Adds a framework for running upstream vLLM tests and runs RMSNorm
upstream tests**

Building on vllm-project/vllm#36246, this PR also adds a framework that
can be used to filter and update upstream tests and run them from the
`vllm` repo.

1. We clone vllm separately and run tests from the vllm repo (copied
over from #800, not the contribution of this PR)
2. We manage the whitelist/filtering logic via a declarative YAML
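The whitelist mechanism can be sketched roughly as follows; the schema and helper names are assumptions for illustration, not the PR's actual implementation:

```python
# Hypothetical sketch of declarative upstream-test filtering; the schema
# and names are assumptions. In the real setup the mapping would be
# loaded from upstream_tests.yaml rather than defined inline.
import fnmatch

UPSTREAM_TESTS = {
    "tests/kernels/core/test_layernorm.py": ["test_rms_norm*"],
}


def is_whitelisted(nodeid: str) -> bool:
    """Return True if a collected upstream test matches the whitelist."""
    for path, patterns in UPSTREAM_TESTS.items():
        if nodeid.startswith(path):
            test_name = nodeid.split("::", 1)[-1]
            return any(fnmatch.fnmatch(test_name, p) for p in patterns)
    return False


# A pytest hook would then mark everything else as skipped, e.g.:
#   if not is_whitelisted(item.nodeid):
#       item.add_marker(pytest.mark.skip(reason="not in upstream_tests.yaml"))
```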

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

#805

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

To test both the features of this PR:

1. **Tests for SpyreRMSNorm that run from `vllm-spyre/vllm_spyre_next`**

```bash
cd vllm-spyre/vllm_spyre_next
# Installs the pytest plugin
uv pip install -e .

VLLM_PLUGINS=spyre_next_ops pytest -rA tests/test_rms_norm.py -m spyre
```
This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-512-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-64-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-128-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-256-1]
PASSED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-512-1]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[False]
PASSED tests/test_rms_norm.py::test_rmsnorm_oot_dispatch[True]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[False-129-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-63-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-65-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-127-1] - <redacted>
FAILED tests/test_rms_norm.py::test_spyre_rmsnorm_matches_reference[True-129-1] - <redacted>
========================================================== 8 failed, 10 passed, 1 skipped, 5836 deselected, 9 warnings in 24.42s ==========================================================
```

The tests fail at the boundaries of the hidden dim, which is expected
for now, since the hidden dim is not being padded to 64. (I can raise a
separate PR to fix that, but I did not want to overload this one.)

2. **Upstream tests that run from vLLM**

This makes use of my PR on vLLM:
vllm-project/vllm#36246, which enables the
RMSNorm test to run for OOT devices

```bash
# For demonstration purposes, I am testing on my fork and commit
export VLLM_COMMIT=a3b591a09545403114885ac7fbd94b63fbac1696
export VLLM_REPO_URL=https://github.com/romitjain/vllm

VLLM_PLUGINS=spyre_next_ops,spyre_next_test python -m pytest -rA -m upstream
```

This is expected to produce:

```bash
================================================================================= short test summary info =================================================================================
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-False-64-16]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-1]
PASSED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-False-64-16]
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_permute_cols.py:11: permute_cols is not supported on ROCm
SKIPPED [126] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:34: not in upstream_tests.yaml
SKIPPED [54] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_activation.py:119: not in upstream_tests.yaml
SKIPPED [4] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_apply_rotary_emb.py:188: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_qk_norm_rope.py:48: fused_qk_norm_rope custom op requires cuda and rocm platform
SKIPPED [4256] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_quant_layernorm.py:156: not in upstream_tests.yaml
SKIPPED [16] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:21: not in upstream_tests.yaml
SKIPPED [8] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_fused_rms_norm_gated.py:61: not in upstream_tests.yaml
SKIPPED [12] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:24: param skipped
SKIPPED [648] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_layernorm.py:85: blocked by upstream_tests.yaml
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:59: Skipping CUDA/ROCm only tests.
SKIPPED [24] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_mrope.py:129: Skipping CUDA/ROCm only tests.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_opcheck.py: not in upstream_tests.yaml
SKIPPED [384] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py:54: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_pos_encoding.py: not in upstream_tests.yaml
SKIPPED [96] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding.py:30: not in upstream_tests.yaml
SKIPPED [144] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_rotary_embedding_mla_cache_fused.py:20: not in upstream_tests.yaml
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:14: UVA is not available.
SKIPPED [1] ../../.cache/vllm-upstream-tests/worktree-a3b591a09545/tests/kernels/core/test_uva.py:36: UVA is not available.
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype0-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype1-True-64-16] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-1] - AssertionError: Tensor-likes are not close!
FAILED test_layernorm.py::test_rms_norm[False-cpu-0-dtype2-True-64-16] - AssertionError: Tensor-likes are not close!
========================================================= 6 failed, 6 passed, 5825 skipped, 19 deselected, 13 warnings in 23.26s ==========================================================
```

We can see that our YAML is being respected:
1. Most of the upstream tests are skipped for not being in our YAML
2. 12 tests are skipped for parameters not being supported
(`param skipped`)
3. 12 tests are selected to run, out of which 6 pass and 6 fail
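For reference, the correctness check in such RMSNorm unit tests boils down to comparing against a plain reference implementation. This pure-Python version is a sketch of the math only; the real tests presumably use torch tensors and dtype-aware tolerances:

```python
# Pure-Python RMSNorm reference, illustrating the math the unit tests
# check; real tests would operate on torch tensors with tolerances.
import math


def rms_norm_ref(x: list[float], weight: list[float],
                 eps: float = 1e-6) -> list[float]:
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i"""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]
```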

## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [x] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: romit <romit@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
joerunde pushed a commit that referenced this pull request Mar 19, 2026
## Description

setuptools_scm v10 no longer generates a `_version` file by default. We
should be using importlib anyway.

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

## Test Plan

Manually verified that both methods of obtaining the version match:
```python
import importlib.metadata

from vllm_spyre._version import version as vllm_spyre_version

assert vllm_spyre_version == importlib.metadata.version("vllm_spyre")
```

Also did a quick test with a manually generated precompiled model.

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
joerunde pushed a commit that referenced this pull request Mar 31, 2026
<!-- markdownlint-disable -->

## Description

This PR bumps the lower bound of foundation-model-stack dependency from
1.7.0 to 1.8.0 which includes Llama bug fixes for torch 2.10.

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

## Checklist

- [ ] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [ ] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [ ] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Daniel Schenker <daniel.schenker@ibm.com>
dilipgb pushed a commit to dilipgb/vllm-spyre that referenced this pull request Mar 31, 2026
<!-- markdownlint-disable -->

## Description

This PR bumps the lower bound of foundation-model-stack dependency from
1.7.0 to 1.8.0 which includes Llama bug fixes for torch 2.10.

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to torch-spyre#456` -->

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

## Checklist

- [ ] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [ ] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [ ] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Daniel Schenker <daniel.schenker@ibm.com>
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
joerunde pushed a commit that referenced this pull request Apr 1, 2026
<!-- markdownlint-disable -->

## Description

This PR updates the vllm upper bound in pyproject.toml to include
version 0.18.1.

Questions:
1. I kept it as a `<` operator and moved the upper bound to `0.18.2` to
abide by the comment that is there. If that was the wrong choice let me
know and I can switch it to `<=0.18.1`.
2. I ran a `uv sync --upgrade-package vllm` but nothing in uv.lock
changed... Is there something else I should run instead?

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

## Checklist

- [ ] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [ ] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [ ] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Daniel Schenker <daniel.schenker@ibm.com>
dilipgb pushed a commit to dilipgb/vllm-spyre that referenced this pull request Apr 3, 2026
<!-- markdownlint-disable -->

This PR bumps the lower bound of foundation-model-stack dependency from
1.7.0 to 1.8.0 which includes Llama bug fixes for torch 2.10.

<!-- Link related issues, e.g., `Fixes #` or `Relates to torch-spyre#456` -->

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

- [ ] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [ ] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [ ] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Daniel Schenker <daniel.schenker@ibm.com>
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
tjohnson31415 pushed a commit that referenced this pull request Apr 8, 2026
<!-- markdownlint-disable -->

## Description

Add support for vLLM v0.19.0

- bump vllm versions
- Inputs reorganization
([#35182](vllm-project/vllm#35182))
- `get_cross_encoder_act_fn` merged into `get_act_fn`
([#37537](vllm-project/vllm#37537))
- `RequestStatus.WAITING_FOR_FSM` renamed to
`WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR`
([#38048](vllm-project/vllm#38048))
- `prompt_token_ids_cpu` arg in PoolingMetadata
([#38139](vllm-project/vllm#38139))
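Renames like these are typically absorbed with a small compat shim that resolves whichever name the installed vLLM provides. The pattern below is a generic sketch; the stand-in enum replaces vLLM's actual `RequestStatus` so the snippet is self-contained:

```python
# Generic compat-shim pattern for the WAITING_FOR_FSM rename above.
# RequestStatus here is a self-contained stand-in for vLLM's enum.
import enum


class RequestStatus(enum.Enum):
    WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR = 1  # new name (vLLM >= 0.19)


# Try the new name first, then fall back to the old one:
WAITING_STATUS = getattr(
    RequestStatus, "WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR",
    getattr(RequestStatus, "WAITING_FOR_FSM", None))
```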

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->


## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [x] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
bohnstingl pushed a commit that referenced this pull request Apr 10, 2026
<!-- markdownlint-disable -->

## Description

I have updated `examples/torch_spyre_inference.py` to support the
following arguments

1. `enforce_eager`: Run in either compile mode or eager mode
2. `custom_ops`: Support for dispatching our custom ops in the forward
pass. This can be used to offload individual layers to Spyre and test
e2e inference with them. Example: even if the individual layer tests
pass for `SpyreRMSNorm`, e2e inference might still diverge due to small
numerical differences piling up across multiple layers.

With this script, we should be able to test the e2e inference of any
custom layer that we implement in both eager mode and compile mode.

A few relevant resources:

1. How `custom_ops` are enabled for different `enforce_eager` modes:
https://docs.vllm.ai/en/stable/api/vllm/config/compilation/#vllm.config.compilation.CompilationConfig.custom_ops
2. How dispatch is decided for CustomOps:
https://docs.vllm.ai/en/latest/design/custom_op/#how-customop-works-in-vllm

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

- Solves: #764 
- This is also partially related to #805 and can be used to verify e2e
correctness of RMSNorm.
- I ran the following to check whether the model produces legible
outputs with the custom implementation of RMSNorm:

```python
python examples/torch_spyre_inference.py -n 1 --custom_ops none +RMSNorm # Passes, compile mode for SpyreRMSNorm
python examples/torch_spyre_inference.py -n 1 --enforce_eager --custom_ops none +RMSNorm # Fails, see #794, eager mode for SpyreRMSNorm
```

Fixes: 

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->
I ran the following script with different custom ops. More in the
internal slack thread
[here](https://ibm-research.slack.com/archives/C07QCKFAA9J/p1775549134216499).

```python
python examples/torch_spyre_inference.py --custom_ops none # Pure vLLM CPU mode with compile, this is also the default mode with enforce_eager=False in vLLM
python examples/torch_spyre_inference.py --custom_ops none +RMSNorm # vLLM CPU mode in compile mode with compiled SpyreRMSNorm layer offloading to Spyre
python examples/torch_spyre_inference.py --custom_ops all # vLLM CPU mode in compile mode with all custom ops implemented offloading to Spyre and run in compile mode

python examples/torch_spyre_inference.py --enforce_eager --custom_ops none # Pure vLLM CPU mode in eager mode
python examples/torch_spyre_inference.py --enforce_eager --custom_ops none +RMSNorm # vLLM CPU mode in eager mode with SpyreRMSNorm layer offloading to Spyre and running in eager mode
python examples/torch_spyre_inference.py --enforce_eager --custom_ops all # vLLM CPU mode in eager mode with all custom ops implemented offloading to Spyre, this is also the default mode with enforce_eager=True
```
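The `--custom_ops` combinations above follow vLLM's list semantics, where `all`/`none` set a blanket default and `+Op`/`-Op` override it per op. A rough sketch of that resolution logic (the helper name and starting default are illustrative assumptions):

```python
# Rough sketch of vLLM-style custom-op resolution for a list such as
# ["none", "+RMSNorm"]; helper name and starting default are assumptions.
def op_enabled(op_name: str, custom_ops: list[str],
               default: bool = True) -> bool:
    """'all'/'none' set the blanket default; '+Op'/'-Op' override per op."""
    enabled = default
    for entry in custom_ops:
        if entry == "all":
            enabled = True
        elif entry == "none":
            enabled = False
        elif entry == f"+{op_name}":
            enabled = True
        elif entry == f"-{op_name}":
            enabled = False
    return enabled
```

Under these semantics, `--custom_ops none +RMSNorm` disables every custom op except `RMSNorm`, which matches the offload-one-layer workflow described above.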

## Checklist

- [x] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [x] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [x] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: romit <romit@ibm.com>
joerunde pushed a commit that referenced this pull request Apr 14, 2026
<!-- markdownlint-disable -->

## Description

Add metric to log mm embedding calculation time.

## Related Issues

<!-- Link related issues, e.g., `Fixes #` or `Relates to #456` -->

## Test Plan

<!-- Describe how you tested your changes. Include commands or steps to
reproduce. -->

## Checklist

- [ ] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [ ] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [ ] My commits include a `Signed-off-by:` line (DCO compliance)

Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>