[Spyre-Next] SpyreSiluAndMul Tests#863

Open
romitjain wants to merge 17 commits into torch-spyre:main from romitjain:tests/silu_mul

Conversation

@romitjain
Collaborator

Description

This PR adds two tests for SpyreSiluAndMul: numerical correctness and OOT plugin compatibility. It also adds the upstream tests (commented out); they currently fail because torch.randn, which the upstream tests use to create a tensor on the device, is not yet supported on Spyre hardware.
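Until torch.randn is supported on Spyre, a common workaround is to create the tensor on the host and transfer it afterwards. A minimal sketch (the `randn_on_device` helper is illustrative, not project code, and "cpu" stands in for "spyre" so it runs anywhere):

```python
import torch

def randn_on_device(shape, dtype=torch.float32, device="cpu"):
    # torch.randn is not supported directly on the Spyre backend, so
    # create the tensor on the host first and transfer it afterwards.
    # "cpu" stands in for "spyre" here so the sketch runs anywhere.
    t = torch.randn(shape, dtype=dtype, device="cpu")
    return t.to(device=device)

x = randn_on_device((7, 128))
```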

Related Issues

#810

Test Plan

export VLLM_PLUGINS=spyre_next,spyre_next_ops
pytest -rA -m "siluandmul"

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

Signed-off-by: romit <romit@ibm.com>
@romitjain
Collaborator Author

bot:next-test

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@romitjain
Collaborator Author

@bohnstingl Can you please review this as well?

Signed-off-by: romit <romit@ibm.com>
@romitjain
Collaborator Author

bot:next-test

@rafvasq rafvasq requested a review from bohnstingl March 25, 2026 13:23
Signed-off-by: romit <romit@ibm.com>
@romitjain
Collaborator Author

bot:next-test

items.sort(key=sort_key)


def _convert_yaml_value(value):
Collaborator Author


This is added for future compatibility in case we want to override the torch data types

Signed-off-by: romit <romit@ibm.com>
@romitjain
Collaborator Author

bot:next-test

  activation: [silu_and_mul]
override:
  num_tokens: [7, 83, 1024]
  d: [8, 64, 128, 1024]
Collaborator Author


This is failing for odd values of d. Upstream had no odd values.
The failure comes from torch opcheck (ref) and not the numerical correctness check.

As far as numerical correctness goes, we are good to go

@pytest.mark.siluandmul
@pytest.mark.parametrize("num_tokens", [1, 7, 63, 64, 65, 1024])
@pytest.mark.parametrize("d", [2, 63, 64, 65, 1024, 13824])
@pytest.mark.parametrize("dtype", [torch.bfloat16, torch.float32])
Collaborator


Does torch-spyre support these dtypes?

Collaborator Author


Removed dtype parameterization

Signed-off-by: romit <romit@ibm.com>
@romitjain
Collaborator Author

bot:next-test

@bohnstingl
Collaborator

@romitjain I created a draft PR which reworks the forward call chain of our wrapper and removes the override of forward_native. Could you please take a look and see whether this would enable the upstream vllm tests?

@romitjain
Collaborator Author

bot:next-test

Collaborator

@bohnstingl bohnstingl left a comment


Would reusing the upstream vLLM tests simplify this PR? Is #872 sufficient, or is there anything else that would be needed to enable the upstream tests?

Comment on lines +36 to +53
# TODO: These tests can not be enabled since we override forward_native
# and upstream tests consider forward_native as gold standard
# - rel_path: tests/kernels/core/test_activation.py
# allow_list:
# - test: "test_act_and_mul"
# TODO: Currently the tests fail because upstream rtol and atol are 0
# This needs to be fixed in torch-spyre
# mode: xfail
# tags: [siluandmul, upstream]
# params:
# allow:
# activation: [silu_and_mul]
# override:
# num_tokens: [1, 7, 63, 64, 65, 83, 1024]
# # Odd values fail op check. Not required for current list of models we plan to support
# d: [8, 64, 512, 13824]
# dtype: [torch.float16]
# device: [cpu]
Collaborator


@romitjain Do we maybe want to first land #872 and then directly use the tests from upstream?

Collaborator Author


Yes, I will update this PR to be in draft.

Collaborator


Would we still need this file if we reuse the upstream tests?

Collaborator Author


We might still need to check for forward_oot dispatch, but other tests can go

tjohnson31415 pushed a commit that referenced this pull request Mar 30, 2026

## Description

This PR reworks the call chain of the forward functions for the layer
wrappings for torch-spyre. In particular, it avoids overriding
`forward_native` in favor of using `forward_oot`.

cc @romitjain 

## Related Issues

This PR was triggered by
#863

## Test Plan

This change is non user-facing and thus all the existing tests should
work. Moreover, it should enable the upstream vLLM tests to be supported
as well.

## Checklist

- [X] I have read the [contributing
guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [X] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [X] My commits include a `Signed-off-by:` line (DCO compliance)

---------

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: romit <romit@ibm.com>
@romitjain
Collaborator Author

romitjain commented Mar 31, 2026

I need to rebase this PR onto the latest changes and test again. Ready for review, but I need to test it once more.

@romitjain romitjain requested a review from bohnstingl April 1, 2026 13:45
@romitjain
Collaborator Author

Hi @bohnstingl @joerunde
This PR is ready for review. I don't see the spyre-ci check here; should I still run the bot:next command?

tensor = tensor.to(dtype=dtype)
if device is not None:
tensor = tensor.to(device=device)
# In case the tensor is on spyre and a dtype change is requested
Collaborator Author


@bohnstingl I have lightly modified this function here. Previously, if this function received a tensor residing on spyre and the device passed in was also spyre, it would have failed.

Collaborator


This conversion now seems a little convoluted and the flow is not immediately obvious, though I can't really think of a cleaner or more simplified way to do the device transfers. With your new function, the semantics change a bit: for example, if the tensor is on spyre and just a dtype change is requested, it will first be moved to cpu, converted, and then transferred back to spyre.
What was the failure you saw earlier?

Collaborator Author


For example, if the tensor is on spyre and just a dtype change is requested, it will first convert the tensor to cpu, then do the conversion and then transfer it back to spyre

Isn't this expected? I relied on this comment:

# In case the tensor is on spyre, we first need to move it to cpu and then change the dtype.

In the earlier definition, if I requested just a dtype change on a Spyre Tensor it would have failed.

>>> from vllm_spyre_next.custom_ops.utils import convert
>>> import torch
>>> x = torch.randn((2,2), dtype=torch.float32).to(device="spyre")
>>> x
tensor([[ 1.0254, -1.2343],
        [-0.9974, -0.4666]], device='spyre:0')
>>> convert(x, dtype=torch.float16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/dev/shm/vllm-spyre/vllm_spyre_next/vllm_spyre_next/custom_ops/utils.py", line 68, in convert
    tensor = tensor.to(dtype=dtype)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/dev/shm/.venv/lib64/python3.12/site-packages/torch_spyre/_monkey_patch.py", line 63, in spyre_to
    return orig_to(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Spyre backend does not support type conversion yet during copy.

Signed-off-by: romit <romit@ibm.com>
Collaborator

@bohnstingl bohnstingl left a comment


In principle it looks good to me. One concern for me is how the upstream vLLM will evolve with the tests. We should keep an eye on that and maybe not spend too much effort/time on the tests here?

assert layer._forward_method == layer.forward_oot

# Mock forward_oot with a known transform
monkeypatch.setattr(layer, "forward_oot", mock_forward_oot)
Collaborator


Collaborator Author


Yes makes sense. I will update it

Collaborator Author


@bohnstingl I have updated this


Comment on lines +36 to +50
# Currently the tests fail because upstream rtol and atol are 0
# - rel_path: tests/kernels/core/test_activation.py
# allow_list:
# - test: "test_act_and_mul"
# mode: mandatory_pass
# tags: [siluandmul, upstream]
# params:
# allow:
# activation: [silu_and_mul]
# override:
# num_tokens: [1, 7, 63, 64, 65, 83, 1024]
# # Odd values fail op check. Not required for current list of models we plan to support
# d: [8, 64, 512, 13824]
# dtype: [torch.float32]
# device: [cpu]
Collaborator


Do we still plan to use this eventually? To me it would also make sense to look more closely into how the changed vLLM IR does the testing? For example, the PR that recently landed in upstream vLLM introduces some new tests: https://github.com/vllm-project/vllm/pull/33825/changes#diff-510a5c10fb13605e7270f8b0647ed83a724b2658f0352af189ea1155c831ad3fR1.

Collaborator Author


Do we still plan to use this eventually?

Upstream tests are too strict (rtol==atol==0). Locally, I tested the upstream test_act_and_mul test by editing the test to rtol==atol==1e-2 and they were passing. I don't think we would be able to use this test without parameterizing (atol, rtol) in the upstream test. I will raise a PR for that and see if upstream agrees to merge it. If that lands, we would be able to re-use it.

To me it would also make sense to look more closely into how the changed vLLM IR does the testing?

Agreed! But I believe we would also need some changes on our end to implement new vLLM-IR-compatible ops and to enable those tests. I am following your PR #877 and will go through the attached PR in more detail to see how we can adapt them.

@romitjain
Collaborator Author

@bohnstingl

One concern for me is how the upstream vLLM will evolve with the tests. We should keep an eye on that and maybe not spend too much effort/time on the tests here?

Yes, my initial hope was the same: that we could directly adapt the upstream tests. But both ops implemented so far in our plugin require higher (atol, rtol) values to pass the upstream tests (rmsnorm: due to no fp32 promotion; siluandmul: discussed in this PR).
I will be raising a PR upstream to see if we can parametrize the (rtol, atol) values.
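A hypothetical sketch of what such an upstream change could look like (the test name, parameter names, and tolerance values below are illustrative assumptions, not the actual upstream test):

```python
import pytest
import torch

# Hypothetical: expose the comparison tolerances as parameters instead of
# hard-coding atol=rtol=0.0, so OOT backends can relax them as needed.
@pytest.mark.parametrize("atol,rtol", [(0.0, 0.0), (1e-2, 1e-2)])
def test_act_and_mul_tolerance(atol, rtol):
    out = torch.tensor([1.0, 2.0, 3.0])
    ref_out = out + atol / 2  # within tolerance by construction
    torch.testing.assert_close(out, ref_out, atol=atol, rtol=rtol)
```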

Signed-off-by: romit <romit@ibm.com>
@romitjain romitjain requested a review from bohnstingl April 6, 2026 04:22
Comment on lines +63 to +74
# In case the tensor is on spyre and a dtype change is requested
# we first need to move it to cpu and then change the dtype.
if tensor.device.type == "spyre" and dtype is not None:
    tensor = tensor.to(device="cpu")
    # Set the device to spyre to avoid unexpected device change of the return tensor.
    if device is None:
        device = "spyre"

if dtype is not None:
    tensor = tensor.to(dtype=dtype)
if device is not None:
    tensor = tensor.to(device=device)
Collaborator


Like I mentioned the other time, I think the logic here could be slightly improved to avoid unintended copies. Maybe something like this?

if tensor is None:
    return None

current_device = tensor.device.type

# Spyre doesn't support on-device dtype conversion
if current_device == "spyre" and dtype is not None and tensor.dtype != dtype:
    tensor = tensor.to(device="cpu")
    if device is None:
        device = "spyre"

if dtype is not None and tensor.dtype != dtype:
    tensor = tensor.to(dtype=dtype)
if device is not None and tensor.device.type != device:
    tensor = tensor.to(device=device)

return tensor

Comment on lines +36 to +50
# Currently the tests fail because upstream rtol and atol are 0
# - rel_path: tests/kernels/core/test_activation.py
# allow_list:
# - test: "test_act_and_mul"
# mode: mandatory_pass
# tags: [siluandmul, upstream]
# params:
# allow:
# activation: [silu_and_mul]
# override:
# num_tokens: [1, 7, 63, 64, 65, 83, 1024]
# # Odd values fail op check. Not required for current list of models we plan to support
# d: [8, 64, 512, 13824]
# dtype: [torch.float32]
# device: [cpu]
Collaborator


Can this be addressed with #884 or can we extend #884 to help with that?
cc @joerunde

Collaborator


Maybe!

It looks like this isn't currently set up to be easily patchable: https://github.com/vllm-project/vllm/blob/e69a265135ef48312d78130f64b7bfce4cd81a37/tests/kernels/core/test_activation.py#L106

    else:
        # The SiluAndMul, MulAndSilu, GELU and FatReLU implementations are
        # equivalent to the native PyTorch implementations, so we can do exact
        # comparison.
        torch.testing.assert_close(out, ref_out, atol=0.0, rtol=0.0)

We could set up something like this that edits assert_close to override cases where 0.0 is passed:

@pytest.fixture
def patch_tolerances(monkeypatch):
    original_assert_close = torch.testing.assert_close

    def patched_assert_close(*args, atol=None, rtol=None, **kwargs):
        # Relax the exact-match case (atol == rtol == 0.0) only
        if atol == 0.0 and rtol == 0.0:
            return original_assert_close(*args, atol=0.5, rtol=rtol, **kwargs)
        return original_assert_close(*args, atol=atol, rtol=rtol, **kwargs)

    monkeypatch.setattr(torch.testing, "assert_close", patched_assert_close)

    yield

but we'd definitely still want to update upstream to make those parameterizable, or have them pull from a common method we can override

Collaborator Author


@bohnstingl @joerunde
I have tried to solve this by adding a fixture to monkeypatch torch.testing.assert_close

Please see: https://github.com/vllm-project/vllm-spyre/pull/863/changes#diff-035ed2d7ff0ff23d9dfc96c0f483503dbb4c16529e7ea88192e717621f4da830R637

With this, we are able to enable the upstream SiluAndMul test, but please note that it will only pass for fp32 right now, since the fp16 implementation is broken in Spyre.

Collaborator Author


FP16 failure is related to this internal Slack thread.

In compile mode, if we send FP32 data to the SpyreSiluAndMul layer, internally the layer does this

x1 = x[..., :d]
x2 = x[..., d:]

and then converts the tensors to FP16. The dtype conversion makes x1 and x2 contiguous, and then we transfer the tensors to the Spyre device.

But if we send FP16 data to the SpyreSiluAndMul layer, the conversion to FP16 is a no-op. When we transfer the tensor to the Spyre device, x2 comes out exactly the same as x1.

This is an issue in torch-spyre; as I understand from the comments, strided view copies are not yet reliable in torch-spyre.
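The slicing behavior above can be sketched on CPU. Forcing `.contiguous()` before the device transfer is one possible mitigation (an assumption for illustration, not necessarily the project's fix; "cpu" stands in for "spyre" so the sketch runs anywhere):

```python
import torch

def split_halves(x):
    # SpyreSiluAndMul splits the last dimension in half; the resulting
    # slices are views that share x's storage and are non-contiguous.
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    # Forcing a real copy avoids relying on strided view copies during
    # the subsequent device transfer (sketched here on CPU).
    return x1.contiguous(), x2.contiguous()

x = torch.arange(12, dtype=torch.float16).reshape(2, 6)
x1, x2 = split_halves(x)
```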

@romitjain romitjain requested a review from bohnstingl April 9, 2026 12:57
Collaborator

@bohnstingl bohnstingl left a comment


LGTM!

@joerunde or @tjohnson31415 any comments on the testing part?

@romitjain
Collaborator Author

Hi @joerunde / @tjohnson31415
Can you please review and merge this? Thank you

bohnstingl added a commit to bohnstingl/vllm-spyre that referenced this pull request Apr 16, 2026
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
sducouedic pushed a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026
sducouedic pushed a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026
sducouedic pushed a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026
sducouedic added a commit to sducouedic/spyre-inference that referenced this pull request Apr 17, 2026
tdoublep pushed a commit to torch-spyre/spyre-inference that referenced this pull request Apr 20, 2026