
[Spyre-Next] E2E test with optional offloading to spyre layers #900

Merged
bohnstingl merged 12 commits into torch-spyre:main from romitjain:tests/e2e-layer-wise
Apr 10, 2026

Conversation

@romitjain
Collaborator

@romitjain romitjain commented Apr 7, 2026

Description

I have updated examples/torch_spyre_inference.py to support the following arguments:

  1. enforce_eager: run in either compile mode or eager mode.
  2. custom_ops: dispatch our custom ops in the forward pass. This can be used to offload individual layers to Spyre and test e2e inference with them. For example, even if the individual layer tests pass for SpyreRMSNorm, e2e inference might still diverge because small numerical differences pile up across multiple layers.

With this script, we can test e2e inference of any custom layer we implement, in both eager mode and compile mode.

A few relevant resources:

  1. How custom_ops are enabled for different enforce_eager modes: https://docs.vllm.ai/en/stable/api/vllm/config/compilation/#vllm.config.compilation.CompilationConfig.custom_ops
  2. How dispatch is decided for CustomOps: https://docs.vllm.ai/en/latest/design/custom_op/#how-customop-works-in-vllm
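As a rough illustration (a hypothetical sketch, not the actual vLLM implementation; see the links above for the real logic), the `--custom_ops` list resolves per-op roughly like this, with explicit `+Op`/`-Op` entries overriding the `all`/`none` base policy:

```python
def op_enabled(op_name: str, custom_ops: list[str]) -> bool:
    # Sketch of vLLM-style custom-op resolution: explicit "+Op" / "-Op"
    # entries win over the "all" / "none" base policy.
    if f"+{op_name}" in custom_ops:
        return True
    if f"-{op_name}" in custom_ops:
        return False
    # Without "all", the op falls back to the native (non-custom) path.
    return "all" in custom_ops

# e.g. --custom_ops none +RMSNorm enables only RMSNorm
print(op_enabled("RMSNorm", ["none", "+RMSNorm"]))     # True
print(op_enabled("SiluAndMul", ["none", "+RMSNorm"]))  # False
```

This is why `--custom_ops none +RMSNorm` exercises only the SpyreRMSNorm offload while every other layer stays on the CPU path.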

Related Issues

python examples/torch_spyre_inference.py -n 1 --custom_ops none +RMSNorm # Passes, compile mode for SpyreRMSNorm
python examples/torch_spyre_inference.py -n 1 --enforce_eager --custom_ops none +RMSNorm # Fails, see #794, eager mode for SpyreRMSNorm

Fixes:

Test Plan

I ran the following script with different custom ops. More details are in the internal Slack thread here.

python examples/torch_spyre_inference.py --custom_ops none # Pure vLLM CPU mode with compile; this is also the default with enforce_eager=False in vLLM
python examples/torch_spyre_inference.py --custom_ops none +RMSNorm # vLLM CPU compile mode with the compiled SpyreRMSNorm layer offloaded to Spyre
python examples/torch_spyre_inference.py --custom_ops all # vLLM CPU compile mode with all implemented custom ops offloaded to Spyre

python examples/torch_spyre_inference.py --enforce_eager --custom_ops none # Pure vLLM CPU mode in eager mode
python examples/torch_spyre_inference.py --enforce_eager --custom_ops none +RMSNorm # vLLM CPU eager mode with the SpyreRMSNorm layer offloaded to Spyre and run eagerly
python examples/torch_spyre_inference.py --enforce_eager --custom_ops all # vLLM CPU eager mode with all implemented custom ops offloaded to Spyre; this is also the default with enforce_eager=True

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

Signed-off-by: romit <romit@ibm.com>
@github-actions

github-actions bot commented Apr 7, 2026

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

Signed-off-by: romit <romit@ibm.com>
@romitjain romitjain marked this pull request as ready for review April 7, 2026 08:06
@bohnstingl bohnstingl self-requested a review April 7, 2026 08:08
Collaborator

@bohnstingl bohnstingl left a comment


@romitjain thanks for the PR. I noticed that the enforce_eager flag has also been added in #853. In principle I am fine with adding it either here or in the other PR.

I will try out the compilation_configs today and circle back.

type=str,
nargs="*",
default=["none"],
help="Custom ops to enable (e.g., --custom_ops none +RMSNorm +SiluAndMul)",
Collaborator

The enforce_eager flag is also being added in the identical way actually in #853, see https://github.com/jvlunteren/vllm-spyre/blob/1749821c1f0f345bc7047e79f8eb351eb9d86f46/vllm_spyre_next/examples/torch_spyre_inference.py#L31.

Maybe we can focus this PR on the evaluation of the custom_ops?

Collaborator

Actually no, let's introduce the enforce_eager feature in this PR, like it is now.

Collaborator

@bohnstingl bohnstingl left a comment

Overall this looks good to me. I left some minor change requests.

"--custom-ops",
type=str,
nargs="*",
default=["none"],
Collaborator

Could we set the default value to [], so default=[]? This way, if the user doesn't set this variable, all CustomOps are enabled by default and the user doesn't have to explicitly add it via +RMSNorm, ... WDYT?

Collaborator Author

Yes, makes sense.

Also, just to add: if default=[], then enforce_eager=False will actually disable all the ops, so in that case we still have to pass --custom_ops all.

See this

Collaborator

Maybe we can account for that by setting the default to None. Then we do:

if args.custom_ops is None:
    if args.enforce_eager:
        args.custom_ops = ["all"]
    else:
        args.custom_ops = []

This way we ensure that all CustomOps are enabled in both modes when the user does not explicitly pass the --custom-ops parameter.
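Put together, a minimal self-contained sketch of this defaulting scheme (argument names follow this PR's script; the exact final code may differ):

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--enforce_eager", action="store_true",
                        help="Run in eager mode instead of compile mode")
    parser.add_argument("--custom_ops", type=str, nargs="*", default=None,
                        help="Custom ops to enable (e.g., --custom_ops none +RMSNorm)")
    args = parser.parse_args(argv)
    if args.custom_ops is None:
        # Eager mode needs an explicit "all" to keep custom ops enabled;
        # in compile mode an empty list already defaults to all CustomOps.
        args.custom_ops = ["all"] if args.enforce_eager else []
    return args

print(parse_args([]).custom_ops)                   # []
print(parse_args(["--enforce_eager"]).custom_ops)  # ['all']
```

Note that argparse only treats tokens starting with `-` as options, so `+RMSNorm` passes through `nargs="*"` as a plain value.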

- "all": Run all supported ops on Spyre (default)
- "none": Run entirely on CPU
- "+LayerName": Selectively enable specific layers on Spyre
(e.g., --custom_ops none +RMSNorm +SiluAndMul)
Collaborator

The none wouldn't be necessary, I think?

Comment on lines +79 to +84
logger.warning_once(
    "SpyreRMSNorm dispatch: enabled=%s, _forward_method=%s, forward_spyre compiled=%s",
    self.enabled(),
    self._forward_method.__name__,
    self.maybe_compiled_forward_spyre is not self.forward_spyre,
)
Collaborator

Do we want to keep this, or was this more for debugging purposes?

Collaborator Author

I initially added this mostly for debugging purposes, but I remember reading somewhere on Slack (I lost the thread) that we needed a way to figure out whether the layers are actually being run. While testing all the permutations and combinations of flags, I found this really helpful.

I am okay with removing it too, or perhaps we can make it a debug statement?

WDYT?

Collaborator

Making this a debug statement makes sense to me. I will then add this also to #880 for the other wrappers as well.
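For reference, the "emit once, at debug level" idea can be sketched with the standard library alone (vLLM's own logger provides `warning_once`-style helpers, so the real change is presumably just swapping the method; the helper name below is illustrative):

```python
import logging
from functools import cache

logger = logging.getLogger("vllm_spyre")

@cache  # each distinct message is emitted at most once per process
def debug_once(msg: str) -> None:
    logger.debug(msg)

debug_once("SpyreRMSNorm dispatch: eager path selected")
debug_once("SpyreRMSNorm dispatch: eager path selected")  # suppressed by the cache
```

This keeps the dispatch information available under a verbose log level without flooding normal runs.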

@bohnstingl
Collaborator

bot:test-next

Signed-off-by: romit <romit@ibm.com>
Collaborator

@bohnstingl bohnstingl left a comment

The PR looks good to me; just the one minor change and afterwards we can merge.

"SpyreRMSNorm: no dtype promotion is performed, "
"expect numerical differences to upstream vLLM."
)
logger.debug(
Collaborator

Can we make this debug_once?

Collaborator Author

This is done.

Signed-off-by: romit <romit@ibm.com>
@bohnstingl bohnstingl merged commit fc6b54b into torch-spyre:main Apr 10, 2026
13 checks passed
@romitjain romitjain deleted the tests/e2e-layer-wise branch April 10, 2026 05:14