[Spyre-Next] E2E test with optional offloading to spyre layers #900
bohnstingl merged 12 commits into torch-spyre:main
Conversation
Signed-off-by: romit <romit@ibm.com>
👋 Hi! Thank you for contributing to vLLM support on Spyre. We also recommend installing prek and configuring it to check your code before every local commit.
bohnstingl left a comment
@romitjain thanks for the PR. I noticed that the enforce_eager flag is also being added in #853. I am in principle fine with adding it either here or in the other PR.
I will try out the compilation_configs today and circle back.
```python
type=str,
nargs="*",
default=["none"],
help="Custom ops to enable (e.g., --custom_ops none +RMSNorm +SiluAndMul)",
```
The enforce_eager flag is also being added in the identical way actually in #853, see https://github.com/jvlunteren/vllm-spyre/blob/1749821c1f0f345bc7047e79f8eb351eb9d86f46/vllm_spyre_next/examples/torch_spyre_inference.py#L31.
Maybe we can focus this PR on the evaluation of the custom_ops?
Actually no, let's introduce the enforce_eager feature in this PR, like it is now.
bohnstingl left a comment
Overall looks good to me. I left some minor change requests
| "--custom-ops", | ||
| type=str, | ||
| nargs="*", | ||
| default=["none"], |
Could we set the default value to [], so default=[]? This way, if the user doesn't set this variable, all CustomOps are enabled by default and the user doesn't have to explicitly add it via +RMSNorm, ... WDYT?
Yes, makes sense.
Also, just to add: if default=[] and enforce_eager=False, all the ops will actually be disabled. So in that case, we still have to add --custom_ops all explicitly.
See this
Maybe we can account for that by setting the default to None actually. Then we do

```python
if args.custom_ops is None:
    if args.enforce_eager:
        args.custom_ops = ["all"]
    else:
        args.custom_ops = []
```

This way we ensure in both modes that we are enabling all CustomOps if the user does not explicitly use the --custom-ops parameter?
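To make the suggestion concrete, here is a minimal self-contained sketch of how the None default resolves in an argparse-based script like the example; the flag names follow the PR, and the parse_args input is purely illustrative:

```python
import argparse

# Illustrative parser mirroring the flags discussed in this PR.
parser = argparse.ArgumentParser()
parser.add_argument("--enforce-eager", action="store_true")
parser.add_argument("--custom-ops", type=str, nargs="*", default=None)

# Simulate a user passing only --enforce-eager.
args = parser.parse_args(["--enforce-eager"])

# Resolve the default so that all CustomOps are enabled in both modes
# unless the user explicitly passed --custom-ops.
if args.custom_ops is None:
    args.custom_ops = ["all"] if args.enforce_eager else []

print(args.custom_ops)  # → ['all']
```

With this pattern, a user who passes neither flag ends up with `custom_ops == []`, which in compile mode also enables all ops by default.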
| - "all": Run all supported ops on Spyre (default) | ||
| - "none": Run entirely on CPU | ||
| - "+LayerName": Selectively enable specific layers on Spyre | ||
| (e.g., --custom_ops none +RMSNorm +SiluAndMul) |
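As a hedged sketch of the semantics this help text describes, the list can be read as a base policy ("all"/"none") plus per-op overrides ("+Op"/"-Op"). The helper below is illustrative only, not vLLM's actual implementation; here later tokens simply win:

```python
def op_enabled(custom_ops: list[str], op_name: str, default: bool = True) -> bool:
    """Decide whether a custom op is enabled for a token list like
    ["none", "+RMSNorm"]. Illustrative only; later tokens override earlier ones."""
    enabled = default
    for tok in custom_ops:
        if tok == "all":
            enabled = True
        elif tok == "none":
            enabled = False
        elif tok == f"+{op_name}":
            enabled = True
        elif tok == f"-{op_name}":
            enabled = False
    return enabled
```

For example, with `["none", "+RMSNorm"]` only RMSNorm runs on Spyre while SiluAndMul stays on CPU.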
The none wouldn't be necessary, I think?
```python
logger.warning_once(
    "SpyreRMSNorm dispatch: enabled=%s, _forward_method=%s, forward_spyre compiled=%s",
    self.enabled(),
    self._forward_method.__name__,
    self.maybe_compiled_forward_spyre is not self.forward_spyre,
)
```
Do we want to keep this, or was this more for debugging purposes?
I initially added this mostly for debugging purposes, but I remember reading somewhere on Slack (I lost the thread) that we needed a way to figure out whether the layers are actually being run or not. And while testing all the permutations and combinations of flags, I found this really helpful.
I am okay with removing it too, or perhaps we can make this a debug statement?
WDYT?
Making this a debug statement makes sense to me. I will then add this also to #880 for the other wrappers as well.
bot:test-next
bohnstingl left a comment
The PR looks good to me; just the one minor change, and afterwards we can merge.
| "SpyreRMSNorm: no dtype promotion is performed, " | ||
| "expect numerical differences to upstream vLLM." | ||
| ) | ||
| logger.debug( |
Can we make this debug_once?
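If the project logger does not already expose a debug_once analogous to warning_once, a once-per-message debug call can be emulated generically; this is a sketch using the standard library, not vLLM-Spyre's actual logger API:

```python
import functools
import logging

logger = logging.getLogger("spyre.sketch")

@functools.lru_cache(maxsize=None)
def debug_once(msg: str) -> None:
    """Emit a debug message only the first time it is seen; repeated calls
    with the same message hit the cache and are suppressed."""
    logger.debug(msg)

debug_once("SpyreRMSNorm: no dtype promotion is performed")
debug_once("SpyreRMSNorm: no dtype promotion is performed")  # suppressed
```

The cache key is the message string itself, so distinct messages still each log once.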
Description
I have updated examples/torch_spyre_inference.py to support the following arguments:

- enforce_eager: Run in either compile mode or eager mode.
- custom_ops: Support for dispatching our custom ops for the forward pass. This can be used to offload individual layers to Spyre and test e2e inference with them. Example: even if we have individual layer tests passing for SpyreRMSNorm, the e2e inference might diverge due to small numerical differences piling up from multiple layers.

With this script, we should be able to test the e2e inference of any custom layer that we implement, in both eager mode and compile mode.
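Some hypothetical invocations of the updated script, echoed here for illustration; the flag names follow the PR description, and the script path may differ in your checkout:

```shell
# Illustrative script path from the PR description.
EXAMPLE=examples/torch_spyre_inference.py

# Eager mode, run everything on CPU:
CMD_CPU="python $EXAMPLE --enforce-eager --custom-ops none"
# Eager mode, offload only RMSNorm and SiluAndMul to Spyre:
CMD_PARTIAL="python $EXAMPLE --enforce-eager --custom-ops none +RMSNorm +SiluAndMul"
# Compile mode with all supported ops on Spyre:
CMD_ALL="python $EXAMPLE --custom-ops all"

printf '%s\n%s\n%s\n' "$CMD_CPU" "$CMD_PARTIAL" "$CMD_ALL"
```

The commands are built as strings rather than executed, since the script only exists inside the vllm-spyre checkout.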
A few relevant resources:

- How custom_ops are enabled for different enforce_eager modes: https://docs.vllm.ai/en/stable/api/vllm/config/compilation/#vllm.config.compilation.CompilationConfig.custom_ops

Related Issues
Fixes:
Test Plan
I ran the script with different custom ops. More details are in the internal Slack thread.
Checklist

- [ ] Code is formatted (bash format.sh)
- [ ] Commits include a Signed-off-by: line (DCO compliance)