[Spyre-Next] SpyreSiluAndMul Tests#863
Conversation
Signed-off-by: romit <romit@ibm.com>
bot:next-test

👋 Hi! Thank you for contributing to vLLM support on Spyre. We also recommend installing prek and configuring it to check your code before every local commit.
@bohnstingl Can you please review this as well? |
Signed-off-by: romit <romit@ibm.com>
bot:next-test
Signed-off-by: romit <romit@ibm.com>
bot:next-test
items.sort(key=sort_key)

def _convert_yaml_value(value):
This is added for future compatibility in case we want to override the torch data types
Signed-off-by: romit <romit@ibm.com>
bot:next-test
activation: [silu_and_mul]
override:
  num_tokens: [7, 83, 1024]
  d: [8, 64, 128, 1024]
This is failing for odd values of d. Upstream had no odd values.
The failure comes from torch opcheck (ref) and not the numerical correctness check.
As far as numerical correctness goes, we are good to go
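For context, the numerical-correctness check compares against the upstream SiluAndMul definition: split the last dimension in half, apply SiLU to the first half, and multiply elementwise by the second half. A minimal reference sketch (the helper name is ours, not from the PR):

```python
import torch
import torch.nn.functional as F

def silu_and_mul_ref(x: torch.Tensor) -> torch.Tensor:
    """Reference SiluAndMul: silu(x[..., :d]) * x[..., d:], with d = last_dim // 2."""
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]

# Even values of d are fine numerically; the odd-d failures reported here
# come from torch opcheck (op registration checks), not from the math.
x = torch.randn(7, 2 * 64, dtype=torch.float32)
out = silu_and_mul_ref(x)
assert out.shape == (7, 64)
```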
Force-pushed cbe619d to 3c9870d
@pytest.mark.siluandmul
@pytest.mark.parametrize("num_tokens", [1, 7, 63, 64, 65, 1024])
@pytest.mark.parametrize("d", [2, 63, 64, 65, 1024, 13824])
@pytest.mark.parametrize("dtype", [torch.bfloat16, torch.float32])
Does torch-spyre support these dtypes?
Removed dtype parameterization
Signed-off-by: romit <romit@ibm.com>
Force-pushed 2384c23 to 2170c8c
bot:next-test
@romitjain I created a draft PR which reworks the forward call chain of our wrapper and removes the override of `forward_native`.
bot:next-test
bohnstingl
left a comment
Would reusing the upstream vLLM tests simplify this PR? Is #872 sufficient, or is there anything else that would be needed to enable the upstream tests?
# TODO: These tests can not be enabled since we override forward_native
# and upstream tests consider forward_native as gold standard
# - rel_path: tests/kernels/core/test_activation.py
#   allow_list:
#     - test: "test_act_and_mul"
# TODO: Currently the tests fail because upstream rtol and atol are 0
# This needs to be fixed in torch-spyre
#       mode: xfail
#       tags: [siluandmul, upstream]
#       params:
#         allow:
#           activation: [silu_and_mul]
#         override:
#           num_tokens: [1, 7, 63, 64, 65, 83, 1024]
#           # Odd values fail op check. Not required for current list of models we plan to support
#           d: [8, 64, 512, 13824]
#           dtype: [torch.float16]
#           device: [cpu]
@romitjain Do we maybe want to first land #872 and then directly use the tests from upstream?
Yes, I will update this PR to be in draft.
Would we still need this file if we reuse the upstream tests?
We might still need to check for forward_oot dispatch, but other tests can go
## Description
This PR reworks the call chain of the forward functions for the layer wrappings for torch-spyre. In particular, it avoids overriding `forward_native` in favor of using `forward_oot`. cc @romitjain
## Related Issues
This PR was triggered by #863
## Test Plan
This change is non user-facing and thus all the existing tests should work. Moreover, it should enable the upstream vLLM tests to be supported as well.
## Checklist
- [X] I have read the [contributing guidelines](https://docs.vllm.ai/projects/spyre/en/latest/contributing)
- [X] My code follows the project's code style (run `bash format.sh`)
- [ ] I have added tests for my changes (if applicable)
- [ ] I have updated the documentation (if applicable)
- [X] My commits include a `Signed-off-by:` line (DCO compliance)

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: romit <romit@ibm.com>
I need to rebase this PR with the latest changes and test again. Ready for review, but I need to test it once more.
Signed-off-by: romit <romit@ibm.com>
Signed-off-by: romit <romit@ibm.com>
Hi @bohnstingl @joerunde
tensor = tensor.to(dtype=dtype)
if device is not None:
    tensor = tensor.to(device=device)
# In case the tensor is on spyre and a dtype change is requested
@bohnstingl I have lightly modified this function here. Previously, if this function received a tensor residing on spyre and the device passed in was also spyre, it would have failed.
This conversion now seems a little convoluted and the flow is not immediately obvious; I can't really think of a cleaner or simpler way to do the device transfers. With your new function, though, the semantics change a bit. For example, if the tensor is on spyre and just a dtype change is requested, it will first convert the tensor to cpu, then do the conversion and then transfer it back to spyre.
What was the failure that you saw earlier?
For example, if the tensor is on spyre and just a dtype change is requested, it will first convert the tensor to cpu, then do the conversion and then transfer it back to spyre
Isn't this expected? I relied on this comment:
# In case the tensor is on spyre, we first need to move it to cpu and then change the dtype.
In the earlier definition, if I requested just a dtype change on a Spyre Tensor it would have failed.
>>> from vllm_spyre_next.custom_ops.utils import convert
>>> import torch
>>> x = torch.randn((2,2), dtype=torch.float32).to(device="spyre")
>>> x
tensor([[ 1.0254, -1.2343],
[-0.9974, -0.4666]], device='spyre:0')
>>> convert(x, dtype=torch.float16)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/dev/shm/vllm-spyre/vllm_spyre_next/vllm_spyre_next/custom_ops/utils.py", line 68, in convert
tensor = tensor.to(dtype=dtype)
^^^^^^^^^^^^^^^^^^^^^^
File "/dev/shm/.venv/lib64/python3.12/site-packages/torch_spyre/_monkey_patch.py", line 63, in spyre_to
return orig_to(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Spyre backend does not support type conversion yet during copy.
Signed-off-by: romit <romit@ibm.com>
bohnstingl
left a comment
In principle it looks good to me. One concern for me is how the upstream vLLM will evolve with the tests. We should keep an eye on that and maybe not spend too much effort/time on the tests here?
assert layer._forward_method == layer.forward_oot

# Mock forward_oot (called by forward) with a known transform
monkeypatch.setattr(layer, "forward_oot", mock_forward_oot)
I think in the rms_test, I mocked one level lower, i.e., the _forward_spyre_impl, see https://github.com/vllm-project/vllm-spyre/blob/9907dfd7ce7176e475f423e85c0f15d9a7e57ae4/vllm_spyre_next/tests/test_rms_norm.py#L105. WDYT?
Yes makes sense. I will update it
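The suggested pattern of mocking one level below the public entry point can be sketched with a stand-in class (FakeLayer is hypothetical; `_forward_spyre_impl` follows the name used in the linked rms_norm test):

```python
# Patch the backend impl rather than forward_oot itself, then verify the
# public OOT entry point dispatches through the patched implementation.
class FakeLayer:
    def forward_oot(self, x):
        # Public OOT entry point; delegates to the backend-specific impl.
        return self._forward_spyre_impl(x)

    def _forward_spyre_impl(self, x):
        return x * 2

layer = FakeLayer()
layer._forward_spyre_impl = lambda x: x + 100  # known transform, one level lower
assert layer.forward_oot(1) == 101
```

In a real pytest, the instance-attribute assignment would be done with `monkeypatch.setattr(layer, "_forward_spyre_impl", ...)` so it is undone after the test.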
# Currently the tests fail because upstream rtol and atol are 0
# - rel_path: tests/kernels/core/test_activation.py
#   allow_list:
#     - test: "test_act_and_mul"
#       mode: mandatory_pass
#       tags: [siluandmul, upstream]
#       params:
#         allow:
#           activation: [silu_and_mul]
#         override:
#           num_tokens: [1, 7, 63, 64, 65, 83, 1024]
#           # Odd values fail op check. Not required for current list of models we plan to support
#           d: [8, 64, 512, 13824]
#           dtype: [torch.float32]
#           device: [cpu]
Do we still plan to use this eventually? To me it would also make sense to look more closely into how the changed vLLM IR does the testing? For example, the PR that recently landed in upstream vLLM introduces some new tests: https://github.com/vllm-project/vllm/pull/33825/changes#diff-510a5c10fb13605e7270f8b0647ed83a724b2658f0352af189ea1155c831ad3fR1.
Do we still plan to use this eventually?
Upstream tests are too strict (rtol==atol==0). Locally, I tested the upstream test_act_and_mul test by editing the tolerances to rtol==atol==1e-2, and it passed. I don't think we would be able to use this test without parameterizing (atol, rtol) in the upstream test. I will raise a PR for that and see if upstream agrees to merge it. If that lands, we would be able to re-use it.
To me it would also make sense to look more closely into how the changed vLLM IR does the testing?
Agreed! But I believe we would also need some changes on our end to implement new vLLM-IR-compatible ops and to enable those tests. I am following your PR #877 and will go through the attached PR in more detail to see how we can adapt them.
Yes, my initial hope was the same: that we could directly adapt the upstream tests. But both of the ops implemented so far in our plugin require higher (atol, rtol) to pass the upstream tests (rmsnorm: due to no fp32 promotion; siluandmul: discussed in this PR).
Signed-off-by: romit <romit@ibm.com>
# In case the tensor is on spyre and a dtype change is requested
# we first need to move it to cpu and then change the dtype.
if tensor.device.type == "spyre" and dtype is not None:
    tensor = tensor.to(device="cpu")
    # Set the device to spyre to avoid unexpected device change of the return tensor.
    if device is None:
        device = "spyre"

if dtype is not None:
    tensor = tensor.to(dtype=dtype)
if device is not None:
    tensor = tensor.to(device=device)
Like I mentioned the other time, I think the logic here could be slightly improved to avoid unintended copies. Maybe something like this?
if tensor is None:
    return None
current_device = tensor.device.type
# Spyre doesn't support on-device dtype conversion
if current_device == "spyre" and dtype is not None and tensor.dtype != dtype:
    tensor = tensor.to(device="cpu")
    if device is None:
        device = "spyre"
if dtype is not None and tensor.dtype != dtype:
    tensor = tensor.to(dtype=dtype)
if device is not None and tensor.device.type != device:
    tensor = tensor.to(device=device)
return tensor
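Wrapped in a function signature (which is an assumption; the PR does not show the surrounding `def`), the suggested logic is runnable on CPU, where the spyre branch is simply inert:

```python
from typing import Optional

import torch

def convert(tensor: Optional[torch.Tensor],
            dtype: Optional[torch.dtype] = None,
            device: Optional[str] = None) -> Optional[torch.Tensor]:
    if tensor is None:
        return None
    # Spyre doesn't support on-device dtype conversion: detour via CPU,
    # then restore the device unless the caller asked for a different one.
    if tensor.device.type == "spyre" and dtype is not None and tensor.dtype != dtype:
        tensor = tensor.to(device="cpu")
        if device is None:
            device = "spyre"
    if dtype is not None and tensor.dtype != dtype:
        tensor = tensor.to(dtype=dtype)
    if device is not None and tensor.device.type != device:
        tensor = tensor.to(device=device)
    return tensor

x = torch.zeros(2, 2, dtype=torch.float32)
assert convert(x, dtype=torch.float16).dtype == torch.float16
assert convert(x) is x  # no copy when nothing is requested
assert convert(None) is None
```

The dtype/device equality guards are the point of the suggestion: they avoid the unintended copies of the earlier version.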
Maybe!
It looks like this isn't currently set up to be easily patchable: https://github.com/vllm-project/vllm/blob/e69a265135ef48312d78130f64b7bfce4cd81a37/tests/kernels/core/test_activation.py#L106
else:
    # The SiluAndMul, MulAndSilu, GELU and FatReLU implementations are
    # equivalent to the native PyTorch implementations, so we can do exact
    # comparison.
    torch.testing.assert_close(out, ref_out, atol=0.0, rtol=0.0)
We could set up something like this that edits assert_close to override cases where 0.0 is passed:
@pytest.fixture
def patch_tolerances(monkeypatch):
    original_assert_close = torch.testing.assert_close

    def patched_assert_close(*args, atol=None, rtol=None, **kwargs):
        if atol == 0.0 and rtol == 0.0:
            return original_assert_close(*args, atol=0.5, rtol=rtol, **kwargs)
        return original_assert_close(*args, atol=atol, rtol=rtol, **kwargs)

    monkeypatch.setattr(torch.testing, "assert_close", patched_assert_close)
    yield
but we'd definitely still want to update upstream to make those parameterizable, or have them pull from a common method we can override
@bohnstingl @joerunde
I have tried to solve this by adding a fixture to monkeypatch torch.testing.assert_close
With this, we are able to enable the upstream SiluAndMul test, but please note that it will only pass for fp32 right now, since the fp16 implementation is broken in Spyre.
FP16 failure is related to this internal Slack thread.
In compile mode, if we send FP32 data to the SpyreSiluAndMul layer, internally the layer does this
x1 = x[..., :d]
x2 = x[..., d:]
and then converts the tensors to FP16. The dtype conversion makes x1 and x2 contiguous, and the tensors are then transferred to the Spyre device.
But if we send FP16 data to the SpyreSiluAndMul layer, the conversion to FP16 is a no-op. When we then transfer the tensors to the Spyre device, x2 ends up exactly the same as x1.
This is an issue in torch-spyre, and as I understand from the comments, strided view copies are not yet reliable in torch-spyre.
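The contiguity difference behind this failure can be demonstrated on CPU alone (no torch-spyre needed; only the device transfer step is Spyre-specific):

```python
import torch

x = torch.randn(4, 8, dtype=torch.float32)
d = x.shape[-1] // 2
x2 = x[..., d:]                 # strided view into x's storage
assert not x2.is_contiguous()

# A real dtype change materializes a contiguous copy of the slice...
assert x2.to(dtype=torch.float16).is_contiguous()

# ...but when the dtype already matches, .to() is a no-op returning the
# same non-contiguous view, which the strided device copy then mishandles.
x16 = x.to(torch.float16)
v = x16[..., d:]
assert v.to(dtype=torch.float16) is v
```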
Signed-off-by: romit <romit@ibm.com>
Signed-off-by: romit <romit@ibm.com>
…silu_mul Signed-off-by: romit <romit@ibm.com>
bohnstingl
left a comment
LGTM!
@joerunde or @tjohnson31415 any comments on the testing part?
Hi @joerunde / @tjohnson31415
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
## Description
This PR adds 2 tests for SpyreSiluAndMul: numerical correctness and OOT plugin compatibility. It also adds upstream tests (commented out). However, the upstream tests currently fail due to a lack of torch.randn support on spyre hardware; torch.randn is used in upstream tests to create a tensor on the device.
## Related Issues
#810
## Test Plan
## Checklist
- My code follows the project's code style (run bash format.sh)
- My commits include a Signed-off-by: line (DCO compliance)