Conversation

@faaany (Contributor) commented Mar 8, 2024

What does this PR do?

When running the test cases under the models folder on XPU, I found that many model tests fail on the same test, `test_model_parallel_beam_search`, e.g.

FAILED tests/models/bigbird_pegasus/test_modeling_bigbird_pegasus.py::BigBirdPegasusModelTest::test_model_parallel_beam_search - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and xpu:0! (when che…

This is because `device_map="auto"` is used. As elaborated in this PR, the `device_map="auto"` mechanism is not yet mature on XPU, causing the model to be loaded on CPU rather than on XPU.

If there is no particular reason for using `auto`, I would suggest using `torch_device` instead: `torch_device` is more specific than `auto`, and our tests don't need what `auto` is for (e.g. large-model inference) anyway. WDYT? @ArthurZucker
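
For reference, the substitution proposed here amounts to something like the following sketch; the checkpoint name is a hypothetical placeholder for whatever tiny test model the suite loads:

```python
from transformers import AutoModelForSeq2SeqLM
from transformers.testing_utils import torch_device

ckpt = "hf-internal-testing/tiny-random-model"  # hypothetical placeholder checkpoint

# Current test behavior: let accelerate choose placement. On XPU (at the time),
# this fell back to CPU, so beam search ended up mixing cpu and xpu:0 tensors.
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt, device_map="auto")

# Proposed alternative: pin the whole model to the test device explicitly.
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt).to(torch_device)
```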

@amyeroberts (Contributor)

Hi @faaany - thanks for opening this PR! Using `device_map="auto"` is necessary for this test: it checks that beam search works when the model is split across devices. If it doesn't work with XPU, you can add a skip with unittest, e.g.

if "xpu" in torch_device:
    return unittest.skip("device_map='auto' does not work with XPU devices")

@faaany (Contributor, Author) commented Mar 8, 2024

Hi @amyeroberts, thanks so much for reviewing this PR! I updated my patch to use `torch_device == "xpu"` instead of `"xpu" in torch_device`; I hope that is fine for you. And yes, I will remove this skip once `auto` works on XPU.
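
For reference, the two conditions differ on qualified device strings; a minimal sketch with an illustrative value (in the tests, `torch_device` comes from `transformers.testing_utils`):

```python
torch_device = "xpu:0"  # illustrative value only

print("xpu" in torch_device)  # True  -- the substring check also matches "xpu:0"
print(torch_device == "xpu")  # False -- equality matches only the bare device name
```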

@amyeroberts (Contributor) left a review comment

Thanks for handling!

Just a small nit

@faaany (Contributor, Author) commented Mar 8, 2024

> Thanks for handling!
>
> Just a small nit

done, thx!

@amyeroberts (Contributor) left a review comment

Thanks!

amyeroberts merged commit 1ea3ad1 into huggingface:main Mar 8, 2024
dvrogozh added a commit to dvrogozh/transformers that referenced this pull request Jan 17, 2025
`return unittest.skip()` used in the XPU skip condition of `test_model_parallel_beam_search` did not actually mark the test as skipped when running under pytest:
* 148 passed, 1 skipped

Other tests use `self.skipTest()`. Reusing that approach and moving the condition outside the loop (since it does not depend on it) makes the skip take effect for XPU:
* 148 skipped

Secondly, `device_map="auto"` is now implemented for XPU with IPEX>=2.5 and torch>=2.6, so these tests can be enabled for XPU on new IPEX/torch versions.

Fixes: 1ea3ad1 ("[tests] use `torch_device` instead of `auto` for model testing (huggingface#29531)")
Signed-off-by: Dmitry Rogozhkin <[email protected]>
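
A minimal sketch of the corrected pattern described above, under stated assumptions: `torch_device` comes from `transformers.testing_utils`, and the IPEX/torch version gate is illustrative (the actual commit's check may differ):

```python
import unittest

import torch
from packaging import version

from transformers.testing_utils import torch_device

# Illustrative gate: the commit states device_map="auto" works on XPU with
# IPEX >= 2.5 and torch >= 2.6; the real check in the commit may be different.
XPU_DEVICE_MAP_READY = version.parse(torch.__version__.split("+")[0]) >= version.parse("2.6")


class ModelParallelSketch(unittest.TestCase):
    def test_model_parallel_beam_search(self):
        # The skip condition is moved outside the per-model loop, since it does
        # not depend on the loop variable. self.skipTest() raises
        # unittest.SkipTest, which pytest reports as "skipped"; a bare
        # `return unittest.skip(...)` only returns a decorator object and the
        # test is counted as "passed" without exercising anything.
        if torch_device == "xpu" and not XPU_DEVICE_MAP_READY:
            self.skipTest(reason="device_map='auto' requires IPEX>=2.5 and torch>=2.6 on XPU")
        for model_class in []:  # stand-in for self.all_generative_model_classes
            ...
```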
ydshieh pushed a commit that referenced this pull request Jan 17, 2025
(same commit message as above, landed via #35742)
bursteratom pushed a commit to bursteratom/transformers that referenced this pull request Jan 31, 2025
(same commit message as above)
faaany deleted the auto branch Feb 7, 2025
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 13, 2025
(same commit message as above)