Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).
ydshieh
left a comment
Thanks.
It looks like none of those integration tests can run on a T4; they all hit GPU OOM.
Let me first check whether we have another workaround.
Would you be up for trying `git fetch https://github.com/yao-matrix/transformers.git mistral3-xpu-cpu-offload:mistral3-xpu-cpu-offload && git checkout mistral3-xpu-cpu-offload` and running the integration tests? I am using CPU offload, so 3 tests can run on an A10. I can't make the remaining one work: on T4, it hangs forever ...
Hmm, I am able to avoid the OOM. But before I move forward, it would be nice if you could check whether this cpu_offload works with xpu 🙏
@ydshieh, sorry for the late response, just back in the office from a 5-day holiday. Yes, I can try your offload changes, but it seems I cannot get the branch.
Sorry, it should be …
You don't need to update the expected values; just check whether xpu works well with cpu offloading.
BTW, it seems this cpu offload will produce different outputs in single-device vs. multi-device environments. I have to set … Anyway, let's first see if it can at least run on xpu without error; then we can adjust the outputs.
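For reference, per-device expected values in transformers integration tests are typically pinned with the `Expectations` helper from `transformers.testing_utils`; here is a minimal sketch (the device keys and output strings are made-up placeholders, not the real test values):

```python
from transformers.testing_utils import Expectations

# Placeholder values: each key is (device type, major compute capability / version).
EXPECTED_TEXTS = Expectations(
    {
        ("cuda", 8): "Calm lake mirrors the sky ...",
        ("xpu", 3): "Calm lake mirrors a pale sky ...",
    }
)
# Picks the entry that matches the device the test is currently running on.
expected_text = EXPECTED_TEXTS.get_expectation()
```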
@ydshieh, I tested the 4 cases with …
So it works on XPU. However, I found the tests become pretty slow after enabling cpu offload, and if we run all 4 cases in one pytest command, the process easily hangs (using …). If you observe the same on your side, I'd say it's not suitable to use …
You mean …? BTW, how much CPU RAM is available on your XPU machine? In my case, I need to ask our infra to provide a single T4 with 64G.
Actually it's … I am using the Ponte Vecchio 1150, which has …
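As a rough sanity check on these RAM numbers (a back-of-the-envelope sketch, not a figure from the thread): with CPU offload, essentially all of the weights sit in host RAM, so a 24B-parameter bfloat16 checkpoint needs on the order of 45 GiB before anything else, which is consistent with asking for a 64G host even when the GPU is a 16G T4.

```python
# Back-of-the-envelope host-RAM estimate for CPU-offloading this model.
# With accelerate.cpu_offload, the weights live in CPU RAM and are streamed
# to the accelerator layer by layer during the forward pass.
num_params = 24e9        # Mistral-Small-3.1-24B: ~24 billion parameters
bytes_per_param = 2      # bfloat16 is 2 bytes per parameter
weights_gib = num_params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB of host RAM for the weights alone")  # ~45 GiB
```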
In my env, I found the slowness mainly comes from model downloading. Once the model is downloaded, the tests pass pretty fast.
So it's my env issue then. Thanks @faaany for testing.
OK, thank you both very much. Let's go with cpu offloading and smaller images, so that T4 and A10 can run (most of) them. I will push some commits back to this PR.
Hi @yao-matrix, could you fetch the latest changes and try again (after deleting the local branch first)?
This will run on both T4 16G and A10 24G without any GPU OOM and match the expected values. If you want to keep the tests running directly on XPU without using CPU offload, you can tweak the following `setUp` with an `if` on the device type:

```python
def setUp(self):
    cleanup(torch_device, gc_collect=True)
    self.model_checkpoint = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
    self.model = Mistral3ForConditionalGeneration.from_pretrained(
        self.model_checkpoint, torch_dtype=torch.bfloat16
    )
    accelerate.cpu_offload(self.model, execution_device=torch_device)
```
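A minimal sketch of what that device-conditional `setUp` could look like (the `if torch_device == "xpu"` branch and the `device_map` usage are illustrative assumptions, not code from this PR):

```python
def setUp(self):
    cleanup(torch_device, gc_collect=True)
    self.model_checkpoint = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
    if torch_device == "xpu":
        # Assumed: the XPU card has enough memory to hold the model directly.
        self.model = Mistral3ForConditionalGeneration.from_pretrained(
            self.model_checkpoint, torch_dtype=torch.bfloat16, device_map=torch_device
        )
    else:
        # T4 (16G) / A10 (24G) would OOM otherwise: keep the weights in CPU RAM
        # and stream them to the GPU layer by layer during the forward pass.
        self.model = Mistral3ForConditionalGeneration.from_pretrained(
            self.model_checkpoint, torch_dtype=torch.bfloat16
        )
        accelerate.cpu_offload(self.model, execution_device=torch_device)
```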
@ydshieh, cool, OK from my side. Please feel free to merge.
I think XPU will still get 2 cases failing due to the ground-truth mismatch you mentioned (but you said those are fine). Thank you for the patience.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
* enabled mistral3 test cases on XPU
  Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* calibrate A100 expectation
  Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update
* update
* update
* update
* update
* update

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

@ydshieh, please help review, thanks.