Fix Mllama model placement by pbielak · Pull Request #2125 · huggingface/optimum-habana

pbielak · 2025-07-09T06:29:52Z

What does this PR do?

With Llama 4 support in Transformers 4.51, the `Pipeline` class changed [1] so that the pipeline device is set to `self.model.device`. In the case of Mllama, DeepSpeed is used to create the `.language_model` on HPU, whereas the rest of the model stays on CPU [2]. As a result, `self.model.device` always reports CPU, which causes the whole model to be placed back on CPU. This commit explicitly moves the model to HPU, so the pipeline is also placed on HPU.

[1] https://github.com/huggingface/transformers/pull/37307/files#diff-441f558737166b045444da9c4be81f566b3d69054e8f20e288aed746a691fa61
[2] https://github.com/huggingface/optimum-habana/blob/v1.18.0/examples/image-to-text/run_pipeline.py#L360
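
The placement problem described above can be illustrated with a small, dependency-free simulation. This is only a sketch, not the actual Transformers or optimum-habana code: `ToyModule`, `ToyMllama`, `ToyPipeline`, and `build` are hypothetical names, devices are plain strings, and the assumption that `model.device` reflects the first submodule (here the vision part, left on CPU) is a simplification of the behavior described in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    """Hypothetical stand-in for a submodule that lives on one device."""
    name: str
    device: str = "cpu"

    def to(self, device: str) -> "ToyModule":
        self.device = device
        return self


@dataclass
class ToyMllama:
    """Hypothetical stand-in for a model with a vision part and a language part."""
    vision_model: ToyModule = field(default_factory=lambda: ToyModule("vision_model"))
    language_model: ToyModule = field(default_factory=lambda: ToyModule("language_model"))

    @property
    def device(self) -> str:
        # Assumption: the reported device is that of the first submodule.
        return self.vision_model.device

    def to(self, device: str) -> "ToyMllama":
        for module in (self.vision_model, self.language_model):
            module.to(device)
        return self


class ToyPipeline:
    """Hypothetical pipeline that adopts model.device and moves the model there."""

    def __init__(self, model: ToyMllama):
        self.device = model.device
        self.model = model.to(self.device)


def build(move_to_hpu_first: bool) -> ToyPipeline:
    model = ToyMllama()
    # Stands in for DeepSpeed creating only the language model on HPU.
    model.language_model.to("hpu")
    if move_to_hpu_first:
        # The idea of the fix: move the whole model to HPU before building the pipeline.
        model.to("hpu")
    return ToyPipeline(model)


broken = build(move_to_hpu_first=False)
fixed = build(move_to_hpu_first=True)
print(broken.device, broken.model.language_model.device)  # cpu cpu
print(fixed.device, fixed.model.language_model.device)    # hpu hpu
```

Without the explicit move, the toy pipeline reads CPU from the model and drags the language model back to CPU; with it, everything ends up on HPU.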

regisss · 2025-07-09T08:58:59Z

Why not target the main branch?

astachowiczhabana · 2025-07-09T11:05:49Z

Hi @regisss,
We're actually targeting v1.19-release so that all tests can pass.

karol-brejna-i

LGTM

karol-brejna-i · 2025-07-14T10:33:11Z

This PR looks like a fix for blockers of both the 1.19 and 1.18.1 releases. @regisss, if it is merged, please cherry-pick it to the v1.18-release branch.

astachowiczhabana marked this pull request as ready for review July 9, 2025 08:34

astachowiczhabana requested a review from regisss as a code owner July 9, 2025 08:34

pbielak force-pushed the dev/pbielak/fix-mllama-model-placement branch from a840dd0 to 0e255c3 Compare July 9, 2025 11:06

pbielak requested review from libinta, mandy-li and vivekgoe as code owners July 9, 2025 11:06

pbielak changed the base branch from v1.18-release to v1.19-release July 9, 2025 11:06

astachowiczhabana added the synapse1.22 label Jul 9, 2025

astachowiczhabana self-assigned this Jul 9, 2025

pbielak force-pushed the dev/pbielak/fix-mllama-model-placement branch from 0e255c3 to 0f03cc1 Compare July 10, 2025 07:34

astachowiczhabana force-pushed the v1.19-release branch from 79dc01d to 62b45d7 Compare July 11, 2025 07:50

pbielak force-pushed the dev/pbielak/fix-mllama-model-placement branch from 0f03cc1 to 86537e5 Compare July 11, 2025 09:20

karol-brejna-i approved these changes Jul 14, 2025

astachowiczhabana approved these changes Jul 14, 2025

astachowiczhabana merged commit 7182e21 into huggingface:v1.19-release Jul 14, 2025
1 check passed

pbielak deleted the dev/pbielak/fix-mllama-model-placement branch July 16, 2025 08:00

astachowiczhabana merged 1 commit into huggingface:v1.19-release from HabanaAI:dev/pbielak/fix-mllama-model-placement
