Fix Mllama model placement #2125

Merged
astachowiczhabana merged 1 commit into huggingface:v1.19-release from HabanaAI:dev/pbielak/fix-mllama-model-placement
Jul 14, 2025

Conversation

@pbielak pbielak commented Jul 9, 2025

What does this PR do?

With Llama 4 support in Transformers 4.51, the Pipeline class changed [1] so that the pipeline device is now set to self.model.device. For Mllama, DeepSpeed is used to create .language_model on HPU, while the rest of the model stays on CPU [2]. As a result, self.model.device always reports CPU, and the pipeline moves the whole model back to CPU. This commit explicitly moves the model to HPU, so the pipeline is also placed on HPU.

[1] https://github.com/huggingface/transformers/pull/37307/files#diff-441f558737166b045444da9c4be81f566b3d69054e8f20e288aed746a691fa61
[2] https://github.com/huggingface/optimum-habana/blob/v1.18.0/examples/image-to-text/run_pipeline.py#L360

@astachowiczhabana astachowiczhabana marked this pull request as ready for review July 9, 2025 08:34

regisss commented Jul 9, 2025

Why not target the main branch?

@astachowiczhabana

Hi @regisss
We're targeting v1.19-release actually to allow all tests to pass

@pbielak pbielak force-pushed the dev/pbielak/fix-mllama-model-placement branch from a840dd0 to 0e255c3 Compare July 9, 2025 11:06
@pbielak pbielak changed the base branch from v1.18-release to v1.19-release July 9, 2025 11:06
@astachowiczhabana astachowiczhabana self-assigned this Jul 9, 2025
@pbielak pbielak force-pushed the dev/pbielak/fix-mllama-model-placement branch from 0e255c3 to 0f03cc1 Compare July 10, 2025 07:34
@pbielak pbielak force-pushed the dev/pbielak/fix-mllama-model-placement branch from 0f03cc1 to 86537e5 Compare July 11, 2025 09:20

@karol-brejna-i karol-brejna-i left a comment

LGTM

karol-brejna-i commented Jul 14, 2025

This PR looks like it resolves blockers for release 1.19 as well as 1.18.1. @regisss If merged, please cherry-pick it to the v1.18-release branch.

@astachowiczhabana astachowiczhabana merged commit 7182e21 into huggingface:v1.19-release Jul 14, 2025
1 check passed
astachowiczhabana pushed a commit that referenced this pull request Jul 14, 2025
@pbielak pbielak deleted the dev/pbielak/fix-mllama-model-placement branch July 16, 2025 08:00
astachowiczhabana pushed a commit that referenced this pull request Aug 22, 2025
astachowiczhabana pushed a commit that referenced this pull request Aug 25, 2025
4 participants