Run Llama2 with torch.compile on Gaudi2 #605

Closed
kausikmaiti wants to merge 1 commit into huggingface:main from kausikmaiti:llama2_with_torch_compile_on_gaudi2

@kausikmaiti
Contributor

What does this PR do?

This change allows users to run the Llama2 model with torch.compile on Gaudi2.

Signed-off-by: kausik <kmaiti@habana.ai>
@vivekgoe vivekgoe added the run-test Run CI for PRs from external contributors label Dec 19, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@vivekgoe vivekgoe left a comment

@regisss can you please help review this change? It is needed to enable torch.compile for text generation tasks (for Llama and other models).

output_hidden_states=output_hidden_states,
**hpu_graphs_kwargs,
)
if torch_compile:
Collaborator

Wrapping the model only for greedy_search does not look right. It should probably be done in generate() so that it also works for other modes (such as beam_search).

Collaborator

Not even sure we should do it in generate at all. If using the trainer, it should already be taken care of (see discussion above). Otherwise, for example in the text-generation example, I think we should just have a get_torch_compiled_model in text-generation/utils.py. That seems to be the way recommended by Transformers.

Collaborator

@regisss thanks for your comments. We will check whether we can go with adding get_torch_compiled_model in text-generation/utils.py.

Contributor Author

OK. I will create 'get_torch_compiled_model' in text-generation/utils.py.

negative_prompt_ids: Optional[torch.Tensor] = None,
negative_prompt_attention_mask: Optional[torch.Tensor] = None,
lazy_mode: Optional[bool] = False,
torch_compile: Optional[bool] = False,
Collaborator

For normal training, eval, and predict, models are wrapped within the accelerator.prepare_model() call, so adding new code for generate() may not be aligned with that. @regisss, any idea how direct model.generate() calls are handled in Transformers for compile mode? I tried to search there but did not find anything.

Collaborator

In the trainer, the link with Accelerate is made here:


And then in Accelerate it happens here:
if self.state.dynamo_plugin.backend != GaudiDynamoBackend.NO and not is_compiled_module(model):

It was introduced in #465.
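
For readers unfamiliar with the guard quoted above, here is a minimal, torch-free sketch of the pattern: compile only when a backend is selected and the model is not already wrapped. `DynamoBackend`, `CompiledWrapper`, `prepare_model`, and the local `is_compiled_module` are illustrative stand-ins, not the Accelerate or optimum-habana implementations. Only the shape of the condition is taken from the line quoted in this comment.

```python
from enum import Enum
from typing import Any, Callable


class DynamoBackend(Enum):
    # Hypothetical stand-in for GaudiDynamoBackend; only the NO member
    # appears in the thread, the other value is a placeholder.
    NO = "no"
    HPU = "hpu"


class CompiledWrapper:
    """Stand-in for the wrapper object a compile step puts around a model."""

    def __init__(self, orig_mod: Any) -> None:
        self._orig_mod = orig_mod


def is_compiled_module(model: Any) -> bool:
    # Illustrative check: a model counts as compiled if it is already wrapped.
    return isinstance(model, CompiledWrapper)


def prepare_model(
    model: Any,
    backend: DynamoBackend,
    compile_fn: Callable[[Any], Any] = CompiledWrapper,
) -> Any:
    # Same shape as the quoted condition: skip compilation when no backend
    # is selected or when the model has already been compiled.
    if backend != DynamoBackend.NO and not is_compiled_module(model):
        model = compile_fn(model)
    return model
```

The second half of the condition makes the call idempotent: passing an already wrapped model through `prepare_model` again returns it unchanged rather than wrapping it twice.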

Collaborator

Outside of the trainer, Transformers recommends simply using:

model = torch.compile(model)

https://huggingface.co/docs/transformers/v4.36.1/en/perf_torch_compile

Contributor Author

As suggested, I will create 'get_torch_compiled_model()' in text-generation/utils.py. It will be called inside setup_model() in the same file.
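
A rough sketch of what that plan could look like is shown here. It is not the code from this PR or from #616: the signatures, the `compile_fn` injection hook, and the simplified `setup_model` body are assumptions made for illustration, and any Gaudi-specific backend argument is left to `compile_kwargs` because the thread does not state one.

```python
from typing import Any, Callable, Optional


def get_torch_compiled_model(
    model: Any,
    compile_fn: Optional[Callable[..., Any]] = None,
    **compile_kwargs: Any,
) -> Any:
    """Return `model` wrapped by torch.compile (hypothetical helper)."""
    if compile_fn is None:
        # Imported lazily so the helper can be exercised without torch installed.
        import torch

        compile_fn = torch.compile
    return compile_fn(model, **compile_kwargs)


def setup_model(
    model: Any,
    torch_compile: bool = False,
    compile_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Simplified stand-in for setup_model(): compile only when requested."""
    if torch_compile:
        model = get_torch_compiled_model(model, compile_fn=compile_fn)
    return model
```

Keeping the compile step in the example's utilities means generate() and its search modes stay untouched, which matches the direction the reviewers suggested.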

help="Whether to use the key/value cache for decoding. It should speed up generation.",
)
parser.add_argument(
"--use_torch_compile",
Collaborator

Suggested change:
- "--use_torch_compile",
+ "--torch_compile",

To be aligned with Transformers and GaudiTrainingArguments.
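
As a minimal sketch of the renamed flag, assuming it is a plain boolean switch: the `build_parser` function, the description, and the help text are illustrative and not taken from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag name comes from the suggestion above.
    parser = argparse.ArgumentParser(description="Text generation example (sketch)")
    parser.add_argument(
        "--torch_compile",
        action="store_true",
        help="Wrap the model with torch.compile before generation.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"torch_compile={args.torch_compile}")
```

With `action="store_true"`, the value defaults to `False` and becomes `True` only when `--torch_compile` is passed on the command line.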

Contributor Author

OK, I will change it.

@kausikmaiti
Contributor Author

I created a separate PR after making the necessary changes. Please refer to #616.

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
… generation tests (huggingface#2200) (huggingface#605)

Co-authored-by: Grzegorz Pluto-Prondzinski <gplutopx@habana.ai>

4 participants