Skip to content

Run Llama2 with torch.compile on Gaudi2#616

Merged
regisss merged 4 commits into
huggingface:mainfrom
kausikmaiti:llama2_with_torch_compile_on_gaudi2_1
Jan 23, 2024
Merged

Run Llama2 with torch.compile on Gaudi2#616
regisss merged 4 commits into
huggingface:mainfrom
kausikmaiti:llama2_with_torch_compile_on_gaudi2_1

Conversation

@kausikmaiti
Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: kausik <kmaiti@habana.ai>
@kausikmaiti kausikmaiti requested a review from regisss as a code owner December 28, 2023 06:57
@vivekgoe vivekgoe added the run-test Run CI for PRs from external contributors label Jan 1, 2024
@vivekgoe vivekgoe self-requested a review January 1, 2024 08:03
@vivekgoe
Copy link
Copy Markdown
Collaborator

vivekgoe commented Jan 1, 2024

@kausikmaiti looks good to me. @regisss please review and help merge this if it looks ok to you. Thanks.

@vivekgoe vivekgoe requested a review from libinta January 1, 2024 08:04
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown
Collaborator

@libinta libinta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kausikmaiti I have following questions

  1. what's the default cmd for llama2? torch compile or use_lazy_mode? if the default is torch.compile, please also update the readme
  2. what's the performance comparison with use_lazy_mode and use_hpu_graph?
  3. any depenency to the to-be-released docker?

Comment thread examples/text-generation/utils.py
@kausikmaiti
Copy link
Copy Markdown
Contributor Author

@kausikmaiti I have following questions

  1. what's the default cmd for llama2? torch compile or use_lazy_mode? if the default is torch.compile, please also update the readme
  2. what's the performance comparison with use_lazy_mode and use_hpu_graph?
  3. any depenency to the to-be-released docker?

@libinta, Please find my answers below.

  1. Here, I am just adding support for torch.compile. But, default mode is still unchanged. So, we don't need to update readme here.
  2. This is the very first step of (functional) enablement. Performance data is not yet available.
  3. Yes. It has dependency on the content of release 1.14.

model, tokenizer, generation_config = initialize_model(args, logger)

use_lazy_mode = True
if args.torch_compile:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid this by using args.torch_compile directly in 312 and 492 with lazy_mode = not args.torch_compile.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can be done of course. But, I wanted to have the decision making about use_lazy_mode in one place for all. A secondary reason is that I plan to include additional checks in future PR to control use_lazy_mode.

@regisss
Copy link
Copy Markdown
Collaborator

regisss commented Jan 4, 2024

Yes. It has dependency on the content of release 1.14.

So we should not merge it before 1.14 is released @libinta right?

model = wrap_in_hpu_graph(model)

if args.torch_compile:
model = get_torch_compiled_model(model)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kausikmaiti Can we add model specific check as generation using torch.compile isn't verified on models

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added model specific check in separate commit. Please review.

@regisss
Copy link
Copy Markdown
Collaborator

regisss commented Jan 11, 2024

@kausikmaiti Let's also add a test to: https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py

You can define a new test at the end of this file:

@pytest.mark.parametrize("model_name, baseline", MODELS_TO_TEST["torch_compile"])
def test_text_generation_torch_compile(model_name: str, baseline: float, token: str):
    _test_text_generation(model_name, baseline, token, torch_compile=True)

and adding a new torch_compile entry in the dict here:

Signed-off-by: kausik <kmaiti@habana.ai>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.environ["WORLD_SIZE"] = "0" ? WORLD_SIZE should be set to 1 for 1x runs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mentioned offline, WORLD_SIZE setting does not matter, as I'm not using deepspeed / gaudi_spawn.py script.
Also as per my observation, if I don't set WORLD_SIZE=0, due to the logic like "use_deepspeed = args.world_size > 0", setup_distributed_model() gets called and the test fails at very early stage while importing deepspeed. This is not the expectation.

@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Jan 23, 2024
@regisss regisss merged commit 1ecc732 into huggingface:main Jan 23, 2024
jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-test Run CI for PRs from external contributors

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants