Run Llama2 with torch.compile on Gaudi2 #616
Signed-off-by: kausik <kmaiti@habana.ai>
@kausikmaiti looks good to me. @regisss please review and help merge this if it looks ok to you. Thanks.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
libinta left a comment
@kausikmaiti I have the following questions:
- What's the default command for Llama2: torch.compile or use_lazy_mode? If the default is torch.compile, please also update the README.
- How does performance compare with use_lazy_mode and use_hpu_graph?
- Is there any dependency on the to-be-released Docker image?
@libinta, Please find my answers below.
    model, tokenizer, generation_config = initialize_model(args, logger)

    use_lazy_mode = True
    if args.torch_compile:
We could avoid this by using args.torch_compile directly on lines 312 and 492, with lazy_mode = not args.torch_compile.
Yes, that can be done, of course. But I wanted to keep the decision about use_lazy_mode in one place. A secondary reason is that I plan to add further checks in a future PR to control use_lazy_mode.
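The two options discussed here can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `resolve_use_lazy_mode` is a hypothetical helper name, and the only rule it encodes is the one stated in this thread (lazy mode stays on unless torch.compile is requested).

```python
import argparse


def resolve_use_lazy_mode(args: argparse.Namespace) -> bool:
    """Decide use_lazy_mode in one place (hypothetical helper, not from the PR)."""
    use_lazy_mode = True
    if args.torch_compile:
        # Rule from the thread: lazy_mode = not args.torch_compile.
        use_lazy_mode = False
    # Any future checks that influence use_lazy_mode would be added here.
    return use_lazy_mode


parser = argparse.ArgumentParser()
parser.add_argument("--torch_compile", action="store_true")

compile_args = parser.parse_args(["--torch_compile"])
default_args = parser.parse_args([])

# Every call site reads the same precomputed value instead of
# repeating `not args.torch_compile` inline.
print(resolve_use_lazy_mode(compile_args))
print(resolve_use_lazy_mode(default_args))
```

With a single rule the two approaches behave identically; centralizing the decision mainly pays off once more conditions are added.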
So we should not merge it before 1.14 is released, @libinta, right?
    model = wrap_in_hpu_graph(model)

    if args.torch_compile:
        model = get_torch_compiled_model(model)
@kausikmaiti Can we add a model-specific check, since generation using torch.compile isn't verified on other models?
Added a model-specific check in a separate commit. Please review.
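For readers following along, a model-specific check of this kind might look roughly like the sketch below. It is not the code from the commit: `maybe_compile`, `VERIFIED_MODEL_TYPES`, and the choice to raise `ValueError` are all assumptions made for illustration, and `compile_fn` merely stands in for `get_torch_compiled_model`. The one real convention relied on is that Hugging Face models expose `config.model_type`.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Assumption for illustration: only Llama has been verified with torch.compile.
VERIFIED_MODEL_TYPES = {"llama"}


def maybe_compile(model: Any, torch_compile: bool, compile_fn: Callable[[Any], Any]) -> Any:
    """Apply compile_fn only when requested and the model type is verified."""
    if not torch_compile:
        return model
    model_type = getattr(model.config, "model_type", None)
    if model_type not in VERIFIED_MODEL_TYPES:
        raise ValueError(
            f"torch.compile generation is not verified for model type {model_type!r}"
        )
    return compile_fn(model)


# Stand-ins for real models; only config.model_type is needed here.
llama = SimpleNamespace(config=SimpleNamespace(model_type="llama"))
gpt2 = SimpleNamespace(config=SimpleNamespace(model_type="gpt2"))


def fake_compile(model: Any) -> tuple:
    # Placeholder for the real compile step so the sketch runs anywhere.
    return ("compiled", model)


compiled = maybe_compile(llama, True, fake_compile)
untouched = maybe_compile(gpt2, False, fake_compile)
```

Failing loudly for unverified models is one option; silently falling back to the non-compiled path would be another, at the cost of hiding the fact that the flag was ignored.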
…lama2 Signed-off-by: kausik <kmaiti@habana.ai>
@kausikmaiti Let's also add a test to: https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py
You can define a new test at the end of this file:

    @pytest.mark.parametrize("model_name, baseline", MODELS_TO_TEST["torch_compile"])
    def test_text_generation_torch_compile(model_name: str, baseline: float, token: str):
        _test_text_generation(model_name, baseline, token, torch_compile=True)

and adding a new
Signed-off-by: kausik <kmaiti@habana.ai>
os.environ["WORLD_SIZE"] = "0" ? WORLD_SIZE should be set to 1 for 1x runs
As you mentioned offline, the WORLD_SIZE setting does not matter, as I'm not using DeepSpeed or the gaudi_spawn.py script.
Also, from what I observed, if I don't set WORLD_SIZE=0, logic like "use_deepspeed = args.world_size > 0" causes setup_distributed_model() to be called, and the test fails at a very early stage while importing deepspeed. This is not the expected behavior.
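The effect described above can be sketched as follows. This is an illustration under stated assumptions: `needs_distributed_setup` is a hypothetical name, and reading the value straight from a WORLD_SIZE environment mapping is assumed. Only the `world_size > 0` rule is taken from the comment.

```python
from typing import Mapping


def needs_distributed_setup(environ: Mapping[str, str]) -> bool:
    """Mirror the quoted rule `use_deepspeed = args.world_size > 0`."""
    # Assumption: world_size is parsed from the WORLD_SIZE variable.
    world_size = int(environ["WORLD_SIZE"])
    return world_size > 0


# WORLD_SIZE=0: the rule keeps the run on the single-process path.
print(needs_distributed_setup({"WORLD_SIZE": "0"}))
# WORLD_SIZE=1: the rule already selects the distributed (DeepSpeed) path.
print(needs_distributed_setup({"WORLD_SIZE": "1"}))
```

Under this rule, a value of 1 is enough to trigger the distributed setup, which is why the test sets WORLD_SIZE to 0 for a run without DeepSpeed.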
What does this PR do?
Fixes # (issue)
Before submitting