Tensor parallel distributed strategy without using deepspeed by kalyanjk · Pull Request #1121 · huggingface/optimum-habana

kalyanjk · 2024-07-03T14:20:53Z

Adds tensor parallelism by extending `GaudiLlamaAttention` to `TPGaudiLlamaAttention` and `GaudiLlamaMLP` to `TPGaudiLlamaMLP`.

Use the parameter `--distributed_strategy="tp"` to invoke this code path.

Code design reference: https://github.com/foundation-model-stack/foundation-model-stack/tree/main
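For readers unfamiliar with the idea, here is a minimal sketch of how a tensor-parallel MLP splits its work. It is not the code from this PR: the helper names (`split_columns`, `split_rows`, `tp_mlp`) are hypothetical, the activation is a plain ReLU rather than Llama's gated MLP, and the "ranks" are simulated in a single process. It only shows that sharding the first weight matrix by columns and the second by rows, then summing the partial outputs (the all-reduce step), reproduces the unsharded result.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Elementwise ReLU; elementwise ops can be applied per shard."""
    return [[max(0, v) for v in row] for row in m]


def split_columns(w, parts):
    """Shard a weight matrix along its output (column) dimension."""
    assert len(w[0]) % parts == 0, "columns must divide evenly across ranks"
    step = len(w[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def split_rows(w, parts):
    """Shard a weight matrix along its input (row) dimension."""
    assert len(w) % parts == 0, "rows must divide evenly across ranks"
    step = len(w) // parts
    return [w[i * step:(i + 1) * step] for i in range(parts)]


def mlp(x, w_up, w_down):
    """Reference, unsharded two-layer MLP."""
    return matmul(relu(matmul(x, w_up)), w_down)


def tp_mlp(x, w_up, w_down, world_size):
    """Simulated tensor-parallel MLP: each 'rank' holds one shard of each weight."""
    up_shards = split_columns(w_up, world_size)
    down_shards = split_rows(w_down, world_size)
    # Each rank computes a partial output from its own shards only.
    partials = [
        matmul(relu(matmul(x, up)), down)
        for up, down in zip(up_shards, down_shards)
    ]
    # Stand-in for the all-reduce: sum the partial outputs across ranks.
    out = partials[0]
    for part in partials[1:]:
        out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(out, part)]
    return out
```

In a real multi-device setup, the final summation would be a collective operation across devices rather than a Python loop.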

HuggingFaceDocBuilderDev · 2024-07-15T16:45:58Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss

I left a few comments. Additionally, can you:

- run `make style`?
- add an example command in the README of the text-generation example?
- add a test for it in https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py?
- add a link to the original implementation in all files that are inspired by it?
- check if there are any copyrights to cite?

kalyanjk · 2024-07-17T15:59:42Z


Done

kalyanjk · 2024-07-18T05:06:23Z

@regisss Addressed all your comments. Please review the changes. Thank you!

regisss

Can you update your main branch and merge it into this PR, so that the whole CI works again?

regisss · 2024-07-24T14:41:29Z


 You will also need to add `--torch_compile` in your command.

+### Running with Tesor parallel strategy


Suggested change

### Running with Tesor parallel strategy

### Running with tensor-parallel strategy

regisss · 2024-07-24T14:42:03Z

+### Running with Tesor parallel strategy
+#### Attribution
+
+This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.


Suggested change

This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.

> [!NOTE]

> This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.

regisss · 2024-07-24T14:43:41Z

 You will also need to add `--torch_compile` in your command.

+### Running with Tesor parallel strategy
+#### Attribution


I think you can remove that line, let's put it in a "box" as suggested below

I have added it in the box, but I am not sure whether the `[!WARNING]` syntax had to be preserved.

regisss · 2024-07-24T14:45:31Z

+
+This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
+
+torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models. To enable


Suggested change

torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models. To enable

> [!WARNING]

> torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.

To enable...

Done, added the note and the warning in boxes.

libinta · 2024-07-24T17:42:08Z

@kalyanjk can you update based on review comments?

kalyanjk · 2024-07-26T04:21:20Z


@libinta, I have addressed all of them. Is there anything I have missed?

regisss

I left a few more comments to address.
Also, the test fails on my instance with this error:

Traceback (most recent call last):
  File "/root/workspace/fork/examples/text-generation/run_generation.py", line 674, in <module>
    main()
  File "/root/workspace/fork/examples/text-generation/run_generation.py", line 317, in main
    model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
  File "/root/workspace/fork/examples/text-generation/utils.py", line 592, in initialize_model
    else setup_distributed_model_tp(args, model_dtype, model_kwargs, logger)
  File "/root/workspace/fork/examples/text-generation/utils.py", line 281, in setup_distributed_model_tp
    lazy_sd = serialization.load_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/distributed/serialization.py", line 191, in load_state_dict
    assert len(checkpoints) > 0, f"Can't find the requested checkpoint data at {model_path}"
AssertionError: Can't find the requested checkpoint data at meta-llama/Llama-2-7b-hf

Any idea about what's going on? It seems like a serialization issue. Or is it because it requires Synapse 1.17? I'm running 1.16.

regisss · 2024-07-29T07:51:39Z


 You will also need to add `--torch_compile` in your command.

+### Running with tesor-parallel strategy


Suggested change

### Running with tesor-parallel strategy

### Running with tensor-parallel strategy

regisss · 2024-07-29T07:52:33Z

+```bash
+NOTE: This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
+```


Suggested change

```bash

NOTE: This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.

```

> [!NOTE]

> This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.

updated with the suggested format

regisss · 2024-07-29T07:53:14Z

+
+```bash
+WARNING: torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
+```
+To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the
+command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.


Suggested change

```bash

WARNING: torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.

```

To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the

command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.

> [!WARNING]

> torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.

To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the

command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.

updated with the suggested format

regisss · 2024-07-29T07:56:09Z

+
+Here is an example:
+```bash
+python ../gaudi_spawn.py  --world_size 8 run_generation.py \


Suggested change

python ../gaudi_spawn.py --world_size 8 run_generation.py \

PT_ENABLE_INT64_SUPPORT=1 PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py --world_size 8 run_generation.py \
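If the launch is scripted rather than typed in a shell, the same two environment variables can be set programmatically. The following is only a sketch under assumptions: `build_tp_launch` and `launch` are hypothetical helpers, not part of optimum-habana, and the remaining `run_generation.py` arguments are left to the caller because the command above is truncated.

```python
import os
import subprocess

# Environment variables named in this review thread for torch.compile
# with the tensor-parallel strategy.
TP_COMPILE_ENV = {"PT_ENABLE_INT64_SUPPORT": "1", "PT_HPU_LAZY_MODE": "0"}


def build_tp_launch(script_args, world_size=8, base_env=None):
    """Return (command, environment) for a gaudi_spawn.py launch.

    script_args: the run_generation.py arguments, supplied by the caller.
    base_env: environment to extend; defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(TP_COMPILE_ENV)
    cmd = [
        "python", "../gaudi_spawn.py",
        "--world_size", str(world_size),
        "run_generation.py",
        *script_args,
    ]
    return cmd, env


def launch(script_args, world_size=8):
    """Run the command; only meaningful on a machine with Gaudi devices."""
    cmd, env = build_tp_launch(script_args, world_size=world_size)
    return subprocess.run(cmd, env=env, check=True)
```

Keeping command construction separate from execution makes it possible to check the command and environment without any hardware attached.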

regisss · 2024-07-29T08:18:22Z

Please run make style too

kalyanjk · 2024-07-29T10:33:06Z

Can you provide the absolute path for meta-llama/Llama-2-7b-hf? All my testing is on 1.17; I will verify on 1.16 and update.

kalyanjk · 2024-07-29T11:52:56Z

@regisss I successfully ran the sanity test on the 1.16 release with both the 7b and 70b models. Everything is working fine.

regisss · 2024-07-29T12:17:39Z

There is no absolute path, this is the hub model id and I really think this use case should work as not everybody has the models stored locally. If the absolute path to the model is needed, there should be some code to find the model in the Transformers cache. You can get the default path to cache with:

from huggingface_hub.constants import HF_HUB_CACHE

More information about the structure of the cache here: https://huggingface.co/docs/huggingface_hub/v0.24.2/en/guides/manage-cache#understand-caching
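As an illustration of the lookup described above, here is a minimal sketch that maps a hub model id to its local snapshot directory. It assumes the cache layout documented in the linked guide (`models--<org>--<name>/snapshots/<revision>`, with `refs/main` holding the revision hash). `find_cached_snapshot` and `default_hub_cache` are hypothetical helpers, not functions from optimum-habana or `huggingface_hub`.

```python
import os
from pathlib import Path


def default_hub_cache():
    """Return the default hub cache directory."""
    try:
        from huggingface_hub.constants import HF_HUB_CACHE
        return Path(HF_HUB_CACHE)
    except ImportError:
        # Assumed fallback: the usual default location when the package is absent.
        return Path.home() / ".cache" / "huggingface" / "hub"


def find_cached_snapshot(model_id, cache_dir=None):
    """Return the local snapshot directory for a model id, or None if not cached."""
    if os.path.isdir(model_id):
        # Already a local path: use it directly.
        return Path(model_id)
    cache = Path(cache_dir) if cache_dir else default_hub_cache()
    repo_dir = cache / ("models--" + model_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    # Prefer the revision that refs/main points to, if present.
    ref = repo_dir / "refs" / "main"
    if ref.is_file():
        candidate = snapshots / ref.read_text().strip()
        if candidate.is_dir():
            return candidate
    # Otherwise fall back to any available snapshot directory.
    revisions = sorted(p for p in snapshots.iterdir() if p.is_dir())
    return revisions[-1] if revisions else None
```

This sketch only reads what is already on disk. When the model may not be cached yet, `huggingface_hub.snapshot_download` is likely the simpler route, since it downloads the repository if needed and returns the local snapshot path.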

Also, I see I forgot to mention it, can you replace the arg distributed_strategy by parallelism_strategy or something similar everywhere please? We already have distribution_strategy defined here and it would add a lot of confusion to have both distributed_strategy and distribution_strategy. Sorry for noticing this now, I thought I had commented on it before.


kalyanjk · 2024-07-30T08:41:53Z


Updated: renamed `distributed_strategy` to `parallel_strategy`.
In progress: is there a way I can test the relative path for model_name -> meta-llama/Llama-2-7b-hf?

kalyanjk · 2024-07-30T13:07:39Z


Updated the `cache_dir` setting for `parallel_strategy = tp`.

@regisss can you please verify whether you are able to load the data now?

regisss

Thanks for the changes, it looks good to me!
One last thing, as written in the comment below, the test fails on my instance because the throughput I get is too low. Maybe due to a different version of Synapse?


* Revert "Tensor parallel distributed strategy without using deepspeed (#280) (#299)" This reverts commit 32c86d3. * Tensor parallel distributed strategy without using deepspeed (huggingface#1121) Co-authored-by: Kalyan <kkumar@habana.ai> --------- Co-authored-by: Kalyan <kkumar@habana.ai>

* Revert "Tensor parallel distributed strategy without using deepspeed (#280)" This reverts commit c6e5f9c. * Tensor parallel distributed strategy without using deepspeed (huggingface#1121) Co-authored-by: Kalyan <kkumar@habana.ai> --------- Co-authored-by: Kalyan <kkumar@habana.ai>

* Revert "Tensor parallel distributed strategy without using deepspeed (#280)" This reverts commit c6e5f9c. * Tensor parallel distributed strategy without using deepspeed (huggingface#1121) Co-authored-by: Kalyan <kkumar@habana.ai> --------- Change-Id: Ic30c85e697dbd6a51767e21e1c06c9a20120d9f6 Co-authored-by: Kalyan <kkumar@habana.ai>

kalyanjk requested review from libinta and mandy-li as code owners July 3, 2024 14:20

kalyanjk requested a review from a user July 3, 2024 14:20

kalyanjk requested a review from regisss as a code owner July 3, 2024 14:20

libinta added the synapse1.17 label (PR that should be available along with Synapse 1.17 but has no dependency on Synapse 1.17 content) Jul 9, 2024

kalyanjk force-pushed the up_tp_strategy branch from 09b081e to 0bf2a63 Compare July 15, 2024 09:03

regisss reviewed Jul 15, 2024


kalyanjk force-pushed the up_tp_strategy branch 3 times, most recently from 8ecceca to 2b5f46e Compare July 18, 2024 04:53

kalyanjk requested a review from regisss July 19, 2024 04:40

regisss reviewed Jul 24, 2024


kalyanjk force-pushed the up_tp_strategy branch from 24accd1 to 3b9ac3b Compare July 25, 2024 07:18

kalyanjk requested a review from regisss July 26, 2024 05:28

regisss reviewed Jul 29, 2024


astachowiczhabana mentioned this pull request Jul 29, 2024

Tensor parallel distributed strategy without using deepspeed HabanaAI/optimum-habana-fork#280

Merged

kalyanjkk added 5 commits July 30, 2024 11:21

TP reference - ibm foundation-model-stack

080a6a1

Code cleanup -removed unused code

4feb085

make style, updated README for distributed_strategy="tp"

a73c6f6

Added test in tests/test_text_generation_example.py add a link to the original implementation for the referenced files

Updated LlamaConfig with distributed_strategy

cca19c2

Updated README.md and test data-set path

dbb316b

kalyanjkk added 3 commits July 30, 2024 11:22

distributed_strategy is not JSON serializable

182baf5

Updated README.md

524d1cc

make style changes

Renamed the distributed_strategy parameter to parallel_strategy

bb19c4e

kalyanjk force-pushed the up_tp_strategy branch from 4c519cf to bb19c4e Compare July 30, 2024 08:36

Updated cache_dir for parallel_strategy = tp

6a04a8c

regisss reviewed Jul 30, 2024


Comment thread tests/test_text_generation_example.py Outdated

Updated the perf number for test test_text_generation_distributed_tp

3fd17e7

regisss approved these changes Jul 30, 2024


regisss merged commit 139ad89 into huggingface:main Jul 30, 2024

kalyanjk added a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 31, 2024

Tensor parallel distributed strategy without using deepspeed (hugging…

c18cfa3

…face#1121) Co-authored-by: Kalyan <kkumar@habana.ai>

kalyanjk mentioned this pull request Jul 31, 2024

Tensor parallel distributed strategy without using deepspeed HabanaAI/optimum-habana-fork#320

Merged

kalyanjk added a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 31, 2024

Tensor parallel distributed strategy without using deepspeed (hugging…

aec5018

…face#1121) Co-authored-by: Kalyan <kkumar@habana.ai>

kalyanjk mentioned this pull request Jul 31, 2024

Tensor parallel distributed strategy without using deepspeed HabanaAI/optimum-habana-fork#321

Merged

astachowiczhabana mentioned this pull request Aug 5, 2024

Tensor parallel distributed strategy without using deepspeed (#280) HabanaAI/optimum-habana-fork#299

Merged


