Tensor parallel distributed strategy without using deepspeed #1121
regisss
left a comment
I left a few comments. Additionally, can you:
- run `make style`?
- add an example command in the README of the text-generation example?
- add a test for it in https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py?
- add a link to the original implementation in all files that are inspired from it?
- check if there are any copyrights to cite?
Done
8ecceca to 2b5f46e
@regisss Addressed all your comments. Please review the changes. Thank you!
regisss
left a comment
Can you update your main branch and merge it into this PR? To have the whole CI working again.
| You will also need to add `--torch_compile` in your command.
|
| ### Running with Tesor parallel strategy
Suggested change:
- ### Running with Tesor parallel strategy
+ ### Running with tensor-parallel strategy
| ### Running with Tesor parallel strategy
| #### Attribution
|
| This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
Suggested change:
- This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
+ > [!NOTE]
+ > This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
| You will also need to add `--torch_compile` in your command.
|
| ### Running with Tesor parallel strategy
| #### Attribution
I think you can remove that line; let's put it in a "box" as suggested below.
I have added it in the box, but I am not sure whether this syntax had to be preserved: [!WARNING]
| This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
|
| torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models. To enable
Suggested change:
- torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models. To enable
+ > [!WARNING]
+ > torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
+ To enable...
Done, added Note and Warning in box.
@kalyanjk can you update based on review comments?
regisss
left a comment
I left a few more comments to address.
Also, the test fails on my instance with this error:
Traceback (most recent call last):
  File "/root/workspace/fork/examples/text-generation/run_generation.py", line 674, in <module>
    main()
  File "/root/workspace/fork/examples/text-generation/run_generation.py", line 317, in main
    model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
  File "/root/workspace/fork/examples/text-generation/utils.py", line 592, in initialize_model
    else setup_distributed_model_tp(args, model_dtype, model_kwargs, logger)
  File "/root/workspace/fork/examples/text-generation/utils.py", line 281, in setup_distributed_model_tp
    lazy_sd = serialization.load_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/distributed/serialization.py", line 191, in load_state_dict
    assert len(checkpoints) > 0, f"Can't find the requested checkpoint data at {model_path}"
AssertionError: Can't find the requested checkpoint data at meta-llama/Llama-2-7b-hf
Any idea about what's going on? It seems like a serialization issue. Or is it because it requires Synapse 1.17? I'm running 1.16.
| You will also need to add `--torch_compile` in your command.
|
| ### Running with tesor-parallel strategy
Suggested change:
- ### Running with tesor-parallel strategy
+ ### Running with tensor-parallel strategy
| ```bash
| NOTE: This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
| ```
Suggested change:
- ```bash
- NOTE: This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
- ```
+ > [!NOTE]
+ > This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
Updated with the suggested format.
| ```bash
| WARNING: torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
| ```
| To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the
| command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.
Suggested change:
- ```bash
- WARNING: torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
- ```
+ > [!WARNING]
+ > torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
  To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the
  command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.
Updated with the suggested format.
| Here is an example:
| ```bash
| python ../gaudi_spawn.py --world_size 8 run_generation.py \
Suggested change:
- python ../gaudi_spawn.py --world_size 8 run_generation.py \
+ PT_ENABLE_INT64_SUPPORT=1 PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py --world_size 8 run_generation.py \
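For readers who launch the example from a script rather than a shell, here is a minimal sketch of the same idea: set the two environment variables named in the README text, then build the `gaudi_spawn.py` command. The helper name `build_tp_launch` is hypothetical. The `--parallel_strategy tp` flag assumes the rename discussed later in this thread, and `--model_name_or_path` is assumed to be the flag `run_generation.py` uses to select the model. Nothing is executed here; the sketch only assembles the environment and argument list.

```python
import os
import shlex


def build_tp_launch(model_name_or_path, world_size=8, base_env=None):
    """Hypothetical helper: return (env, argv) for a tensor-parallel torch.compile run."""
    # Start from the caller's environment unless an explicit one is given.
    env = dict(os.environ if base_env is None else base_env)
    # The two variables the README asks for when combining torch.compile with tp.
    env["PT_ENABLE_INT64_SUPPORT"] = "1"
    env["PT_HPU_LAZY_MODE"] = "0"
    argv = [
        "python", "../gaudi_spawn.py",
        "--world_size", str(world_size),
        "run_generation.py",
        "--model_name_or_path", model_name_or_path,  # assumed flag name
        "--parallel_strategy", "tp",                 # assumed post-rename flag
        "--torch_compile",
    ]
    return env, argv


env, argv = build_tp_launch("meta-llama/Llama-2-7b-hf", base_env={})
print(shlex.join(argv))
# To actually launch on a Gaudi machine: subprocess.run(argv, env=env, check=True)
```

Keeping the variables in the `env` mapping, rather than mutating `os.environ`, means the settings apply only to the spawned process.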
Please run
Can you provide the absolute path for meta-llama/Llama-2-7b-hf? All my testing is on 1.17; I will verify on 1.16 and update.
@regisss I successfully verified the sanity test for the 1.16 release using both the 7b and 70b models. Everything is working fine.
There is no absolute path; this is the hub model id, and I really think this use case should work, as not everybody has the models stored locally. If the absolute path to the model is needed, there should be some code to find the model in the Transformers cache. You can get the default path to cache with:
More information about the structure of the cache here: https://huggingface.co/docs/huggingface_hub/v0.24.2/en/guides/manage-cache#understand-caching
Also, I see I forgot to mention it, can you replace the arg
Added a test in tests/test_text_generation_example.py and a link to the original implementation for the referenced files.
Updated: renamed `distributed_strategy` to `parallel_strategy`.
Updated the cache_dir setting for parallel_strategy = tp. @regisss, can you please verify whether you are able to load the data now?
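To illustrate the hub-id versus local-path issue behind the `AssertionError` above, here is a minimal sketch of how a model id could be mapped to a cached snapshot directory. It is not the code from this PR. The function names are hypothetical, and the layout (`models--<org>--<name>/refs/<revision>` pointing at `snapshots/<commit>`) follows the cache structure described in the linked huggingface_hub guide. The default-location logic only approximates the documented `HF_HUB_CACHE` and `HF_HOME` overrides.

```python
import os
from pathlib import Path
from typing import Optional


def default_hub_cache() -> Path:
    """Approximate the default hub cache root (assumption: HF_HUB_CACHE, then HF_HOME/hub)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get(
        "HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    )
    return Path(hf_home) / "hub"


def resolve_model_path(
    model_name_or_path: str,
    cache_dir: Optional[Path] = None,
    revision: str = "main",
) -> Optional[Path]:
    """Hypothetical helper: return a directory holding checkpoint files, or None."""
    # A real local directory is used as-is.
    if os.path.isdir(model_name_or_path):
        return Path(model_name_or_path)
    if cache_dir is None:
        cache_dir = default_hub_cache()
    # Hub ids such as "org/name" are stored as "models--org--name".
    repo_dir = Path(cache_dir) / ("models--" + model_name_or_path.replace("/", "--"))
    ref_file = repo_dir / "refs" / revision
    if not ref_file.is_file():
        return None  # model (or this revision) is not in the cache
    commit = ref_file.read_text().strip()
    snapshot = repo_dir / "snapshots" / commit
    return snapshot if snapshot.is_dir() else None


print(resolve_model_path("meta-llama/Llama-2-7b-hf"))
```

In practice, `huggingface_hub.snapshot_download` is probably the more robust choice, since it also downloads missing files and returns the local snapshot folder. The sketch above only shows what a cache lookup involves.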
regisss
left a comment
Thanks for the changes, it looks good to me!
One last thing, as written in the comment below, the test fails on my instance because the throughput I get is too low. Maybe due to a different version of Synapse?
…face#1121) Co-authored-by: Kalyan <kkumar@habana.ai>
* Revert "Tensor parallel distributed strategy without using deepspeed (#280) (#299)"
  This reverts commit 32c86d3.
* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)
  Co-authored-by: Kalyan <kkumar@habana.ai>
---------
Co-authored-by: Kalyan <kkumar@habana.ai>

* Revert "Tensor parallel distributed strategy without using deepspeed (#280)"
  This reverts commit c6e5f9c.
* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)
  Co-authored-by: Kalyan <kkumar@habana.ai>
---------
Co-authored-by: Kalyan <kkumar@habana.ai>

* Revert "Tensor parallel distributed strategy without using deepspeed (#280)"
  This reverts commit c6e5f9c.
* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)
  Co-authored-by: Kalyan <kkumar@habana.ai>
---------
Change-Id: Ic30c85e697dbd6a51767e21e1c06c9a20120d9f6
Co-authored-by: Kalyan <kkumar@habana.ai>
Implements tensor parallelism by extending GaudiLlamaAttention -> TPGaudiLlamaAttention and GaudiLlamaMLP -> TPGaudiLlamaMLP.
Use the parameter --distributed_strategy="tp" to invoke this code path.
Code design reference: https://github.com/foundation-model-stack/foundation-model-stack/tree/main
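As background on what a tensor-parallel MLP does, here is a small pure-Python sketch of the usual sharding scheme for a Llama-style gated MLP. It is a conceptual illustration only, not the `TPGaudiLlamaMLP` code from this PR, and all function names are hypothetical. The gate and up projections are split along the intermediate (output) dimension, the down projection along the intermediate (input) dimension, and the per-rank partial outputs are summed, which stands in for an all-reduce.

```python
import math
import random


def matvec(w, x):
    """Multiply a weight stored as a list of rows (out x in) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v):
    return v / (1.0 + math.exp(-v))


def mlp(x, w_gate, w_up, w_down):
    """Gated MLP: down(silu(gate(x)) * up(x))."""
    h = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, h)


def shard_out(w, rank, world_size):
    """Keep this rank's slice of the output rows (intermediate dimension)."""
    n = len(w) // world_size
    return w[rank * n:(rank + 1) * n]


def shard_in(w, rank, world_size):
    """Keep this rank's slice of the input columns (intermediate dimension)."""
    n = len(w[0]) // world_size
    return [row[rank * n:(rank + 1) * n] for row in w]


def tp_mlp(x, w_gate, w_up, w_down, world_size):
    """Simulate every rank in one process, then sum partials (the all-reduce)."""
    partials = [
        mlp(
            x,
            shard_out(w_gate, rank, world_size),
            shard_out(w_up, rank, world_size),
            shard_in(w_down, rank, world_size),
        )
        for rank in range(world_size)
    ]
    return [sum(vals) for vals in zip(*partials)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden, inter = 4, 8
x = [rng.uniform(-1, 1) for _ in range(hidden)]
w_gate, w_up = rand_matrix(inter, hidden, rng), rand_matrix(inter, hidden, rng)
w_down = rand_matrix(hidden, inter, rng)
full = mlp(x, w_gate, w_up, w_down)
sharded = tp_mlp(x, w_gate, w_up, w_down, world_size=2)
print(max(abs(a - b) for a, b in zip(full, sharded)))  # difference should be near zero
```

Because the activation is elementwise, each rank can apply it to its own slice of the intermediate vector, so only one reduction is needed at the end of the block.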