Tensor parallel distributed strategy without using deepspeed by kalyanjk · Pull Request #280 · HabanaAI/optimum-habana-fork

kalyanjk · 2024-07-02T13:00:16Z

Adds tensor parallelism by extending GaudiLlamaAttention -> TPGaudiLlamaAttention and GaudiLlamaMLP -> TPGaudiLlamaMLP.

Pass --distributed_strategy="tp" to invoke this code path.

msinnha1

This is a big patch; I am still reviewing it.

msinnha1 · 2024-07-09T12:24:39Z

            global has_fused_rope
            has_fused_rope = False

+


minor: please remove this


Done

msinnha1 · 2024-07-09T12:27:38Z

-            [GaudiLlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
-        )
+        layers = []
+        for i in range(config.num_hidden_layers):


minor: use layer_idx instead of 'i'


Done
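For readers following this thread, here is a minimal sketch of why an explicit loop can be preferable to the list comprehension it replaces: each layer can be passed through a per-layer hook before it is stored. The hunk above does not show the loop body, so ToyDecoderLayer, build_layers, and the distribute callback are hypothetical names used only for illustration, not the PR's actual code.

```python
from typing import Callable, Optional

from torch import nn


class ToyDecoderLayer(nn.Module):
    """Hypothetical stand-in for GaudiLlamaDecoderLayer."""

    def __init__(self, hidden_size: int, layer_idx: int):
        super().__init__()
        self.layer_idx = layer_idx
        self.proj = nn.Linear(hidden_size, hidden_size)


def build_layers(
    num_hidden_layers: int,
    hidden_size: int,
    distribute: Optional[Callable[[nn.Module, int], nn.Module]] = None,
) -> nn.ModuleList:
    layers = []
    for layer_idx in range(num_hidden_layers):
        layer = ToyDecoderLayer(hidden_size, layer_idx)
        if distribute is not None:
            # Assumed hook, e.g. a strategy object's distribute_layer method.
            layer = distribute(layer, layer_idx)
        layers.append(layer)
    return nn.ModuleList(layers)


seen_indices = []


def record_and_return(block: nn.Module, layer_idx: int) -> nn.Module:
    # Records which layer index the hook was called with, then returns the block.
    seen_indices.append(layer_idx)
    return block


stack = build_layers(3, 8, distribute=record_and_return)
```

Naming the loop variable layer_idx, as requested in the review, keeps it consistent with the constructor argument it feeds.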

msinnha1 · 2024-07-12T04:33:50Z

+import torch.distributed
+from torch import nn
+
+#from optimum.habana.distributed import tp_wrapping


minor: please remove the commented code


Done

msinnha1 · 2024-07-12T04:39:43Z

+        pass
+
+
+class NotDistributed(DistributedStrategy):


Why is the derived class named NotDistributed when the base class is DistributedStrategy? This hurts readability; could it get a different name?
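To make the naming question concrete, here is a rough sketch of the pattern under discussion: a strategy base class with a do-nothing subclass. Only the distribute_layer signature is taken from the diff in this PR; the abstract-base layout and the alternative name NoOpStrategy are assumptions for illustration.

```python
from abc import ABC, abstractmethod

from torch import nn


class DistributedStrategy(ABC):
    """Sketch of a base class that decides how each block is placed or sharded."""

    @abstractmethod
    def distribute_layer(self, block: nn.Module, layer: int) -> nn.Module:
        """Return the (possibly wrapped or moved) block for layer index `layer`."""


class NoOpStrategy(DistributedStrategy):
    """Illustrative alternative name for NotDistributed: leaves blocks untouched."""

    def distribute_layer(self, block: nn.Module, layer: int) -> nn.Module:
        return block


strategy = NoOpStrategy()
block = nn.Linear(4, 4)
same_block = strategy.distribute_layer(block, layer=0)
```

A name such as NoOpStrategy reads as "a strategy that does nothing", which avoids the apparent contradiction of a "not distributed" class deriving from a "distributed" one.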

msinnha1 · 2024-07-12T04:45:27Z

+    def distribute_layer(self, block: nn.Module, layer: int) -> nn.Module:
+        device = self.layer_to_device[layer]
+        if self.from_meta:
+            # https://github.com/pytorch/pytorch/pull/113647


This PR is closed, so we can probably remove references like this that were carried over from the foundation repo.

msinnha1 · 2024-07-12T04:46:23Z

+        )
+        if par_mod.bias is not None:
+            par_mod.bias.copy_(torch.split(mod.bias, output_size_per_partition)[rank])
+    # print(f"For rank {rank}, we have the following weights: Base weight {mod.weight} bias {mod.bias}; Par weight {par_mod.weight}, bias {par_mod.bias}")
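The hunk above copies one slice of the base layer's weight and bias into each rank's module. As a sanity check on that idea, the sketch below simulates the ranks in a single process and confirms that concatenating the per-rank outputs reproduces the unsharded layer. The layer sizes, world_size, and the single-process loop are assumptions; the real code runs one rank per device.

```python
import torch
from torch import nn

torch.manual_seed(0)
world_size = 2  # assumed number of ranks, simulated in one process
mod = nn.Linear(8, 6)  # the unsharded base layer
output_size_per_partition = mod.out_features // world_size

x = torch.randn(3, 8)
partial_outputs = []
with torch.no_grad():
    for rank in range(world_size):
        par_mod = nn.Linear(mod.in_features, output_size_per_partition)
        # torch.split defaults to dim=0, i.e. the output features of nn.Linear.
        par_mod.weight.copy_(torch.split(mod.weight, output_size_per_partition)[rank])
        if par_mod.bias is not None:
            par_mod.bias.copy_(torch.split(mod.bias, output_size_per_partition)[rank])
        partial_outputs.append(par_mod(x))
    full_output = mod(x)

# Concatenating along the feature dim stands in for an all-gather.
gathered = torch.cat(partial_outputs, dim=-1)
matches = torch.allclose(gathered, full_output, atol=1e-5)
```

Each rank holds only out_features / world_size rows of the weight, so the per-rank outputs are disjoint column slices of the full output.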


msinnha1 · 2024-07-12T04:46:39Z

+                par_mod.bias.copy_(mod.bias)
+            else:
+                par_mod.bias.zero_()
+    # print(f"For rank {rank}, we have the following weights: Base weight {mod.weight}, bias {mod.bias}; Par weight {par_mod.weight}, bias {par_mod.bias}")


msinnha1 · 2024-07-12T04:47:19Z

+        par_mod.weight.copy_(
+            torch.split(mod.weight, output_size_per_partition, dim=1)[rank]
+        )
+    # print(f"For rank {rank}, we have the following weights: Base weight {mod.weight} bias {mod.bias}; Par weight {par_mod.weight}, bias {par_mod.bias}")
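Splitting a linear layer's weight along dim=1 shards its input features, so each rank produces a partial sum that has to be added across ranks. The sketch below assumes this hunk belongs to such a row-parallel linear layer (the enclosing function is not shown) and pairs it with the earlier hunk that either copies or zeros the bias. The branch condition is not visible in the diff, so rank == 0 is an assumption, as are the sizes and the name input_size_per_partition.

```python
import torch
from torch import nn

torch.manual_seed(0)
world_size = 2  # assumed number of ranks, simulated in one process
mod = nn.Linear(8, 6)  # the unsharded base layer
input_size_per_partition = mod.in_features // world_size

x = torch.randn(3, 8)
x_shards = torch.split(x, input_size_per_partition, dim=-1)
partial_outputs = []
with torch.no_grad():
    for rank in range(world_size):
        par_mod = nn.Linear(input_size_per_partition, mod.out_features)
        par_mod.weight.copy_(
            torch.split(mod.weight, input_size_per_partition, dim=1)[rank]
        )
        if rank == 0:
            # Assumed condition: exactly one rank keeps the bias...
            par_mod.bias.copy_(mod.bias)
        else:
            # ...so the summed result does not add it world_size times.
            par_mod.bias.zero_()
        partial_outputs.append(par_mod(x_shards[rank]))
    full_output = mod(x)

# Summing the partial results stands in for an all-reduce.
reduced = torch.stack(partial_outputs).sum(dim=0)
matches = torch.allclose(reduced, full_output, atol=1e-5)
```

Zeroing the bias on all but one rank is what keeps the reduced output equal to the unsharded one.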


msinnha1 · 2024-07-12T04:47:45Z

+    # The transposes here are to avoid excessive recompilation due to split()
+    # specializing the dimension where the all_gather is happening
+    last_dim = input_.dim() - 1
+    # Starting PT 2.3, we can go back to funcol.all_gather_tensor
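The comment in this hunk describes gathering along the last dimension by transposing it to the front, gathering along dim 0, and transposing back. The sketch below checks that this is equivalent to a direct concatenation on the last dim. It replaces the collective with torch.cat in a single process, and gather_last_dim is a hypothetical helper name, so it illustrates the data movement only, not the recompilation behavior the comment mentions.

```python
import torch


def gather_last_dim(shards):
    """Simulate an all-gather along the last dim using only a dim-0 concat."""
    last_dim = shards[0].dim() - 1
    # Move the last dim to the front so the gather can run along dim 0.
    transposed = [s.transpose(0, last_dim).contiguous() for s in shards]
    gathered = torch.cat(transposed, dim=0)  # stands in for the collective
    # Swap the gathered dim back into the last position.
    return gathered.transpose(0, last_dim).contiguous()


full = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
shards = list(torch.split(full, 2, dim=-1))  # two simulated ranks, 2 columns each
rebuilt = gather_last_dim(shards)
```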


…I#280) * TP reference - ibm foundation-model-stack * Code cleanup -removed unused code --------- Co-authored-by: Kalyan <kkumar@habana.ai>


astachowiczhabana · 2024-07-29T10:35:54Z

huggingface#1121



* Revert "Tensor parallel distributed strategy without using deepspeed (#280) (#299)" This reverts commit 32c86d3. * Tensor parallel distributed strategy without using deepspeed (huggingface#1121) Co-authored-by: Kalyan <kkumar@habana.ai> --------- Co-authored-by: Kalyan <kkumar@habana.ai>

* Revert "Tensor parallel distributed strategy without using deepspeed (#280)" This reverts commit c6e5f9c. * Tensor parallel distributed strategy without using deepspeed (huggingface#1121) Co-authored-by: Kalyan <kkumar@habana.ai> --------- Co-authored-by: Kalyan <kkumar@habana.ai>

* Revert "Tensor parallel distributed strategy without using deepspeed (#280)" This reverts commit c6e5f9c. * Tensor parallel distributed strategy without using deepspeed (huggingface#1121) Co-authored-by: Kalyan <kkumar@habana.ai> --------- Change-Id: Ic30c85e697dbd6a51767e21e1c06c9a20120d9f6 Co-authored-by: Kalyan <kkumar@habana.ai>

* TP reference - ibm foundation-model-stack * Code cleanup -removed unused code --------- Change-Id: I236c615a2523057ecf556800ba6d82062c9c9a82 Co-authored-by: Kalyan <kkumar@habana.ai>


kalyanjk requested review from libinta and mandy-li as code owners July 2, 2024 13:00

kalyanjk requested a review from a user July 2, 2024 13:00

msinnha1 reviewed Jul 12, 2024


kalyanjk requested review from bhargaveede, ssarkar2 and vivekgoe as code owners July 15, 2024 07:02

TP reference - ibm foundation-model-stack

a9275fd

kalyanjk force-pushed the tp_strategy branch 3 times, most recently from 576860d to 1e82fac Compare July 15, 2024 07:19

Code cleanup -removed unused code

657f3c1

kalyanjk force-pushed the tp_strategy branch from 1e82fac to 657f3c1 Compare July 15, 2024 07:22

msinnha1 approved these changes Jul 15, 2024


ghost approved these changes Jul 15, 2024


ghost merged commit c6e5f9c into HabanaAI:habana-main Jul 15, 2024

ghost pushed a commit that referenced this pull request Jul 15, 2024

Tensor parallel distributed strategy without using deepspeed (#280) (#…

32c86d3

…299) * TP reference - ibm foundation-model-stack * Code cleanup -removed unused code --------- Co-authored-by: Kalyan <kkumar@habana.ai>

kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 31, 2024

Revert "Tensor parallel distributed strategy without using deepspeed (H…

42fdb44

…abanaAI#280) (HabanaAI#299)" This reverts commit 32c86d3.

kalyanjk mentioned this pull request Jul 31, 2024

Tensor parallel distributed strategy without using deepspeed #320

Merged

kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 31, 2024

Revert "Tensor parallel distributed strategy without using deepspeed (H…

bd6520e

…abanaAI#280)" This reverts commit c6e5f9c.

kalyanjk mentioned this pull request Jul 31, 2024

Tensor parallel distributed strategy without using deepspeed #321

Merged

astachowiczhabana pushed a commit that referenced this pull request May 13, 2025

Enable torch.compile mode (#280)

520d84a

astachowiczhabana pushed a commit that referenced this pull request May 22, 2025

Enable torch.compile mode (#280)

3be3a46

This pull request was closed.
