
Tensor parallel distributed strategy without using deepspeed #280

Merged
2 commits merged into HabanaAI:habana-main from kalyanjk:tp_strategy
Jul 15, 2024

Conversation


@kalyanjk kalyanjk commented Jul 2, 2024

Adds tensor parallelism by extending GaudiLlamaAttention -> TPGaudiLlamaAttention and GaudiLlamaMLP -> TPGaudiLlamaMLP.

Use the parameter --distributed_strategy="tp" to invoke this code path.
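
Editor's sketch: tensor-parallel attention is commonly sharded by attention heads, with each rank owning an equal slice of them. The helper names below (heads_per_rank, head_range) are hypothetical and are not taken from this PR; whether TPGaudiLlamaAttention partitions heads exactly this way is an assumption.

```python
def heads_per_rank(num_heads: int, world_size: int) -> int:
    """Number of attention heads each rank owns under an even split."""
    if num_heads % world_size != 0:
        raise ValueError("num_heads must be divisible by world_size")
    return num_heads // world_size


def head_range(rank: int, num_heads: int, world_size: int) -> range:
    """Indices of the heads assigned to `rank` (hypothetical helper)."""
    per_rank = heads_per_rank(num_heads, world_size)
    return range(rank * per_rank, (rank + 1) * per_rank)


# 32 heads over 8 ranks: every rank owns 4 consecutive heads.
assignment = [list(head_range(rank, 32, 8)) for rank in range(8)]
```

An even split keeps every rank's projection matrices the same shape, which is why the sketch rejects head counts that do not divide by the world size.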

@kalyanjk kalyanjk requested review from libinta and mandy-li as code owners July 2, 2024 13:00
@kalyanjk kalyanjk requested a review from a user July 2, 2024 13:00

@msinnha1 msinnha1 left a comment

This is a big patch; I am still reviewing it.

global has_fused_rope
has_fused_rope = False


minor: please remove this

Author

minor: please remove this

Done

[GaudiLlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
)
layers = []
for i in range(config.num_hidden_layers):

minor: use layer_idx in place of 'i'

Author

minor: use layer_idx in place of 'i'

Done

Comment thread optimum/habana/distributed/strategy.py Outdated
import torch.distributed
from torch import nn

#from optimum.habana.distributed import tp_wrapping

minor: please remove the commented code

Author

minor: please remove the commented code

Done

pass


class NotDistributed(DistributedStrategy):
The reason will be displayed to describe this comment to others. Learn more.

Why is the derived class named NotDistributed when its base class is DistributedStrategy? This hurts readability; could it be given a different name?
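
Editor's sketch: one way to read the design being questioned here is the null-object pattern, where the single-device case is just another strategy that leaves each block untouched. The distribute_layer signature comes from the diff in this PR; the class name NoOpStrategy and the pass-through behaviour are assumptions made for illustration, and `Any` stands in for nn.Module so the sketch runs without torch.

```python
from abc import ABC, abstractmethod
from typing import Any


class DistributedStrategy(ABC):
    """Base interface: decide how (or whether) a layer is distributed."""

    @abstractmethod
    def distribute_layer(self, block: Any, layer: int) -> Any:
        ...


class NoOpStrategy(DistributedStrategy):
    """Hypothetical name for the single-device case: returns the block as-is."""

    def distribute_layer(self, block: Any, layer: int) -> Any:
        return block


block = object()  # stand-in for an nn.Module
strategy = NoOpStrategy()
unchanged = strategy.distribute_layer(block, layer=0)
```

With this shape, model-building code can call distribute_layer unconditionally instead of branching on whether a distributed run is active.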

Comment thread optimum/habana/distributed/strategy.py Outdated
def distribute_layer(self, block: nn.Module, layer: int) -> nn.Module:
    device = self.layer_to_device[layer]
    if self.from_meta:
        # https://github.com/pytorch/pytorch/pull/113647

This PR is closed, so we can probably remove references like this that were carried over from the foundation repo. #Comment

Author

Done

)
if par_mod.bias is not None:
    par_mod.bias.copy_(torch.split(mod.bias, output_size_per_partition)[rank])
# print(f"For rank {rank}, we have the following weights: Base weight {mod.weight} bias {mod.bias}; Par weight {par_mod.weight}, bias {par_mod.bias}")

#Comment
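
Editor's sketch: the bias split in the hunk above matches a column-parallel linear layer, where each rank keeps a contiguous slice of the output features (weight rows and the matching bias entries). The plain-Python version below mimics torch.split along dim 0 with list slicing; the helper names split and linear, and the toy values, are assumptions for illustration and are not code from this PR.

```python
def split(seq, size):
    """Chunk a list into pieces of `size`, like torch.split along dim 0."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def linear(x, weight, bias):
    """y = W x + b, with weight stored as [out_features][in_features]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


weight = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 output features, 2 inputs
bias = [10, 20, 30, 40]
x = [1, 1]
world_size = 2
output_size_per_partition = len(weight) // world_size

full = linear(x, weight, bias)

# Each rank computes only its slice of the outputs; concatenating the
# slices in rank order reproduces the unsharded result.
gathered = []
for rank in range(world_size):
    w_shard = split(weight, output_size_per_partition)[rank]
    b_shard = split(bias, output_size_per_partition)[rank]
    gathered += linear(x, w_shard, b_shard)
```

Because every output feature lives on exactly one rank, the bias can be sharded alongside the weight and no reduction is needed, only a gather.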

    par_mod.bias.copy_(mod.bias)
else:
    par_mod.bias.zero_()
# print(f"For rank {rank}, we have the following weights: Base weight {mod.weight}, bias {mod.bias}; Par weight {par_mod.weight}, bias {par_mod.bias}")

#Comment

par_mod.weight.copy_(
    torch.split(mod.weight, output_size_per_partition, dim=1)[rank]
)
# print(f"For rank {rank}, we have the following weights: Base weight {mod.weight} bias {mod.bias}; Par weight {par_mod.weight}, bias {par_mod.bias}")

#Comment
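
Editor's sketch: splitting the weight along dim=1, as in the hunk above, corresponds to a row-parallel linear layer. Each rank multiplies its slice of the input features by its slice of the weight columns, and the partial outputs are summed across ranks. Since the sum would otherwise add the bias once per rank, the full bias is kept on a single rank and zeroed elsewhere. That the diff's copy_/zero_ branch keys on rank 0 is an assumption (the condition is not visible in this hunk); the helper names and toy values are also illustrative.

```python
def split_cols(matrix, size):
    """Chunk every row into pieces of `size`, like torch.split(m, size, dim=1)."""
    n_cols = len(matrix[0])
    return [[row[i:i + size] for row in matrix] for i in range(0, n_cols, size)]


def linear(x, weight, bias):
    """y = W x + b, with weight stored as [out_features][in_features]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


weight = [[1, 2, 3, 4], [5, 6, 7, 8]]  # 2 output features, 4 inputs
bias = [100, 200]
x = [1, 2, 3, 4]
world_size = 2
cols_per_rank = len(x) // world_size

full = linear(x, weight, bias)

reduced = [0] * len(bias)
for rank in range(world_size):
    w_shard = split_cols(weight, cols_per_rank)[rank]
    x_shard = x[rank * cols_per_rank:(rank + 1) * cols_per_rank]
    # Assumption: only rank 0 carries the bias; other ranks use zeros.
    b_shard = bias if rank == 0 else [0] * len(bias)
    partial = linear(x_shard, w_shard, b_shard)
    # Stand-in for an all-reduce (sum) across ranks.
    reduced = [acc + p for acc, p in zip(reduced, partial)]
```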

# The transposes here are to avoid excessive recompilation due to split()
# specializing the dimension where the all_gather is happening
last_dim = input_.dim() - 1
# Starting PT 2.3, we can go back to funcol.all_gather_tensor

#Comment
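
Editor's sketch: the code comments in the hunk above describe gathering along the last dimension by transposing around the collective. One plausible reading is: move the last dimension to the front, gather along dim 0, then transpose back. The plain-Python version below uses 2-D nested lists and simulates the collective with list concatenation; all function names are hypothetical and the actual implementation in the PR may differ.

```python
def transpose(matrix):
    """Swap the two dimensions of a 2-D nested list."""
    return [list(col) for col in zip(*matrix)]


def all_gather_dim0(shards):
    """Stand-in for the collective: concatenate every rank's shard along dim 0."""
    gathered = []
    for shard in shards:
        gathered.extend(shard)
    return gathered


def all_gather_last_dim(shards):
    """Gather along the last dim by transposing, gathering on dim 0, transposing back."""
    return transpose(all_gather_dim0([transpose(s) for s in shards]))


# Two ranks, each holding half of the columns of a 2x4 activation.
shards = [
    [[1, 2], [5, 6]],  # rank 0
    [[3, 4], [7, 8]],  # rank 1
]
result = all_gather_last_dim(shards)
```

The result equals concatenating the shards along their last dimension, while the gather itself always runs on dim 0.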

@kalyanjk kalyanjk force-pushed the tp_strategy branch 3 times, most recently from 576860d to 1e82fac July 15, 2024 07:19
@ghost ghost merged commit c6e5f9c into HabanaAI:habana-main Jul 15, 2024
kalyanjk added a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 15, 2024
…I#280)

* TP reference -  ibm foundation-model-stack

* Code cleanup -removed unused code

---------

Co-authored-by: Kalyan <kkumar@habana.ai>
ghost pushed a commit that referenced this pull request Jul 15, 2024
…299)

* TP reference -  ibm foundation-model-stack

* Code cleanup -removed unused code

---------

Co-authored-by: Kalyan <kkumar@habana.ai>
@astachowiczhabana

huggingface#1121

kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 31, 2024
kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 31, 2024
ghost pushed a commit that referenced this pull request Jul 31, 2024
* Revert "Tensor parallel  distributed strategy without using deepspeed (#280) (#299)"

This reverts commit 32c86d3.

* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)

Co-authored-by: Kalyan <kkumar@habana.ai>

---------

Co-authored-by: Kalyan <kkumar@habana.ai>
ghost pushed a commit that referenced this pull request Jul 31, 2024
* Revert "Tensor parallel  distributed strategy without using deepspeed (#280)"

This reverts commit c6e5f9c.

* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)

Co-authored-by: Kalyan <kkumar@habana.ai>

---------

Co-authored-by: Kalyan <kkumar@habana.ai>
astachowiczhabana pushed a commit that referenced this pull request Aug 6, 2024
* Revert "Tensor parallel  distributed strategy without using deepspeed (#280)"

This reverts commit c6e5f9c.

* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)

Co-authored-by: Kalyan <kkumar@habana.ai>

---------

Change-Id: Ic30c85e697dbd6a51767e21e1c06c9a20120d9f6
Co-authored-by: Kalyan <kkumar@habana.ai>
astachowiczhabana pushed a commit that referenced this pull request Aug 6, 2024
* Revert "Tensor parallel  distributed strategy without using deepspeed (#280)"

This reverts commit c6e5f9c.

* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)

Co-authored-by: Kalyan <kkumar@habana.ai>

---------

Change-Id: Ic30c85e697dbd6a51767e21e1c06c9a20120d9f6
Co-authored-by: Kalyan <kkumar@habana.ai>
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
* TP reference -  ibm foundation-model-stack

* Code cleanup -removed unused code

---------

Change-Id: I236c615a2523057ecf556800ba6d82062c9c9a82
Co-authored-by: Kalyan <kkumar@habana.ai>
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
* Revert "Tensor parallel  distributed strategy without using deepspeed (#280)"

This reverts commit c6e5f9c.

* Tensor parallel distributed strategy without using deepspeed (huggingface#1121)

Co-authored-by: Kalyan <kkumar@habana.ai>

---------

Change-Id: Ic30c85e697dbd6a51767e21e1c06c9a20120d9f6
Co-authored-by: Kalyan <kkumar@habana.ai>
astachowiczhabana pushed a commit that referenced this pull request May 13, 2025
astachowiczhabana pushed a commit that referenced this pull request May 22, 2025
This pull request was closed.
4 participants