Split the graphs to run with flash_attention on 1x by kalyanjk · Pull Request #75 · HabanaAI/optimum-habana-fork

kalyanjk · 2024-02-26T17:07:19Z

With flash attention enabled for larger batch sizes, the recipe's ARC HBM memory size exceeds the QueueComputeScal ARC HBM memory. Hence, this PR splits the graph on 1x.
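
To make the trade-off concrete, here is a toy model of lazy-mode graph accumulation. It is not the Habana runtime. `ToyLazyRecorder`, `run_decoder`, and the op counts are hypothetical, and `mark_step()` here only stands in for `htcore.mark_step()`. The sketch assumes that each decoder layer queues a fixed number of ops and that every flush compiles the pending ops into one graph (recipe). Cutting the graph at the start of each decoder layer turns one large graph into many small ones. That is why splitting can reduce recipe size and compile time, at the cost of more graphs per step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyLazyRecorder:
    """Toy model of lazy-mode execution (an assumption, not the Habana runtime).

    Ops are queued into a pending graph; mark_step() closes the pending graph
    and records its size, standing in for one compiled recipe per flush.
    """
    pending_ops: int = 0
    graph_sizes: List[int] = field(default_factory=list)

    def record(self, num_ops: int) -> None:
        # Queue ops without executing them, as lazy mode does.
        self.pending_ops += num_ops

    def mark_step(self) -> None:
        # Flush only if something is queued, so empty graphs are not counted.
        if self.pending_ops:
            self.graph_sizes.append(self.pending_ops)
            self.pending_ops = 0


def run_decoder(num_layers: int, ops_per_layer: int, split_per_layer: bool) -> List[int]:
    """Return the size of every graph flushed while running the decoder stack."""
    rec = ToyLazyRecorder()
    for _layer_idx in range(num_layers):
        if split_per_layer:
            # Mirrors the PR: cut the graph at the start of every decoder layer.
            rec.mark_step()
        rec.record(ops_per_layer)
    rec.mark_step()  # final flush at the end of the step
    return rec.graph_sizes


unsplit = run_decoder(num_layers=32, ops_per_layer=100, split_per_layer=False)
split = run_decoder(num_layers=32, ops_per_layer=100, split_per_layer=True)
print(len(unsplit), max(unsplit))  # 1 3200
print(len(split), max(split))      # 32 100
```

With 32 layers, the unsplit run produces a single 3,200-op graph. The per-layer split produces 32 graphs of 100 ops each. Whether the extra graphs cost more than the smaller recipes save is the performance question discussed in the thread.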

libinta · 2024-02-26T18:16:33Z


        for layer_idx, decoder_layer in enumerate(self.layers):
+            if  torch.distributed.is_initialized() == False:
+                htcore.mark_step()


@kalyanjk, what's the impact for input/output sizes that did not run into OOM? Should we add a command-line argument to text-generation for this?

@kalyanjk, why call mark_step() only for 1x?

For 8x, the collective call already introduces a mark_step.

@kalyanjk, what's the impact for input/output sizes that did not run into OOM? Should we add a command-line argument to text-generation for this?

The issue is not OOM. The real issue is that the recipe size is too large and the compilation time is too high.

Please update it as below:

    if lazy_mode and (not torch.distributed.is_initialized() or torch.distributed.get_world_size() == 1):
        htcore.mark_step()

puneeshkhanna · 2024-02-28T07:57:09Z

@kalyanjk, we can abandon this PR. I have handled the change in #65.
This also helps 8x inference.
I'm checking the 1x perf results too.
I still need to check the fine-tuning script once.

puneeshkhanna · 2024-02-28T08:16:59Z

Wait, we should not put mark_step at the start of the loop. It will create more graphs, and perf is lower.

kalyanjk · 2024-02-29T05:43:43Z

Wait, we should not put mark_step at the start of the loop. It will create more graphs, and perf is lower.
@puneeshkhanna
On G3 we were seeing good perf with mark_step inside the for loop. With mark_step outside the for loop, we are not able to run on a single card. This issue is also present on G2.

msinnha1

Verified the change; it is required for faster recipe compilation.

msinnha1 · 2024-03-01T05:53:13Z

    _gaudi_prepare_4d_causal_attention_mask,
 )

+import habana_frameworks.torch.core as htcore


If you rebase onto the latest, this htcore import is not required, as it is already part of PR #65.

msinnha1

lgtm

* Split the graphs to run with flash_attention on 1x
* Added lazy_mode check and removed additional htcore import

---------

Co-authored-by: Kalyan <kkumar@habana.ai>

kalyanjk · 2024-06-11T05:47:11Z

This PR solves the actual issue #126

astachowiczhabana · 2024-06-12T09:15:15Z

huggingface#875

Split the graphs to run with flash_attention on 1x

ab65e67

kalyanjk requested review from libinta and mandy-li as code owners February 26, 2024 17:07

kalyanjk requested a review from a user February 26, 2024 17:07

libinta reviewed Feb 26, 2024

Merge branch 'HabanaAI:habana-main' into decoder_mark_step

791a644

msinnha1 reviewed Mar 1, 2024

Added lazy_mode check and removed additional htcore import

d4d1b9c

msinnha1 approved these changes Mar 1, 2024

ghost approved these changes Mar 4, 2024

ghost merged commit eec5b3f into HabanaAI:habana-main Mar 4, 2024

kalyanjk deleted the decoder_mark_step branch July 5, 2024 11:47

This pull request was closed.

kalyanjk commented Feb 26, 2024

libinta Feb 26, 2024

mandy-li Feb 26, 2024

kalyanjk Feb 27, 2024

kalyanjk Feb 27, 2024

puneeshkhanna Mar 5, 2024

puneeshkhanna commented Feb 28, 2024

puneeshkhanna commented Feb 28, 2024

kalyanjk commented Feb 29, 2024

msinnha1 left a comment

msinnha1 Mar 1, 2024

msinnha1 left a comment

kalyanjk commented Jun 11, 2024

astachowiczhabana commented Jun 12, 2024 (edited)

7 participants
