Disable SDPA attention layer for mistral and gpt_bigode by jiminha · Pull Request #88 · HabanaAI/optimum-habana-fork

jiminha · 2024-03-05T21:13:22Z

Currently, both the Mistral and GPT-BigCode models fail to run because the SDPA attention layer is enabled for both models by default with transformers 4.37.2 + torch 2.2. Until we have proper FusedSDPA support for these models, we are disabling SDPA attention so that they fall back to the original attention layer.
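For reference, the "original" (eager) attention layer computes softmax(QKᵀ/√d)·V one step at a time. SDPA computes the same quantity through a single fused call, so the fallback changes how the result is computed (and how fast), not the math. A minimal pure-Python sketch of the eager computation, assuming a single head with no attention mask and no dropout (it is an illustration, not the models' actual attention code):

```python
import math
from typing import List

Matrix = List[List[float]]


def eager_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention, computed step by step.

    q: seq_q x d, k: seq_k x d, v: seq_k x d_v (lists of rows).
    No attention mask and no dropout, to keep the sketch short.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(v[0])
    out: Matrix = []
    for q_row in q:
        # 1) Scores: scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # 2) Softmax over the keys (subtract the max for numerical stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # 3) Output row: attention-weighted sum of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(d_v)])
    return out


# Identical keys give uniform weights, so the output is the mean of the values.
print(eager_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [4.0, 2.0]]))
```

A fused SDPA kernel avoids materializing the intermediate score and weight matrices shown here, which is where its speed advantage comes from.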

jiminha · 2024-03-06T00:40:54Z

Reviewed with @libinta: she doesn't want this added per model, but wants to override the configuration in a generic way so that we don't need to modify every individual model file.
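A self-contained sketch of that generic approach, assuming a shared base class whose class method decides whether SDPA is enabled. A later commit in this PR is titled "Override _check_and_enable_sdpa"; the real transformers method of that name is private and its signature may differ between versions, so `Config`, `BaseModel`, and `_gaudi_check_and_enable_sdpa` below are hypothetical stand-ins, not the PR's code. Patching the hook once on the base class changes the behaviour of every model class without editing individual model files:

```python
class Config:
    """Hypothetical stand-in for a model config (only the fields used here)."""

    def __init__(self, model_type: str):
        self.model_type = model_type
        self.attn_implementation = None


class BaseModel:
    """Hypothetical stand-in for the base class every model inherits from."""

    @classmethod
    def _check_and_enable_sdpa(cls, config: Config) -> Config:
        # Default behaviour in this sketch: turn SDPA on for every model.
        config.attn_implementation = "sdpa"
        return config


# Model types named in this PR as lacking SDPA support on Gaudi.
# Illustrative only; not necessarily the list that was merged.
MODELS_ATTN_IMPLEMENTATION_EAGER = ["gpt_bigcode", "mistral", "mixtral"]

# Keep the original underlying function before patching.
_original_check = BaseModel._check_and_enable_sdpa.__func__


def _gaudi_check_and_enable_sdpa(cls, config: Config) -> Config:
    # Listed models fall back to the original (eager) attention layer.
    if config.model_type in MODELS_ATTN_IMPLEMENTATION_EAGER:
        config.attn_implementation = "eager"
        return config
    # Every other model keeps the default SDPA decision.
    return _original_check(cls, config)


# Patch once on the base class; all subclasses inherit the new behaviour.
BaseModel._check_and_enable_sdpa = classmethod(_gaudi_check_and_enable_sdpa)


class MistralModel(BaseModel):
    pass


class LlamaModel(BaseModel):
    pass


print(MistralModel._check_and_enable_sdpa(Config("mistral")).attn_implementation)
print(LlamaModel._check_and_enable_sdpa(Config("llama")).attn_implementation)
```

The design choice is that the list of eager-only model types lives in one place, so adding a model (as discussed below for "bart") is a one-line change.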

libinta · 2024-03-06T05:24:01Z

+    # These models don't support SDPA on Gaudi yet; fall back to the original code.
+    MODELS_ATTN_IMPLEMENTATION_EAGER = [
+        "gpt_bigcode",
+        "mistral",


@vivekgoe, should we add the T5 model here too?

You mean "bart"? Yes, please go ahead and add it.

vivekgoe

Please also add "bart" to the list of models not using SDPA.

jiminha · 2024-03-06T07:38:23Z

Please also add "bart" to the list of models not using SDPA.

@vivekgoe, the change was merged before I could add this. I just tested adding "bart" and removing your original change, and it also works for summarization. Do you want to make the change?

astachowiczhabana · 2024-06-07T14:22:57Z

huggingface#771

Disable SDPA attention layer for GPT_BIGCODE and Mistral

dd138ac

jiminha requested review from libinta and vivekgoe March 5, 2024 21:13

jiminha changed the title from "Disable SDPA attention layer for GPT_BIGCODE and Mistral" to "Disable SDPA attention layer for mistral and gpt_bigode" on Mar 5, 2024

Override _check_and_enable_sdpa

8d07fb6

jiminha added 2 commits March 5, 2024 18:42

Remove attn_implementation argument option

344ba5f

Remove description

40826ae

libinta reviewed Mar 6, 2024

vivekgoe approved these changes Mar 6, 2024

jiminha added 2 commits March 5, 2024 23:01

Merge remote-tracking branch 'origin/habana-main' into jha/spdadisabl…

e8d23d6

…emistral pick dd138ac Disable SDPA attention layer for GPT_BIGCODE and Mistral

Add mixtral to eager attention list

302a81d

libinta approved these changes Mar 6, 2024

libinta merged commit 9f1da84 into habana-main Mar 6, 2024

libinta merged 6 commits into habana-main from jha/spdadisablemistral

4 participants