
Disable SDPA attention layer for mistral and gpt_bigcode #88

Merged
libinta merged 6 commits into habana-main from jha/spdadisablemistral on Mar 6, 2024

@jiminha commented Mar 5, 2024

Currently, both the mistral and gpt_bigcode models fail to run because the SDPA attention layer is enabled by default for both models with transformers 4.37.2 + torch 2.2. Until we have proper FusedSDPA support for these models, we are disabling SDPA attention so that they fall back to the original (eager) attention layer.
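For context, transformers (4.36 and later) lets the caller choose the attention implementation at load time through the `attn_implementation` argument of `from_pretrained`; passing `"eager"` selects the original attention layer instead of SDPA. The sketch below is a minimal, hypothetical helper (`attention_kwargs` and `SDPA_UNSUPPORTED` are illustrative names, not part of this PR or of transformers) that builds those keyword arguments for the two affected model types.

```python
# Hypothetical set: model types that, per this PR, fail with SDPA on Gaudi
# (transformers 4.37.2 + torch 2.2).
SDPA_UNSUPPORTED = {"gpt_bigcode", "mistral"}


def attention_kwargs(model_type: str) -> dict:
    """Hypothetical helper: extra keyword arguments for from_pretrained().

    Returns {"attn_implementation": "eager"} for models that must fall back
    to the original attention layer, and {} otherwise so that transformers
    keeps its default choice.
    """
    if model_type in SDPA_UNSUPPORTED:
        return {"attn_implementation": "eager"}
    return {}


if __name__ == "__main__":
    for mt in ("mistral", "gpt_bigcode", "llama"):
        print(mt, attention_kwargs(mt))
```

With transformers installed, the result would be passed along as `AutoModelForCausalLM.from_pretrained(name, **attention_kwargs(model_type))`. This is a caller-side, per-model variant; the review that follows moved toward a single generic configuration override instead.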

@jiminha requested review from libinta and vivekgoe on March 5, 2024 21:13
@jiminha changed the title from "Disable SDPA attention layer for GPT_BIGCODE and Mistral" to "Disable SDPA attention layer for mistral and gpt_bigcode" on Mar 5, 2024
@jiminha (Author) commented Mar 6, 2024

Reviewed with @libinta: she doesn't want this added per model, but wants the configuration overridden in a generic way so that we don't need to modify every individual model file.
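A generic override along these lines could look like the sketch below. It assumes the override runs on the model config before the model is built, and that the config exposes `model_type` and the private `_attn_implementation` attribute that transformers 4.36+ uses to pick the attention class. The function name `override_attn_implementation` is hypothetical, and a `SimpleNamespace` stands in for a real `PretrainedConfig` so the example runs without transformers.

```python
from types import SimpleNamespace

# These models don't support SDPA on Gaudi yet; fall back to the original code.
MODELS_ATTN_IMPLEMENTATION_EAGER = ["gpt_bigcode", "mistral"]


def override_attn_implementation(config):
    """Hypothetical generic hook: force eager attention for listed model types.

    Mutates and returns `config`. Models not in the list keep whatever
    attention implementation they already had.
    """
    if getattr(config, "model_type", None) in MODELS_ATTN_IMPLEMENTATION_EAGER:
        config._attn_implementation = "eager"
    return config


if __name__ == "__main__":
    # SimpleNamespace stands in for a transformers PretrainedConfig.
    mistral_cfg = SimpleNamespace(model_type="mistral", _attn_implementation="sdpa")
    llama_cfg = SimpleNamespace(model_type="llama", _attn_implementation="sdpa")
    print(override_attn_implementation(mistral_cfg)._attn_implementation)  # eager
    print(override_attn_implementation(llama_cfg)._attn_implementation)  # sdpa
```

The point of this shape is that covering another model only means extending the list; no individual model file has to change.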

```python
# These models don't support SDPA on Gaudi yet; fall back to the original code.
MODELS_ATTN_IMPLEMENTATION_EAGER = [
    "gpt_bigcode",
    "mistral",
]
```
Collaborator commented:

@vivekgoe, should we add the t5 model here too?

Reply:

You mean "bart"? Yes, please go ahead and add it.

@vivekgoe left a comment:

Please also add "bart" to the list of models not using SDPA.

@libinta merged commit 9f1da84 into habana-main on Mar 6, 2024
@jiminha (Author) commented Mar 6, 2024

> Please add "bart" also to list of models not using sdpa.

@vivekgoe, the change was merged before I could add this. I just tested adding "bart" and removing your original change, and summarization also works. Do you want to make the change?

@astachowiczhabana: huggingface#771

4 participants