
[SW-185803] Enable FusedSDPA fp8 in Llama FT #291

Merged
vivekgoe merged 1 commit into habana-main from dev/pbielak/enable-fusedSDPA-fp8 on Jul 12, 2024


Conversation


pbielak commented Jul 10, 2024

This PR enables the use of Fused Scaled Dot Product Attention (FusedSDPA) in the FP8 version of the Llama model. It was tested on Llama fine-tuning with LoRA. Set the --flash_attention_fp8 flag to use FusedSDPA.

- Update attention module and usages
- Add --flash_attention_fp8 flag
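The opt-in flag described above could be wired up roughly as follows. This is a minimal sketch using Python's standard `argparse`; only the flag name `--flash_attention_fp8` comes from this PR, while `build_parser`, `select_attention_impl`, and the returned backend labels are hypothetical placeholders, not the actual optimum-habana code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag name is taken from the PR.
    parser = argparse.ArgumentParser(description="Llama LoRA fine-tuning (sketch)")
    parser.add_argument(
        "--flash_attention_fp8",
        action="store_true",  # off by default, so FusedSDPA stays opt-in
        help="Use FusedSDPA in the FP8 Llama attention module.",
    )
    return parser


def select_attention_impl(flash_attention_fp8: bool) -> str:
    # Hypothetical labels standing in for the real attention modules.
    return "fused_sdpa_fp8" if flash_attention_fp8 else "default_attention"
```

For example, `build_parser().parse_args(["--flash_attention_fp8"])` yields a namespace whose `flash_attention_fp8` attribute is `True`, which the sketch maps to the fused path; without the flag the attribute is `False` and the default attention path is kept.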
pbielak requested a review from a user on July 10, 2024 11:26
vivekgoe requested a review from scsudhak-intel on July 10, 2024 11:47

vivekgoe left a comment:

LGTM

vivekgoe merged commit 35f6fbe into habana-main on Jul 12, 2024
MrGeva pushed a commit that referenced this pull request Jul 14, 2024
kalyanjk pushed a commit to kalyanjk/optimum-habana-fork that referenced this pull request Jul 15, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
oabramovich pushed a commit that referenced this pull request Jul 15, 2024
…" (#295)

This reverts commit 35f6fbe.

Co-authored-by: Eran Geva <egeva@habana.ai>
pbielak added a commit that referenced this pull request Jul 17, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
astachowiczhabana commented:

Unmatched PR

pbielak added a commit that referenced this pull request Jul 30, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Add flag to README.md
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
pbielak added a commit that referenced this pull request Jul 30, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
pbielak added a commit that referenced this pull request Jul 30, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
pbielak added a commit that referenced this pull request Jul 31, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls
pbielak added a commit that referenced this pull request Jul 31, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls
vivekgoe pushed a commit that referenced this pull request Aug 1, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls

vidyasiv commented Aug 2, 2024

@pbielak, please propagate #291 and #310 to optimum-habana for the v1.17 release. The documentation PR already appears to be in OH.

astachowiczhabana pushed a commit that referenced this pull request Aug 6, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls

Change-Id: Ica468bb23931a78e2a23f6cb9bc60f87dd442007
astachowiczhabana pushed a commit that referenced this pull request Aug 6, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls

Change-Id: Ica468bb23931a78e2a23f6cb9bc60f87dd442007
astachowiczhabana pushed a commit that referenced this pull request Aug 7, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls

Change-Id: Ica468bb23931a78e2a23f6cb9bc60f87dd442007
astachowiczhabana pushed a commit that referenced this pull request Oct 9, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls

Change-Id: Ica468bb23931a78e2a23f6cb9bc60f87dd442007
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls

Change-Id: Ica468bb23931a78e2a23f6cb9bc60f87dd442007

4 participants