[SW-185803] Enable FusedSDPA fp8 in Llama FT #310

Merged
vivekgoe merged 1 commit into habana-main from dev/pbielak/enable-fusedSDPA-fp8 on Aug 1, 2024

@pbielak pbielak commented Jul 17, 2024

This is the same change as #291, with one fix: the error in the text-generation/run_lm_eval.py script is resolved by moving one import from the module's top level to function level (see 3af45d2).
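For readers unfamiliar with the pattern, here is a minimal sketch of a function-level import. The PR description does not say which import was moved or why the top-level version failed, so the module name `hypothetical_eval_backend` and the `--evaluate` switch below are placeholders invented for illustration, not code from 3af45d2.

```python
def run_evaluation(prompts):
    # Function-level import: Python resolves it only when this function runs,
    # so merely importing this file never touches the dependency.
    import hypothetical_eval_backend  # placeholder name, not from the PR

    return hypothetical_eval_backend.evaluate(prompts)


def main(argv):
    # "--evaluate" is a made-up switch for this sketch.
    if "--evaluate" in argv:
        return run_evaluation(argv)
    # This path works even when the optional dependency is not installed.
    return "generation only"
```

With the import at the top of the file, every code path would fail at load time if the dependency could not be imported. With the import inside `run_evaluation`, only the path that actually needs it can fail.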

@pbielak pbielak requested a review from a user July 17, 2024 14:20
@wszczurekhabana wszczurekhabana self-requested a review July 18, 2024 08:24
@pbielak pbielak requested review from MrGeva, guyeilat, oabramovich and scsudhak-intel and removed request for a user, libinta, mandy-li and ssarkar2 July 18, 2024 12:09

@wszczurekhabana wszczurekhabana left a comment

LGTM!

@vivekgoe vivekgoe left a comment

@pbielak @wszczurekhabana added a few comments, please check.

Comment thread optimum/habana/transformers/models/llama/modeling_llama.py
Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated
Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated
Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated
Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated
Comment thread examples/language-modeling/README.md Outdated
Comment thread optimum/habana/accelerate/utils/transformer_engine.py Outdated

vivekgoe commented Jul 30, 2024

@pbielak Thanks for the updates. Please check my latest responses, and please also resolve the conflicts in modeling_llama.py.

@pbielak pbielak force-pushed the dev/pbielak/enable-fusedSDPA-fp8 branch 4 times, most recently from 41f8967 to c77f6fb, on July 31, 2024 12:42
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls
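As a rough illustration of the `--flash_attention_fp8` flag and the accompanying checks named in the commit message above, the sketch below shows how an opt-in flag of this kind is commonly wired up with `argparse`. Only the flag name comes from the PR. Treating it as a boolean `store_true` switch, the helper names `build_parser` and `check_args`, and the `device_supports_fp8` parameter are all assumptions made for this example; the actual hardware condition used in the PR is not stated in this conversation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumed to be a plain on/off switch; the PR only gives the flag name.
    parser.add_argument(
        "--flash_attention_fp8",
        action="store_true",
        help="Opt in to the FP8 fused attention path (illustrative help text).",
    )
    return parser


def check_args(args: argparse.Namespace, device_supports_fp8: bool) -> None:
    # Hypothetical validation: fail early instead of deep inside the model.
    # `device_supports_fp8` stands in for whatever hardware check applies.
    if args.flash_attention_fp8 and not device_supports_fp8:
        raise ValueError("--flash_attention_fp8 requires a device with FP8 support")
```

Running the validation once, right after argument parsing, keeps the error close to the user's command line rather than surfacing later as a kernel failure.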
@pbielak pbielak force-pushed the dev/pbielak/enable-fusedSDPA-fp8 branch from c77f6fb to 19384dd on July 31, 2024 12:43
@vivekgoe vivekgoe merged commit 234cc25 into habana-main Aug 1, 2024
@pbielak pbielak deleted the dev/pbielak/enable-fusedSDPA-fp8 branch August 1, 2024 10:04
astachowiczhabana pushed a commit that referenced this pull request Aug 6, 2024
- Update attention module and usages
- Add --flash_attention_fp8 flag
- Fix failure in distributed text-generation
- Add assert for Gaudi 3
- Remove unnecessary repeat_kv and reshape
- Rename FusedAttention to FusedAttentionTE
- Move flash_attention_fp8 checks
- Fix fused_scaled_dot_product_attention calls

Change-Id: Ica468bb23931a78e2a23f6cb9bc60f87dd442007
astachowiczhabana pushed a commit that referenced this pull request Aug 6, 2024
astachowiczhabana pushed a commit that referenced this pull request Aug 7, 2024
astachowiczhabana pushed a commit that referenced this pull request Oct 9, 2024
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025

3 participants