
Encapsulate FSDPA in GaudiLlamaAttention#129

Merged: MrGeva merged 1 commit into habana-main from dev/dlester/moduleFSDPA on Mar 24, 2024

@dudilester

  • Encapsulated FSDPA in GaudiLlamaAttention to allow quantization using HQT

  • Added use_flash_attention and flash_attention_recompute to run_lm_eval
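The two changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `ModuleFusedSDPA`, `ToyAttention`, and `build_parser` are hypothetical names, and PyTorch's `torch.nn.functional.scaled_dot_product_attention` stands in for the Gaudi fused kernel so the example runs on CPU. One plausible reading of the motivation is that tooling which works at the `nn.Module` level can only discover and patch an attention kernel once it is registered as a submodule.

```python
import argparse

import torch
import torch.nn.functional as F


class ModuleFusedSDPA(torch.nn.Module):
    """Hypothetical wrapper: exposes an attention kernel as an nn.Module."""

    def __init__(self, kernel):
        super().__init__()
        # Stand-in for the fused SDPA kernel (assumption, not the Gaudi op).
        self._kernel = kernel

    def forward(self, query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False):
        return self._kernel(
            query, key, value, attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal
        )


class ToyAttention(torch.nn.Module):
    """Hypothetical attention block that owns the wrapper as a submodule."""

    def __init__(self, use_flash_attention=False, flash_attention_recompute=False):
        super().__init__()
        self.use_flash_attention = use_flash_attention
        # Stored only for illustration: the stand-in kernel has no recompute mode.
        self.flash_attention_recompute = flash_attention_recompute
        self.fused_sdpa = ModuleFusedSDPA(F.scaled_dot_product_attention)

    def forward(self, q, k, v):
        if self.use_flash_attention:
            return self.fused_sdpa(q, k, v, is_causal=True)
        # Eager reference path: causal softmax(QK^T / sqrt(d)) V.
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        causal = torch.ones(q.shape[-2], k.shape[-2], dtype=torch.bool).tril()
        scores = scores.masked_fill(~causal, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v


def build_parser():
    """Hypothetical CLI parser exposing the two flags named in the description."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_flash_attention", action="store_true")
    parser.add_argument("--flash_attention_recompute", action="store_true")
    return parser
```

With this layout, `ToyAttention().named_modules()` lists `fused_sdpa` as a submodule, which is what module-level tooling would need in order to find it; calling the kernel as a bare function inside `forward` would leave nothing to discover.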

@dudilester dudilester requested review from a user, HolyFalafel, MrGeva, Yantom1, bgoldberg-habana and ulivne and removed request for a user, libinta and mandy-li March 21, 2024 09:43
ulivne previously requested changes Mar 21, 2024
Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated
Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated
@dudilester dudilester force-pushed the dev/dlester/moduleFSDPA branch from 0a09511 to d584181 Compare March 21, 2024 13:12
@MrGeva MrGeva dismissed ulivne’s stale review March 24, 2024 16:25

issues were addressed.

@MrGeva MrGeva merged commit b7e74c1 into habana-main Mar 24, 2024
dudilester added a commit that referenced this pull request Mar 31, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 5, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 5, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 19, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 22, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
dudilester added a commit that referenced this pull request May 7, 2024
dudilester added a commit that referenced this pull request May 8, 2024
dudilester added a commit that referenced this pull request May 13, 2024
@astachowiczhabana

huggingface#972

@dudilester
Author

Upstream URL: huggingface#976
