enforce recompute flag on fsdpa quantization#133

Merged
MrGeva merged 1 commit into habana-main from dev/dlester/quant_recompute on Mar 28, 2024
@dudilester

Currently, fp8 fsdpa quantization is supported only when flash_attention_recompute is True.
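A minimal sketch of what enforcing this constraint could look like. It is not the code from this PR: the `AttentionOptions` class, the `fp8_fsdpa_quantization` field, and the `enforce_recompute` helper are hypothetical names used for illustration, and the choice to force the flag on (rather than raise an error) is an assumption. Only `flash_attention_recompute` comes from the description above.

```python
import warnings
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AttentionOptions:
    # Hypothetical container; only flash_attention_recompute is named in the PR.
    fp8_fsdpa_quantization: bool = False
    flash_attention_recompute: bool = False


def enforce_recompute(options: AttentionOptions) -> AttentionOptions:
    """Return options in which recompute is on whenever fp8 fsdpa quantization is on."""
    if options.fp8_fsdpa_quantization and not options.flash_attention_recompute:
        # Assumption: override the flag and warn, instead of failing hard.
        warnings.warn(
            "fp8 fsdpa quantization requires flash_attention_recompute=True; "
            "enabling it."
        )
        return replace(options, flash_attention_recompute=True)
    # Nothing to enforce: return the options unchanged.
    return options
```

With this sketch, `enforce_recompute(AttentionOptions(fp8_fsdpa_quantization=True))` returns options with `flash_attention_recompute` set to `True`, while options that do not request quantization pass through untouched.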

@dudilester dudilester requested a review from a user March 28, 2024 13:23
@astachowiczhabana

huggingface#972

@dudilester
Author

upstream URL
huggingface#976

astachowiczhabana pushed a commit that referenced this pull request Feb 5, 2025
…ers (#133)

* Update transformer_engine._convert_model to skip LoRA layers

* Remove print statement

* Add check for peft module availability
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
…ers (#133)

* Update transformer_engine._convert_model to skip LoRA layers

* Remove print statement

* Add check for peft module availability
astachowiczhabana pushed a commit that referenced this pull request Mar 5, 2025
…ers (#133) (#163)

* Update transformer_engine._convert_model to skip LoRA layers

* Remove print statement

* Add check for peft module availability