Quantization for FSDPA #976

Merged
regisss merged 7 commits into huggingface:main from HabanaAI:dev/dlester/fsdpa_doc on Jun 6, 2024

dudilester (Contributor) commented May 13, 2024

Added use_flash_attention, flash_attention_causal_mask and flash_attention_recompute to run_lm_eval
Enforce recompute flag on fsdpa quantization
Allow quantization using HQT
Document FusedScaledDotProductAttention quantization
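
The first two items above can be illustrated with a minimal sketch. It is not the code from this PR: the three flash-attention flag names come from the description, while the `--quantize` switch, the helper names, and the exact enforcement rule are hypothetical stand-ins, since the PR text does not say how run_lm_eval detects quantization.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the run_lm_eval options named in the PR description."""
    parser = argparse.ArgumentParser(description="Illustrative flash-attention options")
    parser.add_argument("--use_flash_attention", action="store_true",
                        help="Use the fused SDPA (flash attention) kernel.")
    parser.add_argument("--flash_attention_causal_mask", action="store_true",
                        help="Let the fused kernel apply the causal mask.")
    parser.add_argument("--flash_attention_recompute", action="store_true",
                        help="Run the fused kernel in recompute mode.")
    # Assumption: a plain boolean stands in for however the real script
    # decides that quantization is active.
    parser.add_argument("--quantize", action="store_true",
                        help="Hypothetical switch marking a quantized run.")
    return parser


def enforce_recompute(args: argparse.Namespace) -> argparse.Namespace:
    """Force recompute on when fused SDPA is used in a quantized run (assumed rule)."""
    if args.quantize and args.use_flash_attention:
        args.flash_attention_recompute = True
    return args


# Example: recompute is switched on even though the flag was not passed.
args = enforce_recompute(
    build_parser().parse_args(["--use_flash_attention", "--quantize"])
)
```

Overriding the flag silently, rather than raising an error, is one possible reading of "enforce"; the merged change may behave differently.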

@dudilester dudilester requested a review from a user May 13, 2024 10:01
@dudilester dudilester requested a review from regisss as a code owner May 13, 2024 10:01
dudilester (Contributor, Author) commented:

Added a commit documenting the FSDPA quantization changes.
This PR includes the commits from #967 plus the documentation commit.
@libinta - the PR should be labeled synapse_1.16_dependency

dudilester changed the title from "Document FusedScaledDotProductAttention quantization" to "Quantization for FSDPA" on May 15, 2024
libinta added the synapse 1.16_dependency label on May 15, 2024
ssarkar2 mentioned this pull request on May 16, 2024
hsubramony added 5 commits that referenced this pull request on May 29, 2024
HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss (Collaborator) left a comment:

Please run make style.

Should the regression tests used for Llama fp8 be updated? Like here and there for instance?

MrGeva commented May 30, 2024

Should the regression tests used for Llama fp8 be updated? Like here and there for instance?

@regisss I see that SDPA is not tested in bf16 either. It can be added. Can you or @libinta take care of it?

hsubramony added a commit that referenced this pull request May 31, 2024
regisss merged commit 3c6e508 into huggingface:main on Jun 6, 2024
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024
Co-authored-by: Yeonsil Yoon <yyoon@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>

Labels

synapse 1.16_dependency

6 participants