Closed
Squashed commit messages:
- Fa V3 api
- Compress fp8 work so far
- pull cast out of torch function
- e2e fp8 stub
- emulate fa v3
- remove example
- clean up forward
- save fp8 backward
- ignore train artifacts
- just use return_attn_probs
- match fa behavior
- add fa_ref
- fix dropout bug
- add link
- optional fp8 p descale
- rename to v3
- clean up
- match backward
- min diff
- update varlen api
- clean up FP8_P_DESCALE
- update bench and test
- lint
- fix mha varlen bug
- remove .gitignore
- remove skip
- bring back skips
Collaborator: Jenkins CI skipped: Check lint failed. Exiting the entire job...
Contributor (Author): I will reopen in a bit.
Collaborator: Jenkins CI skipped: Required check(s) 'ruff_black' are missing. Exiting the entire job...
Collaborator: Jenkins CI skipped: Check lint failed. Exiting the entire job...
Contributor (Author): Moved here, #1065

Motivation
Add support for FP8 in Flash Attention.
Technical Details
Modify the existing code so that it conforms to the Flash Attention v3 API: the user provides FP8 values for q, k, and v, along with their descale factors.
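The intended scale/descale bookkeeping can be sketched as follows. This is a minimal pure-Python emulation, not the kernel code from this PR: the function names, the per-tensor scaling scheme, and the e4m3 maximum of 448 are illustrative assumptions, and the "cast" here only clamps into the FP8 range rather than performing real FP8 rounding.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in fp8 e4m3

def compute_descale(x):
    """Per-tensor descale factor: multiplying the fp8-range values by this
    restores the original dynamic range (assumed scheme, for illustration)."""
    amax = max(abs(v) for row in x for v in row)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0

def quantize(x, descale):
    """Emulated fp8 cast: scale into the representable range and clamp.
    (Real fp8 rounding is format-specific; this is only a coarse stand-in.)"""
    return [[max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / descale)) for v in row]
            for row in x]

def attention_ref(q, k, v):
    """Plain fp32-style softmax(q k^T) @ v reference."""
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        out.append([sum(p * vj[d] for p, vj in zip(probs, v))
                    for d in range(len(v[0]))])
    return out

def attention_fp8_emulated(q8, k8, v8, q_descale, k_descale, v_descale):
    """v3-style call: fp8-range q/k/v come in with their descale factors;
    scores are descaled before softmax, the output is descaled by v's factor."""
    out = []
    for qi in q8:
        scores = [sum(a * b for a, b in zip(qi, kj)) * q_descale * k_descale
                  for kj in k8]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        out.append([sum(p * vj[d] for p, vj in zip(probs, v8)) * v_descale
                    for d in range(len(v8[0]))])
    return out
```

Because the emulated cast here is lossless apart from clamping, the descaled result matches the reference attention up to floating-point error; with a real FP8 cast the comparison would instead be against a quantization-error tolerance.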
Test Plan
Update the MHA tests and benchmark code.
Test Result
Submission Checklist