
[BACKEND] Add support converting MMAV3 accumulator layout to fp8 dot_operand#3370

Merged
ThomasRaoux merged 3 commits intotriton-lang:mainfrom
ThomasRaoux:attention_fp8_4
Mar 14, 2024

Conversation

@ThomasRaoux
Collaborator

@ThomasRaoux ThomasRaoux commented Mar 14, 2024

Implement the layout conversion invented by Ganesh Bikshandi and Jay Shah, described in the following paper:
https://research.colfax-intl.com/wp-content/uploads/2023/12/colfax-flashattention.pdf

This allows us to generate a fully fp8 attention kernel by keeping the chain of dots in fp8 without going through shared memory.
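The data flow this enables can be sketched numerically: the first dot's accumulator is downcast to fp8 in registers and fed straight into the second dot, instead of being round-tripped through shared memory. Below is a minimal NumPy model of that chain of dots, not Triton code; `downcast_sim` is a hypothetical stand-in for a real fp8 (e4m3) conversion and only rounds the mantissa to illustrate the precision loss.

```python
import numpy as np

def downcast_sim(x, mantissa_bits=3):
    # Crude stand-in for an fp8 e4m3 downcast: keep `mantissa_bits`
    # bits of mantissa. Illustrative only, not a real fp8 codec.
    m, e = np.frexp(x)
    m = np.round(m * (1 << mantissa_bits)) / (1 << mantissa_bits)
    return np.ldexp(m, e)

def chained_dots(q, k, v):
    # First dot: S = Q @ K^T, accumulated in higher precision
    # (modeling the fp32 MMAv3 accumulator).
    s = q @ k.T
    # The layout conversion lets this accumulator be rewritten
    # in-register into the fp8 dot_operand layout; here we only
    # model the numerical downcast.
    p = downcast_sim(s)
    # Second dot consumes the converted operand directly: O = P @ V.
    return p @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)).astype(np.float32) for _ in range(3))
o = chained_dots(q, k, v)
print(o.shape)  # (4, 8)
```

The point of the kernel-side optimization is that the `downcast_sim` step happens without leaving registers, which is what makes a fully fp8 chain of dots practical.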

…operand

This allows us to generate a fully fp8 attention kernel by keeping the
chain of dots in fp8 without going through shared memory.
@ptillet
Collaborator

ptillet commented Mar 14, 2024

Awesome! Maybe we should credit the https://research.colfax-intl.com/wp-content/uploads/2023/12/colfax-flashattention.pdf in the commit description? I see it's already in the comments of the code so no problem there :)

@ThomasRaoux
Collaborator Author

> Awesome! Maybe we should credit the https://research.colfax-intl.com/wp-content/uploads/2023/12/colfax-flashattention.pdf in the commit description? I see it's already in the comments of the code so no problem there :)

Good point, yes it should be added to the commit description as well.

@ThomasRaoux ThomasRaoux merged commit ee7e5bb into triton-lang:main Mar 14, 2024
ThomasRaoux pushed a commit that referenced this pull request Mar 14, 2024
htyu pushed a commit to htyu/triton that referenced this pull request Mar 20, 2024
…operand (triton-lang#3370)

Implement the layout conversion invented by Ganesh Bikshandi and Jay Shah,
described in the following paper:

https://research.colfax-intl.com/wp-content/uploads/2023/12/colfax-flashattention.pdf

This allows us to generate a fully fp8 attention kernel by keeping the
chain of dots in fp8 without going through shared memory.
karupayun pushed a commit to openxla/triton that referenced this pull request Apr 3, 2024
