
[Triton] FP8 with FA v3 API#917

Closed
micmelesse wants to merge 12 commits into main from micmelesse/fp8

Conversation

@micmelesse
Contributor

Motivation

Add support for fp8 in Flash Attention

Technical Details

Modify the existing code so that it conforms to the Flash Attention v3 API. A user provides fp8 values for q, k, and v, along with their descale values.
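As a rough illustration of where those descale values come from, here is a minimal per-tensor quantization sketch. This is plain Python with a hypothetical helper name (`quantize_fp8` is my own illustration, not part of the API); the real kernels consume torch fp8 tensors, and rounding to actual e4m3 values is omitted for brevity.

```python
# Largest finite value representable in fp8 e4m3.
FP8_E4M3_MAX = 448.0

def quantize_fp8(values):
    """Per-tensor fp8 quantization sketch: scale into the e4m3 range,
    and return the descale factor that maps quantized values back."""
    amax = max(abs(v) for v in values)
    scale = FP8_E4M3_MAX / amax if amax > 0 else 1.0
    # Clamp into the representable fp8 range (real code would also round
    # to the nearest representable e4m3 value).
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v * scale)) for v in values]
    descale = 1.0 / scale  # passed to the kernel as q_descale / k_descale / v_descale
    return q, descale
```

With this convention, `quantized * descale` approximately recovers the original value, which is what the kernel relies on when it applies the descales.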

Test Plan

Update the mha tests and bench code.

Test Result

Submission Checklist

@micmelesse
Contributor Author

micmelesse commented Aug 29, 2025

You can see examples of how to use the fp8 interface in the tests at op_tests/triton_tests/test_mha.py, which cover both regular and paged attention.

For regular-attention fp8, you will see code that looks like this:

import torch

from aiter.ops.triton.mha_v3 import (
    flash_attn_func as flash_attn_func_v3,
)

# enable backward for fp8 using dequantized values
set_fp8_dequantized_backward(True)

# forward
triton_out = flash_attn_func_v3(
    q_fp8,
    k_fp8,
    v_fp8,
    softmax_scale=None,
    causal=CAUSAL,
    q_descale=q_descale,
    k_descale=k_descale,
    v_descale=v_descale,
)

# backward
triton_dq, triton_dk, triton_dv = torch.autograd.grad(
    triton_out, (q_fp8, k_fp8, v_fp8), do.clone()
)

Here is a small model trained on wikitext to test convergence.

(combined_loss plot)

for paged attention fp8 which is available in the inference api flash_attn_with_kvcache , you will see code that looks like this

from aiter.ops.triton.mha_v3 import (
    flash_attn_with_kvcache as flash_attn_with_kvcache_v3,
)

# forward
out_kernel = flash_attn_with_kvcache_v3(
    q_fp8,
    k_cache_fp8,
    v_cache_fp8,
    cache_seqlens=cache_seqlens,
    causal=causal,
    q_descale=q_descale,
    k_descale=k_descale,
    v_descale=v_descale,
    page_table=page_table,
)
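For intuition on what the descale arguments do, here is a pure-Python reference sketch of fp8 attention (my own illustration, not the Triton kernel): the q/k descales rescale the score matrix after the fp8 QK^T matmul, and the v descale rescales the output after the PV matmul.

```python
import math

def attn_fp8_reference(Q8, K8, V8, q_descale, k_descale, v_descale, sm_scale):
    """Reference fp8 attention: Q8/K8/V8 hold fp8-quantized values as
    plain floats; the descales map scores and outputs back to real scale."""
    n, d = len(Q8), len(Q8[0])
    m = len(K8)
    out = []
    for i in range(n):
        # Scores computed on quantized values, then rescaled by both descales.
        scores = [
            sum(Q8[i][t] * K8[j][t] for t in range(d)) * q_descale * k_descale * sm_scale
            for j in range(m)
        ]
        # Numerically stable softmax over the rescaled scores.
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        denom = sum(exps)
        probs = [e / denom for e in exps]
        # V is also quantized, so apply v_descale to the weighted sum.
        out.append([
            sum(probs[j] * V8[j][t] for j in range(m)) * v_descale
            for t in range(d)
        ])
    return out
```

With all descales set to 1.0 this reduces to ordinary softmax attention, which is why fp16/bf16 callers can ignore the new arguments.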

Commits

- Fa V3 api
- Compress fp8 work so far
- pull cast out of torch function
- e2e fp8 stub
- emulate fa v3
- ignore
- remove example
- clean up forward
- save
- fp8 backward
- ignore train artifacts
- just use return_attn_probs
- match fa behvaior
- save fa ref
- add fa_ref
- fix dropout bug
- add link
- optional fp8 p descale
- rename to v3
- fa v3
- clean up
- match backward
- min diff
- update varlen api
- clean up FP8_P_DESCALE
- update bench and test
- lint
- fix mha varlen bug
- remove .gitignore
- save
- lint
- remove skip
- bring back skips
@dhonnappa-amd
Collaborator

Jenkins CI skipped: Check lint failed. Exiting the entire job...

@micmelesse
Contributor Author

I will reopen in a bit.

@micmelesse micmelesse closed this Sep 18, 2025
@micmelesse micmelesse reopened this Sep 22, 2025
@dhonnappa-amd
Collaborator

Jenkins CI skipped: Required check(s) 'ruff_black' are missing. Exiting the entire job...

@dhonnappa-amd
Collaborator

Jenkins CI skipped: Check lint failed. Exiting the entire job...

@micmelesse micmelesse closed this Sep 22, 2025
@micmelesse
Contributor Author

Moved here: #1065
