
[AMD ROCm] Support gfx950#1586

Merged
tridao merged 3 commits into Dao-AILab:main from ROCm:v2.7.3-cktile/gfx950
Apr 11, 2025

Conversation

@rocking5566 (Contributor) commented Apr 11, 2025

  1. Update the AMD backend (composable_kernel) to support gfx950
  2. Unlock gfx950 in setup.py

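As a rough illustration of the second change, "unlocking" an architecture in a build script usually means adding it to an allowlist that gates compilation. The names below are hypothetical, not the actual flash-attention setup.py code:

```python
# Hypothetical sketch: extending a ROCm arch allowlist so the build
# accepts gfx950. ALLOWED_ARCHS and is_supported_arch are illustrative
# names, not identifiers from this repository.
ALLOWED_ARCHS = {"gfx90a", "gfx942", "gfx950"}  # gfx950 newly unlocked

def is_supported_arch(arch: str) -> bool:
    """Return True if the given ROCm GPU architecture is buildable."""
    return arch in ALLOWED_ARCHS
```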
@tridao tridao merged commit c1352b6 into Dao-AILab:main Apr 11, 2025
@rocking5566 rocking5566 changed the title from [AMD ROCm] Support MI350 to [AMD ROCm] Support gfx950 Apr 12, 2025
shcho1118 pushed a commit to shcho1118/flash-attention that referenced this pull request Apr 22, 2025
Don't use FusedDense anymore to simplify code

Fix FA3 qkvpacked interface

Launch more thread blocks in layer_norm_bwd

check valid tile before storing num_splits in split_idx (Dao-AILab#1578)

Tune rotary kernel to use 2 warps if rotary_dim <= 64

Implement attention_chunk

Fix missed attention chunk size param for block specifics in `mma_pv`. (Dao-AILab#1582)

[AMD ROCm] Support MI350 (Dao-AILab#1586)

* enable gfx950 support

* update ck for gfx950

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

Make attention_chunk work for non-causal cases

Use tile size 128 x 96 for hdim 64,256

Fix kvcache tests for attention_chunk when precomputing metadata

Fix kvcache test with precomputed metadata: pass in max_seqlen_q

Pass 0 as attention_chunk in the bwd for now

[LayerNorm] Implement option for zero-centered weight

Make hopper build more robust (Dao-AILab#1598)

In certain environments, the path to the vendored nvcc is not picked up correctly when it is provided as a relative path. In this PR, I simply make it absolute.
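A minimal sketch of that kind of fix, assuming the nvcc path is resolved inside a Python build script (the function name is illustrative, not the actual flash-attention code):

```python
import os

def resolve_nvcc_path(nvcc_path: str) -> str:
    """Resolve a possibly-relative nvcc path to an absolute one so it
    still points at the vendored compiler even if the build changes
    the working directory."""
    return os.path.abspath(nvcc_path)
```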

Fix L2 swizzle in causal tile scheduler

Use LPT scheduler for causal backward pass
playerzer0x pushed a commit to Liqhtworks/flash-attention that referenced this pull request Jul 24, 2025
* enable gfx950 support

* update ck for gfx950

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
elewarr pushed a commit to elewarr/flash-attention that referenced this pull request Feb 4, 2026
* enable gfx950 support

* update ck for gfx950

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>


3 participants