Merged
In certain environments the relative path to the vendored nvcc is not resolved correctly. In this PR, I simply make it absolute.
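A minimal sketch of the fix, assuming a setup.py-style build script. The directory layout and variable names below are illustrative, not the actual flash-attention build code; the point is resolving the vendored compiler path against the script's own location rather than the current working directory.

```python
import os

# Hypothetical vendored nvcc location, relative to this build script.
# Resolving against __file__ (not os.getcwd()) keeps the path correct
# even when the build is invoked from a different working directory.
_THIS_DIR = os.path.dirname(os.path.abspath(__file__))
nvcc_rel = os.path.join("third_party", "cuda", "bin", "nvcc")  # assumed layout
nvcc_abs = os.path.abspath(os.path.join(_THIS_DIR, nvcc_rel))
```

Passing `nvcc_abs` to downstream build steps avoids the environment-dependent breakage that a bare relative path causes.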
shcho1118
pushed a commit
to shcho1118/flash-attention
that referenced
this pull request
Apr 22, 2025
- Don't use FusedDense anymore to simplify code
- Fix FA3 qkvpacked interface
- Launch more thread blocks in layer_norm_bwd
- check valid tile before storing num_splits in split_idx (Dao-AILab#1578)
- Tune rotary kernel to use 2 warps if rotary_dim <= 64
- Implement attention_chunk
- Fix missed attention chunk size param for block specifics in `mma_pv`. (Dao-AILab#1582)
- [AMD ROCm] Support MI350 (Dao-AILab#1586): enable gfx950 support; update ck for gfx950 (Co-authored-by: illsilin <Illia.Silin@amd.com>)
- Make attention_chunk work for non-causal cases
- Use tile size 128 x 96 for hdim 64,256
- Fix kvcache tests for attention_chunk when precomputing metadata
- Fix kvcache test with precomputed metadata: pass in max_seqlen_q
- Pass 0 as attention_chunk in the bwd for now
- [LayerNorm] Implement option for zero-centered weight
- Make hopper build more robust (Dao-AILab#1598): In certain environments the relative path to the vendored nvcc is not picked up correctly if provided relative. In this PR, I just make it absolute.
- Fix L2 swizzle in causal tile scheduler
- Use LPT scheduler for causal backward pass
playerzer0x
pushed a commit
to Liqhtworks/flash-attention
that referenced
this pull request
Jul 24, 2025
In certain environments the relative path to the vendored nvcc is not picked up correctly if provided relative. In this PR, I just make it absolute.
elewarr
pushed a commit
to elewarr/flash-attention
that referenced
this pull request
Feb 4, 2026
In certain environments the relative path to the vendored nvcc is not picked up correctly if provided relative. In this PR, I just make it absolute.