
Add attention variants and backend guide #2

Merged

sunway513 merged 1 commit into main from docs/attention-variants-guide on Feb 7, 2026

Conversation

@sunway513 (Owner)

Summary

  • Add comprehensive user-facing documentation for all attention variants in AITER
  • Cover MHA (Flash Attention), Paged Attention (decode + prefill), MLA (Multi-head Latent Attention), Unified Attention, and specialized variants (Lean, HSTU, Sparse, Chunked)
  • Include backend support matrices (ASM vs CK vs Triton), data type coverage, KV cache quantization options, and fused operation catalog

Highlights

  • Quick reference table helping users pick the right attention variant for their use case
  • Decision tree for backend selection (training vs inference, model type, GPU arch)
  • Data type matrices per variant and backend (BF16, FP16, FP8, INT8)
  • KV cache quantization guide with precision levels and memory savings
  • Practical API examples for MHA, PA decode/prefill, and MLA (an illustrative call is sketched after this list)
  • GPU architecture support summary (MI300X vs MI350 vs other)
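
For flavor, a minimal MHA call of the kind the guide's API examples cover. This is a sketch only: it assumes AITER mirrors the flash-attn interface, and both the `aiter` import path and the `flash_attn_func` signature here are assumptions, not verified against the guide.

```python
# Illustrative MHA sketch, assuming aiter mirrors the flash-attn interface.
# `flash_attn_func` and its import path are assumptions, not verified here.
import torch
from aiter import flash_attn_func  # assumed import path

B, S, H, D = 2, 1024, 16, 128  # batch, sequence length, heads, head dim
q = torch.randn(B, S, H, D, dtype=torch.bfloat16, device="cuda")
k = torch.randn(B, S, H, D, dtype=torch.bfloat16, device="cuda")
v = torch.randn(B, S, H, D, dtype=torch.bfloat16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)  # -> (B, S, H, D)
```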

Test plan

  • Review report accuracy against current source code
  • Verify all referenced API functions and source files exist

🤖 Generated with Claude Code

Document all attention variants (MHA, PA, MLA, Unified, Sparse, etc.)
with backend support matrices, data type coverage, decision trees for
choosing the right variant, and practical API examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sunway513 sunway513 merged commit f175623 into main Feb 7, 2026
@sunway513 sunway513 deleted the docs/attention-variants-guide branch February 22, 2026 03:52
@sunway513 sunway513 restored the docs/attention-variants-guide branch February 22, 2026 03:54
sunway513 pushed a commit that referenced this pull request Mar 22, 2026
Apply spatial stream-K style work allocation to leanAttention.
sunway513 added a commit that referenced this pull request Apr 30, 2026
Wrapper-level safety guard for the padded-softmax bug raised by Copilot
inline comment #2 on PR ROCm#2969. Padded K/V tokens produce QK^T = 0 but
exp(0) = 1 still contributes to the softmax denominator and silently
scales the output for non-causal attention. Causal mode masks padded
positions so it is unaffected.
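
The effect is easy to reproduce outside the kernel. The sketch below uses plain PyTorch softmax attention (not the flydsl kernel) to show how zero-padded K/V rows inflate the softmax denominator in non-causal mode:

```python
import torch

torch.manual_seed(0)
S_real, n_pad, d = 8, 8, 16  # 50% padding, the worst case measured below

q = torch.randn(1, S_real, d)
k = torch.randn(1, S_real, d)
v = torch.randn(1, S_real, d)

# Reference: non-causal attention over the real tokens only.
ref = torch.softmax(q @ k.transpose(-1, -2) / d**0.5, dim=-1) @ v

# Zero-pad K/V: each padded position scores QK^T = 0, so exp(0) = 1
# enters the softmax denominator and silently scales every output row.
k_pad = torch.cat([k, torch.zeros(1, n_pad, d)], dim=1)
v_pad = torch.cat([v, torch.zeros(1, n_pad, d)], dim=1)
out = torch.softmax(q @ k_pad.transpose(-1, -2) / d**0.5, dim=-1) @ v_pad

rel_err = ((out - ref).abs().max() / ref.abs().max()).item()
print(f"max rel err with 50% zero padding: {rel_err:.3f}")  # large, not noise
```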

Empirical RCA at aiter-forge-baselines/2969_padded_softmax_rca.md:
  - Wan2.1 production (S_real=32760, S_pad=32768, ratio=0.024%):
    cos_min 0.999992, max_abs 0.0008 — safe, indistinguishable from
    bf16 noise floor.
  - 50% padding worst case: rel_err 37.3%, max_abs 0.281 — silent
    output scaling, would corrupt downstream.

Implements option (d) from the RCA decision doc (signed off by Peng):
hybrid threshold. Non-causal calls with n_pad/seq_len_pad > 0.005 are
rejected with a ValueError that points the caller at the three valid
remediations (causal=True, pre-pad to multiple of 128, or use a
masking-aware kernel).
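
A minimal sketch of what such a hybrid-threshold guard can look like; the function name, signature, and error wording below are illustrative, not the actual aiter wrapper:

```python
PAD_RATIO_THRESHOLD = 0.005  # 0.5%: bf16 precision floor plus margin (see below)

def check_padding_safety(seq_len_real: int, seq_len_pad: int, causal: bool) -> None:
    """Hypothetical guard: reject non-causal calls with unsafe zero padding."""
    if causal:
        return  # causal masking already hides padded positions
    ratio = (seq_len_pad - seq_len_real) / seq_len_pad
    if ratio > PAD_RATIO_THRESHOLD:
        raise ValueError(
            f"padding ratio {ratio:.3%} exceeds the 0.5% safety threshold for "
            "non-causal attention; pass causal=True, pre-pad the sequence to a "
            "multiple of 128, or use a masking-aware kernel"
        )
```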

Threshold rationale: 0.5% is the bf16 mantissa precision floor (~0.4%,
7 mantissa bits) plus 1 bit of margin. Production Wan2.1 (0.024%)
clears it by 20x, so the hot path stays open while the silent-disaster
worst case is closed.
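
For reference, the numbers behind that rationale, using only the figures quoted above:

```python
bf16_floor = 2**-8               # 7 stored mantissa bits -> ~0.39% relative precision
wan21 = (32768 - 32760) / 32768  # production Wan2.1 padding ratio
print(f"floor={bf16_floor:.3%}  wan2.1={wan21:.3%}  clearance={0.005 / wan21:.1f}x")
# floor=0.391%  wan2.1=0.024%  clearance=20.5x
```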

Tests added (op_tests/flydsl_tests/test_flydsl_fmha.py):
  - test_flydsl_fmha_rejects_excessive_padding: B=1, S_real=129
    (S_pad=256, 49.6% pad), causal=False — must raise ValueError with
    "0.5% safety threshold" substring.
  - test_flydsl_fmha_allows_tight_padding: Wan2.1 case S_real=32760,
    causal=False — must succeed and match SDPA reference (cos_min
    >= 0.9999). Regression guard for the production hot path.
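
Against the hypothetical `check_padding_safety` guard sketched above (not the actual flydsl test harness), the two cases reduce to something like:

```python
import pytest

def test_rejects_excessive_padding():
    # B=1, S_real=129 pads to S_pad=256 (49.6% padding), non-causal: must raise.
    with pytest.raises(ValueError, match="0.5% safety threshold"):
        check_padding_safety(seq_len_real=129, seq_len_pad=256, causal=False)

def test_allows_tight_padding():
    # Wan2.1 hot path: S_real=32760, S_pad=32768 (0.024% padding): must pass.
    check_padding_safety(seq_len_real=32760, seq_len_pad=32768, causal=False)
```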

Validation on R9600D (gfx1201) inside wan-best container,
HIP_VISIBLE_DEVICES=4: 10 passed, 2 skipped (multi-GPU only).
black --check + ruff check both clean on touched files.

Kernel file aiter/ops/flydsl/kernels/flash_attn_func_gfx1201.py is
intentionally untouched — refactor is in a parallel branch.