Add MLA variants and backend guide by sunway513 · Pull Request #1 · sunway513/aiter

sunway513 · 2026-02-07T17:36:40Z

Summary

Add comprehensive user-facing documentation for all MLA (Multi-head Latent Attention) variants in AITER
Cover Standard Decode, Persistent Decode, Standard Prefill, Persistent Prefill, Sparse MLA (Top-K), and Fused Operations (BMM+RoPE+Cache)
Include backend support matrices (ASM vs Triton), data type coverage, GQA ratio support, KV cache layouts, and RoPE handling

Highlights

Quick reference table helping users pick the right MLA variant for their use case
Decision tree for backend selection (prefill vs decode, persistent vs standard, sparse vs dense)
Data type matrices per variant and backend (BF16, FP8, FP4/MXFP4)
GQA ratio support table including ASM persistent mode's simulated ratios (32-112)
KV cache layout guide covering standard and 3-buffer FP8 layouts
Practical API examples for decode, persistent decode, prefill, sparse MLA, and fused cache operations
GPU architecture support summary (MI300X vs MI350 vs portable Triton)
Performance tuning guide with split-K auto-tuning details and key parameters

Test plan

Review report accuracy against current source code
Verify all referenced API functions, source files, and kernel configurations exist

🤖 Generated with Claude Code

Document the current state of MLA kernel support across Triton and ASM backends, covering precision, fusion levels, execution modes, GQA support, KV cache layouts, and recommended areas for future development.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Restructure the MLA documentation to match the format of the attention and MOE guides: quick reference table, per-variant sections with backend matrices, practical API examples, decision tree, data type and GQA matrices, fused operations catalog, GPU architecture summary, and full source/test file references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sunway513 and others added 2 commits February 7, 2026 11:26

sunway513 changed the title from "Add MLA kernel support report: Triton vs ASM comparison" to "Add MLA variants and backend guide" on Feb 7, 2026

sunway513 merged commit f7908d6 into main Feb 7, 2026
11 of 15 checks passed

sunway513 mentioned this pull request Mar 9, 2026

Documentation Websites for OSS Projects #53

Closed

sunway513 mentioned this pull request Apr 26, 2026

[DSV4 W4.3-Redo] AITER sparse_attn metadata validator (sunway513/atom#37) #60

Merged

4 tasks


sunway513 merged 2 commits into main from docs/mla-kernel-support-report

1 participant