
Add all remaining operator guides and complete README docs hub #4

Merged: sunway513 merged 6 commits into main from docs/gemm-quant-guides-and-readme on Feb 7, 2026

Conversation

sunway513 (Owner) commented on Feb 7, 2026

Summary

  • Add 6 new operator guides covering every operator in AITER
  • Update README.md with a complete documentation hub linking all 9 guides
  • Every operator in the Supported Operators table now links to its relevant guide

New Documentation (this PR)

GEMM Variants & Tuning Guide

  • A8W8, A16W16, A4W4, batched, DeepGEMM, Triton FFN fusions
  • Complete tuning system docs (CSV format, env vars, model configs)
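To make the A8W8 variant concrete, here is a minimal NumPy sketch of the reference math only, not the AITER kernels: activations carry per-token (per-row) int8 scales, weights carry per-channel (per-column) scales, the matmul accumulates in int32, and the result is dequantized by the outer product of the two scale vectors. The helper name `quant_per_row` is hypothetical.

```python
import numpy as np

def quant_per_row(x, axis):
    # Hypothetical helper: symmetric int8 quantization, one scale per slice.
    amax = np.abs(x).max(axis=axis, keepdims=True)
    scale = amax / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def gemm_a8w8(a_q, b_q, a_scale, b_scale):
    # Accumulate in int32, then dequantize with the outer product of scales.
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * b_scale)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 16)).astype(np.float32)
b = rng.standard_normal((16, 8)).astype(np.float32)
a_q, a_s = quant_per_row(a, axis=1)   # per-token scales, shape (4, 1)
b_q, b_s = quant_per_row(b, axis=0)   # per-channel scales, shape (1, 8)
out = gemm_a8w8(a_q, b_q, a_s, b_s)   # approximates a @ b
```

Because each scale factors out of exactly one row or column, the dequantization is exact with respect to the quantized inputs; the only error is the int8 rounding itself.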

Quantization & Precision Guide

  • QuantType enum, per-tensor/token/block strategies
  • Fused quantization ops (FP8 + MXFP4), SmoothQuant, KV cache quant
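The per-tensor/token/block distinction is just the granularity at which scales are computed. A hedged NumPy sketch (int8 used as a stand-in for the FP8/INT8 paths; `int8_scales` and `roundtrip` are illustrative helpers, not AITER APIs):

```python
import numpy as np

def int8_scales(x, axis=None):
    # One symmetric int8 scale per slice along `axis` (None = per-tensor).
    amax = np.abs(x).max(axis=axis, keepdims=axis is not None)
    return np.maximum(amax, 1e-8) / 127.0

def roundtrip(x, s):
    # Quantize then dequantize, to measure the rounding error of a scheme.
    return np.clip(np.round(x / s), -127, 127) * s

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 128)).astype(np.float32)

s_tensor = int8_scales(x)            # scalar: one scale for the whole tensor
s_token = int8_scales(x, axis=1)     # (4, 1): one scale per row/token
# Per-block: one scale per contiguous group of 32 elements of the hidden dim.
s_block = int8_scales(x.reshape(4, 4, 32), axis=2)   # (4, 4, 1)

err_tensor = np.abs(roundtrip(x, s_tensor) - x).mean()
err_token = np.abs(roundtrip(x, s_token) - x).mean()
err_block = np.abs(
    roundtrip(x.reshape(4, 4, 32), s_block).reshape(4, 128) - x
).mean()
```

Finer granularity tracks local dynamic range, so block-scale error is typically below token-scale error, which is below tensor-scale error; the trade-off is more scale metadata to store and apply.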

Normalization Guide

  • RMSNorm, LayerNorm, GroupNorm with all fused variants
  • Add + SmoothQuant + Dynamic Quant fusions, backend dispatch logic
  • Fused QK norm + RoPE + cache + quant mega-kernels
  • Distributed RS + RMSNorm + Quant + AG fusion
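For reference, RMSNorm and the "add + norm" fusion reduce to a few lines of math. A NumPy sketch under the usual definitions (this is the unfused reference, not the fused kernels):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize each row by its root-mean-square, then apply a learned scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def fused_add_rms_norm(x, residual, weight, eps=1e-6):
    # "Add + RMSNorm" fusion: the residual add and the normalization share
    # one pass over the data; returns (normalized output, updated residual).
    h = x + residual
    return rms_norm(h, weight, eps), h

x = np.array([[1.0, 2.0, 3.0, 4.0]])
out = rms_norm(x, np.ones(4))
```

The fused form matters because the updated residual `h` is needed by the next layer anyway; computing it inside the norm kernel saves a full read-write of the hidden states.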

RoPE (Rotary Position Embedding) Guide

  • SBHD, THD, 2D, 3D tensor formats
  • NeoX & GPT-J rotation styles, partial RoPE (nope_first)
  • 8 scaling methods (Linear, NTK, YaRN, Phi-3, DeepSeek, LLaMA3, MRoPE, DualChunk)
  • Autograd classes for training, fused operations
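The NeoX rotation style pairs element i of the first half of the head dim with element i of the second half (GPT-J instead interleaves adjacent even/odd pairs). A minimal NumPy sketch of the NeoX variant, assuming the standard inverse-frequency formulation:

```python
import numpy as np

def rope_neox(x, positions, base=10000.0):
    # NeoX-style RoPE: element i of the first half-dim is rotated against
    # element i of the second half-dim by a position-dependent angle.
    d = x.shape[-1]
    inv_freq = 1.0 / base ** (np.arange(0, d, 2) / d)   # (d/2,)
    theta = positions[:, None] * inv_freq[None, :]      # (seq, d/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., : d // 2], x[..., d // 2:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 8))                          # (seq, head_dim)
out = rope_neox(x, np.arange(4, dtype=np.float64))
```

Each pair undergoes a pure 2D rotation, so position 0 is the identity and vector norms are preserved; the scaling methods listed above (NTK, YaRN, etc.) all amount to different ways of remapping `positions` or `inv_freq` before this rotation.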

KV-Cache Management Guide

  • Paged, flash, ASM, and MLA cache layouts
  • Quantized cache (FP8, INT8, FP4) with per-token and per-block scales
  • Fused RoPE + cache write, fused BMM + RoPE + cache
  • Block swap/copy for beam search and speculative decoding
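Paged layouts address the cache through a block table that maps a sequence's logical block index to a physical block in a shared pool, so sequences can grow without contiguous allocation. An illustrative NumPy sketch (function names are hypothetical, not the AITER API):

```python
import numpy as np

def write_to_paged_cache(cache, block_table, token_pos, vec):
    # cache: (num_blocks, block_size, head_dim) shared physical pool.
    # block_table maps a sequence's logical block index -> physical block id.
    block_size = cache.shape[1]
    cache[block_table[token_pos // block_size], token_pos % block_size] = vec

def read_from_paged_cache(cache, block_table, token_pos):
    block_size = cache.shape[1]
    return cache[block_table[token_pos // block_size], token_pos % block_size]

cache = np.zeros((8, 4, 2), dtype=np.float32)   # 8 physical blocks, 4 slots each
block_table = np.array([5, 2, 7])               # logical blocks 0..2 of one sequence
write_to_paged_cache(cache, block_table, 6, np.array([1.0, 2.0]))
```

Here token position 6 lands in logical block 1, physical block 2, slot 2. Block swap/copy for beam search is then just copying or remapping rows of the block table and pool.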

Elementwise & Activation Guide

  • SiLU/GELU/sigmoid/tanh activations
  • SwiGLU/GeGLU gating (the standard LLM FFN pattern)
  • Fused activation + quantize (FP8 group, MXFP4 block-scale)
  • Binary arithmetic with broadcasting, fused mul-add
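SwiGLU gating is the pattern down(silu(x·W_gate) ⊙ (x·W_up)). A small NumPy sketch of the unfused math (AITER's fused kernels compute the same thing in one pass; weight names here are illustrative):

```python
import numpy as np

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Gated FFN: the activated gate projection multiplies the up projection
    # elementwise before the down projection.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(3)
x = rng.standard_normal((2, 16))
w_gate = rng.standard_normal((16, 64))
w_up = rng.standard_normal((16, 64))
w_down = rng.standard_normal((64, 16))
y = swiglu_ffn(x, w_gate, w_up, w_down)
```

GeGLU is the same structure with GELU in place of SiLU; the fused activation + quantize ops additionally emit FP8/MXFP4 outputs for the following GEMM.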

README Updates

  • Documentation table expanded from 5 to 9 guides
  • All 13 operators in the table now link to their guide

Test plan

  • Verify all doc links in README resolve correctly
  • Review each guide for accuracy against source code
  • Confirm formatting renders properly on GitHub

🤖 Generated with Claude Code

sunway513 and others added 2 commits February 7, 2026 16:10
New operator guides covering GEMM variants (A8W8, A16W16, A4W4, batched,
DeepGEMM, Triton FFN, tuning system) and Quantization strategies (QuantType,
per-tensor/token/block, fused ops, FP8/MXFP4/INT4, SmoothQuant).

README now features a Documentation section linking all five operator guides
and the operator table links each op to its relevant guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New operator guides:
- Normalization: RMSNorm, LayerNorm, GroupNorm with all fused variants
  (add, SmoothQuant, dynamic quant), distributed fusion, backend dispatch
- RoPE: SBHD/THD/2D/3D formats, NeoX & GPT-J styles, 8 scaling methods,
  autograd classes, fused QK norm + RoPE + cache + quant
- KV-Cache: Paged/flash/ASM/MLA layouts, quantized cache (FP8/INT8/FP4),
  fused RoPE + cache write, block swap/copy management
- Elementwise & Activations: SiLU/GELU/sigmoid/tanh, SwiGLU gating,
  fused activation + quantize (FP8/MXFP4), binary arithmetic

README now links all 9 operator guides and every operator in the table
has a link to its relevant guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sunway513 changed the title from "Add GEMM & Quantization guides, update README docs hub" to "Add all remaining operator guides and complete README docs hub" on Feb 7, 2026
sunway513 and others added 4 commits February 7, 2026 16:35
Merge duplicate rows (MHA+PA, RMSNorm+LayerNorm, Elementwise+Sigmoid)
so each operator row maps to exactly one guide. Promote Communication
into the operator table and eliminate the separate Documentation section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move operator table above the guide description note, separate test
instruction into its own line with generic example, and promote
Additional Resources to a proper section header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document GPU_ARCHS, PREBUILD_KERNELS levels (0-3), and MAX_JOBS.
Reorganize Installation into subsections: Development Mode (JIT),
Precompiled Kernels, and Triton Communication Support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Explain what it is (multi-GPU reduce-scatter/all-gather via Iris),
mark as optional, and link to the full guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sunway513 merged commit f60774b into main on Feb 7, 2026
11 of 15 checks passed