fix(aiter): use tuned GEMM for unquantized linear with torchao compatibility guard by michaelzhang-ai · Pull Request #20889 · sgl-project/sglang

michaelzhang-ai · 2026-03-18T23:30:57Z

Motivation

AMD CI sets SGLANG_USE_AITER=1 globally for all tests. When aiter's tgemm.mm (which calls aiter.gemm_a16w16) is used for unquantized linear operations, it crashes on models quantized by torchao (e.g. int4wo-128, fp8wo) because AffineQuantizedTensor doesn't support the aiter.gemm_a16w16 dispatch:

NotImplementedError: AffineQuantizedTensor dispatch: attempting to run
  unimplemented operator/function: func=<OpOverload(op='aiter.gemm_a16w16', overload='default')>

This was surfaced by PR #20392 (shard 10: test_torchao.py failure in CI run).

Modifications

Import tgemm from aiter.tuned_gemm when SGLANG_USE_AITER=1
Add a tgemm.mm fast path in UnquantizedLinearMethod.apply for AMD GPUs, guarded by type(layer.weight.data) is torch.Tensor to ensure it only activates on plain tensors
Torchao-quantized weights (AffineQuantizedTensor, a torch.Tensor subclass) will fail the strict type() check and correctly fall through to F.linear
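
The guard described above relies on the difference between a strict `type(...) is` check and `isinstance`, which also accepts subclasses. The following is a minimal, torch-free sketch of that dispatch decision. `PlainTensor`, `QuantizedTensor`, `pick_linear_path` and `pick_linear_path_loose` are hypothetical stand-ins invented for illustration (for `torch.Tensor`, a subclass such as `AffineQuantizedTensor`, and the branch in `UnquantizedLinearMethod.apply`); they are not the code in this PR.

```python
class PlainTensor:
    """Hypothetical stand-in for torch.Tensor."""


class QuantizedTensor(PlainTensor):
    """Hypothetical stand-in for a tensor subclass such as AffineQuantizedTensor."""


def pick_linear_path(weight, use_aiter):
    """Return the name of the kernel path a strict type guard would select."""
    # Strict identity check on the class: subclasses do NOT match,
    # so quantized weights fall through to the generic path.
    if use_aiter and type(weight) is PlainTensor:
        return "tgemm.mm"
    return "F.linear"


def pick_linear_path_loose(weight, use_aiter):
    """Same decision with isinstance, shown only for contrast."""
    # isinstance accepts subclasses, so a quantized weight would be
    # routed to the tuned GEMM path, the situation the guard avoids.
    if use_aiter and isinstance(weight, PlainTensor):
        return "tgemm.mm"
    return "F.linear"


if __name__ == "__main__":
    print(pick_linear_path(PlainTensor(), use_aiter=True))
    print(pick_linear_path(QuantizedTensor(), use_aiter=True))
    print(pick_linear_path_loose(QuantizedTensor(), use_aiter=True))
```

With the strict check, only an object whose class is exactly the base class takes the tuned path; the `isinstance` variant would send the subclass there as well.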

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.

…ibility guard

Add aiter's tuned_gemm (tgemm.mm) for unquantized linear operations on AMD HIP GPUs, guarded by a strict type check so it only activates on plain tensors. Torchao-quantized weights (AffineQuantizedTensor) fall through to F.linear, preventing NotImplementedError on aiter.gemm_a16w16.

gemini-code-assist · 2026-03-18T23:31:00Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

michaelzhang-ai requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg, HaiShaw, b8zhong and ch-wan as code owners March 18, 2026 23:30

github-actions bot added the quant LLM Quantization label Mar 18, 2026

michaelzhang-ai closed this Mar 18, 2026

michaelzhang-ai wants to merge 1 commit into sgl-project:main from michaelzhang-ai:fix/aiter-tgemm-torchao-compat
