
fix(aiter): use tuned GEMM for unquantized linear with torchao compatibility guard#20889

Closed
michaelzhang-ai wants to merge 1 commit into sgl-project:main from michaelzhang-ai:fix/aiter-tgemm-torchao-compat

Conversation

@michaelzhang-ai
Collaborator

Motivation

AMD CI sets SGLANG_USE_AITER=1 globally for all tests. When aiter's tgemm.mm (which calls aiter.gemm_a16w16) is used for unquantized linear operations, it crashes on models quantized by torchao (e.g. int4wo-128, fp8wo) because AffineQuantizedTensor doesn't support the aiter.gemm_a16w16 dispatch:

NotImplementedError: AffineQuantizedTensor dispatch: attempting to run
  unimplemented operator/function: func=<OpOverload(op='aiter.gemm_a16w16', overload='default')>

This was surfaced by PR #20392 (shard 10: test_torchao.py failure in CI run).
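A torch-free stand-in can illustrate the failure mode: torchao's `AffineQuantizedTensor` intercepts op dispatch, implements the ops it knows (dequantizing as needed), and raises `NotImplementedError` for anything else, including a custom op like `aiter.gemm_a16w16`. The class and op names below mirror the error message above; this is a minimal sketch, not torchao's actual dispatch code.

```python
# Minimal stand-in (no torch/torchao/aiter needed) for how a quantized
# tensor subclass dispatches ops: known ops run, unknown custom ops
# (e.g. aiter.gemm_a16w16) raise NotImplementedError -- the CI crash.
class AffineQuantizedStub:
    _implemented = {"aten.linear"}

    def run_op(self, op_name: str) -> str:
        if op_name not in self._implemented:
            raise NotImplementedError(
                "AffineQuantizedTensor dispatch: attempting to run "
                f"unimplemented operator/function: func={op_name}"
            )
        return f"ok: {op_name}"


w = AffineQuantizedStub()
print(w.run_op("aten.linear"))     # supported op runs normally
try:
    w.run_op("aiter.gemm_a16w16")  # custom aiter op: no torchao handler
except NotImplementedError as e:
    print(e)
```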

Modifications

  • Import tgemm from aiter.tuned_gemm when SGLANG_USE_AITER=1
  • Add a tgemm.mm fast path in UnquantizedLinearMethod.apply for AMD GPUs, guarded by type(layer.weight.data) is torch.Tensor to ensure it only activates on plain tensors
  • Torchao-quantized weights (AffineQuantizedTensor, a torch.Tensor subclass) will fail the strict type() check and correctly fall through to F.linear
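The guard's effect can be sketched without torch: `type()` is an exact-class check, so any `torch.Tensor` subclass fails it even though `isinstance()` would pass. The `Tensor` and `AffineQuantizedTensor` classes below are stand-ins for the real torch/torchao types, and `takes_tgemm_fast_path` is a hypothetical helper mirroring the guard described above, not code from the PR.

```python
# Torch-free sketch of the strict type() guard. Tensor stands in for
# torch.Tensor; AffineQuantizedTensor stands in for torchao's subclass.
class Tensor:
    pass

class AffineQuantizedTensor(Tensor):  # torchao weights subclass torch.Tensor
    pass

def takes_tgemm_fast_path(weight) -> bool:
    # Exact-class check: subclasses fail, so quantized weights fall
    # through to the F.linear path instead of aiter's tgemm.mm.
    return type(weight) is Tensor

print(takes_tgemm_fast_path(Tensor()))                 # True  -> tgemm.mm
print(takes_tgemm_fast_path(AffineQuantizedTensor()))  # False -> F.linear
```

Note that `isinstance(AffineQuantizedTensor(), Tensor)` is `True`, which is why the guard uses `type(...) is torch.Tensor` rather than `isinstance`.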

Checklist

fix(aiter): use tuned GEMM for unquantized linear with torchao compatibility guard

Add aiter's tuned_gemm (tgemm.mm) for unquantized linear operations on
AMD HIP GPUs, guarded by a strict type check so it only activates on
plain tensors. Torchao-quantized weights (AffineQuantizedTensor) fall
through to F.linear, preventing NotImplementedError on aiter.gemm_a16w16.

@github-actions github-actions bot added the quant LLM Quantization label Mar 18, 2026
