[ROCM] Enable CompressedTensorsWNA16 by JartX · Pull Request #27187 · vllm-project/vllm

JartX · 2025-10-20T07:50:47Z

I'm currently running ROCm on RDNA3. I've been trying to use compressed-tensors for a while, and I had assumed it was only supported on CUDA.

This change simply skips CompressedTensorsWNA16MarlinMoEMethod, which is CUDA-only, when running on ROCm. That allows inference of a model like the following:

jart25/Qwen3-VL-30B-A3B-Instruct-AWQ-8bit

using CompressedTensorsWNA16MoEMethod and ExllamaLinearKernel.
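
The selection described above can be sketched roughly as follows. This is a minimal illustration and not the actual diff: the function `select_wna16_moe_method` and its two boolean parameters are hypothetical stand-ins for the platform check and the Marlin support check, and only the two method class names come from this PR.

```python
def select_wna16_moe_method(is_rocm: bool, marlin_supports_layer: bool) -> str:
    """Return the name of the WNA16 MoE method to use (illustrative only).

    is_rocm and marlin_supports_layer are hypothetical inputs standing in
    for whatever platform and kernel-support checks the real code performs.
    """
    # Marlin MoE kernels are not available on ROCm, so fall back to the
    # non-Marlin method there, or whenever Marlin cannot handle the layer.
    if is_rocm or not marlin_supports_layer:
        return "CompressedTensorsWNA16MoEMethod"
    # Otherwise prefer the Marlin-backed method.
    return "CompressedTensorsWNA16MarlinMoEMethod"
```

With this shape, a ROCm run never reaches the Marlin branch, regardless of what the layer-level support check reports.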

Note that it must be run with:

export VLLM_USE_TRITON_AWQ=1
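
For anyone launching from Python rather than a shell, a minimal sketch of the same setup is below. It assumes the `vllm serve` command is available on PATH; the helper name `build_launch` is hypothetical, and it only builds the command and environment without starting anything.

```python
import os

# Model named in this PR description.
MODEL = "jart25/Qwen3-VL-30B-A3B-Instruct-AWQ-8bit"


def build_launch(model: str = MODEL) -> tuple[list[str], dict[str, str]]:
    """Build the command and environment for serving the model (sketch).

    build_launch is a hypothetical helper, not part of vLLM.
    """
    # Copy the current environment and set the flag the PR says is required.
    env = dict(os.environ)
    env["VLLM_USE_TRITON_AWQ"] = "1"
    # Assumes the vllm CLI entry point is installed and on PATH.
    cmd = ["vllm", "serve", model]
    return cmd, env


# To actually launch the server (not executed here):
#   import subprocess
#   cmd, env = build_launch()
#   subprocess.run(cmd, env=env, check=True)
```

Passing the variable through `env` keeps it scoped to the server process instead of changing the parent shell.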

Signed-off-by: JartX <sagformas@epdcenter.es>

gemini-code-assist

Code Review

This pull request correctly enables CompressedTensorsWNA16 on ROCm platforms by preventing the use of the Marlin MoE kernel, which is not supported on ROCm. The change is simple, effective, and consistent with how other parts of the codebase handle ROCm-specific limitations for Marlin kernels. This allows models using this quantization scheme to run on ROCm, which is a valuable improvement.

yewentao256

LGTM, thanks for the work!

JartX · 2025-10-21T10:35:17Z

Hi @yewentao256, all tests have passed. Can this be merged?


JartX added 2 commits October 20, 2025 09:45

enable CompressedTensorsWNA16 on rocm

b5c3e69

Signed-off-by: JartX <sagformas@epdcenter.es>

pre-commit

6e5521f

Signed-off-by: JartX <sagformas@epdcenter.es>

JartX requested review from mgoin, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners October 20, 2025 07:50

mergify bot added the rocm Related to AMD ROCm label Oct 20, 2025

gemini-code-assist bot reviewed Oct 20, 2025


JartX mentioned this pull request Oct 20, 2025

[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA #27190

Merged

yewentao256 approved these changes Oct 20, 2025


yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 20, 2025

JartX mentioned this pull request Oct 20, 2025

[AWQ][Qwen3 VL] Add qwen3-vl-30b-a3b-Instruct-example vllm-project/llm-compressor#1947

Merged

yewentao256 merged commit ba09652 into vllm-project:main Oct 21, 2025
57 checks passed

Kay-Tian mentioned this pull request Oct 23, 2025

vLLM PR #27187 core file change notice Kay-Tian/vllm#16

Closed

usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025

[ROCM] Enable CompressedTensorsWNA16 (vllm-project#27187)

fd47bc9

Signed-off-by: JartX <sagformas@epdcenter.es>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

[ROCM] Enable CompressedTensorsWNA16 (vllm-project#27187)

ba7e59a

Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

[ROCM] Enable CompressedTensorsWNA16 (vllm-project#27187)

5ccf419

Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

Chenyaaang pushed a commit to Chenyaaang/vllm that referenced this pull request Oct 28, 2025

[ROCM] Enable CompressedTensorsWNA16 (vllm-project#27187)

46fcb1a

Signed-off-by: JartX <sagformas@epdcenter.es>

ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025

[ROCM] Enable CompressedTensorsWNA16 (vllm-project#27187)

28249cc

Signed-off-by: JartX <sagformas@epdcenter.es>

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

[ROCM] Enable CompressedTensorsWNA16 (vllm-project#27187)

512e93d

Signed-off-by: JartX <sagformas@epdcenter.es>

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[ROCM] Enable CompressedTensorsWNA16 (vllm-project#27187)

521569f

Signed-off-by: JartX <sagformas@epdcenter.es>

JartX deleted the feature/enable_compressedtensorswna16_on_rocm branch March 15, 2026 17:04

yewentao256 merged 2 commits into vllm-project:main from JartX:feature/enable_compressedtensorswna16_on_rocm
