
[ROCM] Enable CompressedTensorsWNA16 #27187

Merged
yewentao256 merged 2 commits into vllm-project:main from JartX:feature/enable_compressedtensorswna16_on_rocm
Oct 21, 2025

Conversation

@JartX (Contributor) commented Oct 20, 2025

I'm currently using ROCm with RDNA3. I had been trying to use compressed-tensors for a while, thinking it was only supported on CUDA.

This change simply skips the CUDA-only CompressedTensorsWNA16MarlinMoEMethod when running on ROCm, which allows inference of a model like the following:

jart25/Qwen3-VL-30B-A3B-Instruct-AWQ-8bit

using CompressedTensorsWNA16MoEMethod and ExllamaLinearKernel.

Note that it must be run with:

export VLLM_USE_TRITON_AWQ=1
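The gist of the change can be sketched as follows. This is a minimal illustration only: `select_wna16_moe_method` is a hypothetical helper, not the actual vLLM code path, though the method class names come from the PR description.

```python
# Hypothetical sketch of the fallback this PR introduces. The real vLLM
# selection logic lives in the compressed-tensors MoE quantization config
# and differs in detail; only the class names below are from the PR.

def select_wna16_moe_method(is_rocm: bool) -> str:
    """Pick the WNA16 MoE method name for the current platform."""
    if is_rocm:
        # Marlin MoE kernels are CUDA-only, so ROCm falls back to the
        # generic CompressedTensorsWNA16MoEMethod (Exllama/Triton path).
        return "CompressedTensorsWNA16MoEMethod"
    return "CompressedTensorsWNA16MarlinMoEMethod"

print(select_wna16_moe_method(True))   # ROCm path
print(select_wna16_moe_method(False))  # CUDA path
```

The design choice mirrors how other parts of the codebase gate Marlin kernels on ROCm: rather than porting the kernel, the platform check routes to an already-supported fallback.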

JartX added 2 commits October 20, 2025 09:45
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: JartX <sagformas@epdcenter.es>
@mergify mergify bot added the rocm Related to AMD ROCm label Oct 20, 2025
@gemini-code-assist bot left a comment


Code Review

This pull request correctly enables CompressedTensorsWNA16 on ROCm platforms by preventing the use of the Marlin MoE kernel, which is not supported on ROCm. The change is simple, effective, and consistent with how other parts of the codebase handle ROCm-specific limitations for Marlin kernels. This allows models using this quantization scheme to run on ROCm, which is a valuable improvement.

@yewentao256 (Member) left a comment


LGTM, thanks for the work!

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 20, 2025
@JartX (Contributor, Author) commented Oct 21, 2025

Hi @yewentao256, all tests have passed. Can you merge it?

@yewentao256 yewentao256 merged commit ba09652 into vllm-project:main Oct 21, 2025
57 checks passed
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: JartX <sagformas@epdcenter.es>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Chenyaaang pushed a commit to Chenyaaang/vllm that referenced this pull request Oct 28, 2025
Signed-off-by: JartX <sagformas@epdcenter.es>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
Signed-off-by: JartX <sagformas@epdcenter.es>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: JartX <sagformas@epdcenter.es>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: JartX <sagformas@epdcenter.es>
@JartX JartX deleted the feature/enable_compressedtensorswna16_on_rocm branch March 15, 2026 17:04

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)
