[Bugfix] Add multiple-of-16 block_size to Triton fallback in ROCm attention to support qwen3_5 #35923
Conversation
Signed-off-by: JartX <sagformas@epdcenter.es>
Code Review
This pull request correctly adds support for the Qwen3.5 model on ROCm by including its non-standard block size of 1056 in the RocmAttentionBackend. This change is a simple and effective fix for the reported issue, allowing the model to use the appropriate Triton kernel fallback. The implementation is correct and I have no further suggestions for improvement.
Hi @JartX, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: JartX <sagformas@epdcenter.es>
Documentation preview: https://vllm--35923.org.readthedocs.build/en/35923/
Signed-off-by: JartX <sagformas@epdcenter.es>
Force-pushed from c07fc35 to 6212617
Signed-off-by: JartX <sagformas@epdcenter.es>
/gemini review
Code Review
This pull request addresses a bug where Qwen3.5 models produced incorrect outputs on the ROCm backend. The fix correctly identifies that the non-standard block size was the issue and generalizes the supported block sizes for the ROCM_ATTN backend to any multiple of 16. This is a good change that improves robustness for future models. The corresponding documentation has also been updated. However, I've found a critical issue related to this change that could cause failures for other models.
def get_supported_kernel_block_sizes() -> list[int | MultipleOf]:
-    # ROCM paged attention kernel only supports block sizes 16 and 32
-    # However, The limitations in [16, 32] are reasonable for a native C++ kernel,
-    # but vLLM should allow support for non-standard sizes via the Triton path,
-    # as addressed in this PR: https://github.com/vllm-project/vllm/pull/31380,
-    # where the Triton kernel under rocm_atten does not support inference
-    # for a non-standard qwen3-next model with a block_size of 544.
-    # We have fixed the Triton kernel so that the standard model uses the original
-    # bit-addressing logic, while the non-standard model
-    # uses our optimized kernel logic.
-    return [16, 32, 544]
+    # ROCM paged attention native C++ kernel only supports block sizes 16 and 32
+    # due to shared memory (LDS) constraints on AMD GPUs.
+    # See csrc/rocm/attention.cu CALL_CUSTOM_LAUNCHER_BLK macro.
+    # However, vLLM allows support for any multiple of 16 via the Triton path.
+    # As addressed in PR: https://github.com/vllm-project/vllm/pull/31380,
+    # non-standard models (like qwen3-next with block_size 544, or qwen3_5
+    # with 784 and 1056) are dynamically routed to our optimized Triton kernel
+    # in `do_kv_cache_update`.
+    return [MultipleOf(16)]
While this change to allow any block size that is a multiple of 16 is correct for supporting models like Qwen3.5, it introduces a potential failure for other models.
The dispatch logic in do_kv_cache_update (lines 450-480) uses is_pow2 to decide whether to use the native C++ kernel or the Triton fallback. The native C++ kernel, as noted in the comments and confirmed in csrc/rocm/attention.cu, only supports block sizes of 16 and 32.
With this PR, a model using a block size that is a power of two but not 16 or 32 (e.g., 64) will be incorrectly routed to the native C++ kernel, which will then raise an error.
To fix this, the condition in do_kv_cache_update should be changed from if is_pow2: to if block_size in (16, 32):. This will ensure that only the explicitly supported block sizes are routed to the native kernel, and all others (including other powers of two) use the Triton fallback.
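The reviewer's suggested dispatch fix can be sketched as a standalone routine. This is a hedged illustration, not vLLM's actual code: the function name `choose_kernel` and the string return values are invented for the example; only the `block_size in (16, 32)` membership test mirrors the suggested change.

```python
# Illustrative sketch of the suggested dispatch fix (not vLLM's real code).
NATIVE_BLOCK_SIZES = (16, 32)  # only sizes the ROCm native C++ kernel supports


def choose_kernel(block_size: int) -> str:
    """Route a KV-cache update to the native C++ kernel or the Triton fallback."""
    if block_size % 16 != 0:
        raise ValueError(f"block_size {block_size} is not a multiple of 16")
    # Before the fix, an `is_pow2` check would wrongly send e.g. 64 to the
    # native kernel. An explicit membership test avoids that failure mode.
    if block_size in NATIVE_BLOCK_SIZES:
        return "native_cpp"
    return "triton_fallback"
```

Under this sketch, 64 (a power of two but unsupported by the C++ kernel) and 1056 (Qwen3.5's block size) both fall through to the Triton path, while 16 and 32 keep using the native kernel.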
This pull request has merge conflicts that must be resolved before it can be merged.
/gemini review
@tjtanaa the test failed with "no such container" in the AMD entrypoint
@Rohan138
@JartX can you rebase your branch? This test group should be green as of yesterday.
Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: akaratza <akaratza@amd.com>
Force-pushed from 4f16fcd to cd8be20
@AndreasKaratzas all tests passed :)
That's great :) Unfortunately, even though my tag says "member", my approval won't turn your PR green (I only have read permissions 😅). I have forwarded your PR to the right channels.
@AndreasKaratzas many thanks! Hahah :)
@tjtanaa Please check this out when you can :)
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: JartX <sagformas@epdcenter.es>
Head branch was pushed to by a user without write access
…ention to support qwen3_5 (vllm-project#35923) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: akaratza <akaratza@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
This PR adds `MultipleOf(16)` to the list of supported kernel block sizes in `RocmAttentionBackend`.
When running Qwen3.5 models with the ROCM_ATTN backend, the model produces broken, nonsensical outputs (e.g., repeating exclamation marks like !!!!!!!!!!). This happens because Qwen3.5 uses a non-standard block size of 1056. Since this size was not explicitly permitted, the value_cache could not be routed through the optimized Triton kernel fallback (triton_reshape_and_cache_flash).
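To make the "multiple of 16" membership idea concrete, here is a minimal sketch. Note this is an assumption-laden stand-in: vLLM defines its own `MultipleOf` class, and the helper `is_supported` is invented for this example; only the modulo check reflects what the PR actually enables.

```python
# Stand-in for vLLM's MultipleOf marker; the real class may differ.
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleOf:
    base: int

    def contains(self, n: int) -> bool:
        # A block size matches if it is a positive multiple of `base`.
        return n > 0 and n % self.base == 0


def is_supported(block_size: int, supported: list) -> bool:
    """Hypothetical helper: check a block size against a mixed list of
    exact sizes (int) and MultipleOf markers."""
    for entry in supported:
        if isinstance(entry, MultipleOf):
            if entry.contains(block_size):
                return True
        elif entry == block_size:
            return True
    return False
```

With `[MultipleOf(16)]` as the supported list, the non-standard sizes mentioned in this PR (544, 784, 1056) all pass, whereas they would have been rejected by the old explicit list `[16, 32, 544]` (except 544).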