
[ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync with cuda tests#2340

Merged
gcanlin merged 27 commits into vllm-project:main from EmbeddedLLM:fix-ci20260330 on Apr 11, 2026

Conversation

@tjtanaa (Contributor) commented Mar 30, 2026


Purpose

This PR addresses three issues:

  1. After updating the PyTorch version, there appear to be issues between pytest and the host system: the pytest exit signal was not captured and surfaced, so the CI status did not reflect failures and everything appeared green. [ROCm][CI/Build] ROCm 7.2.1 release version; torch 2.10; triton 3.6 vllm#38252 (comment)
  2. Since vLLM v0.19.0, the default attention backend on ROCm is ROCM_ATTN. However, the compatibility of ROCM_ATTN with Omni is not guaranteed, so we keep TRITON_ATTN as the default attention backend when selected_backend is not specified.
  3. Sync the ROCm test cases with the CUDA test cases. Some tests are split into multiple sub-tests to keep each test job under 30 minutes.

This PR also disables the "Bagel Tests" pending further investigation.

Test Plan

Evaluated the newly added test groups locally: Voxtral-TTS, Qwen3-TTS, MIMO, and CosyVoice3-TTS E2E Test.

Test Result

Local test results

pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
================================== 5 passed, 1 deselected, 2 warnings in 187.03s (0:03:07) ==================================

pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
2 passed, 2 warnings in 123.21s (0:02:03)

pytest -s -v tests/model_executor/models/voxtral_tts/test_cuda_graph_acoustic_transformer.py
27 passed, 2 warnings in 6.57s

CUDA_VISIBLE_DEVICES=0 pytest -s -v tests/e2e/offline_inference/test_t2i_model.py -m "core_model and diffusion" --run-level "core_model" -k "rocm"
1 passed, 1 skipped, 3 warnings in 192.47s (0:03:12)

pytest -s -v tests/e2e/offline_inference/test_qwen_image_diffusion_batching.py -m "core_model and diffusion" --run-level "core_model"
 6 passed, 3 warnings in 349.40s (0:05:49)
 
 pytest -s -v tests/e2e/online_serving/test_mimo_audio.py -m "core_model" --run-level "core_model"
1 passed, 1 deselected, 3 warnings in 601.87s (0:10:01)

pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "core_model" --run-level "core_model"
================== 3 passed, 16 warnings in 549.13s (0:09:09) ==================

CI

test-amd-ready.yaml: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/5042/steps/canvas (all green)

test-amd-merge.yaml: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/5040/steps/canvas (all green)


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


tjtanaa added 3 commits March 30, 2026 16:24
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@tjtanaa tjtanaa marked this pull request as ready for review March 31, 2026 03:13
@tjtanaa tjtanaa requested a review from hsliuustc0106 as a code owner March 31, 2026 03:13
@tjtanaa tjtanaa marked this pull request as draft March 31, 2026 03:15
tjtanaa added 5 commits March 31, 2026 04:36
@tjtanaa tjtanaa marked this pull request as ready for review March 31, 2026 12:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f050fb91a0


Comment on lines +172 to +173
pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"

P1: Preserve failing exit codes in Voxtral ready CI step

This block runs two pytest commands inside bash -c without set -e or &&, so the step exits with only the second command’s status. If test_voxtral_tts.py (online) fails but test_voxtral_tts.py (offline) passes, the Buildkite step reports success and hides a real regression.

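The masking behavior this review comment describes is easy to reproduce with toy commands (`false`/`true` stand in for the two pytest runs; this is a sketch, not the actual pipeline script):

```shell
# A step like the one flagged above effectively runs:
#   bash -c 'pytest ... online ...; pytest ... offline ...'
# bash returns only the LAST command's exit status, so a failure in the
# first invocation is masked:
bash -c 'false; true'
echo "no fail-fast: exit=$?"     # prints exit=0 -- the failure is hidden

# Fix 1: chain with && so the script stops, and fails, at the first error.
bash -c 'false && true'
echo "with &&:     exit=$?"      # prints exit=1

# Fix 2: set -e aborts the script on the first failing command.
bash -c 'set -e; false; true'
echo "with set -e: exit=$?"      # prints exit=1
```

Either fix makes the Buildkite step inherit the first failing pytest run's non-zero status.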

Comment thread on .buildkite/test-amd-merge.yml (outdated)
Comment on lines +251 to +252
pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"

P1: Preserve failing exit codes in Voxtral merge CI step

Like the ready pipeline, this bash -c script runs two tests sequentially without fail-fast behavior. In bash, the script returns the last command’s exit code, so a failure in the online Voxtral test can be masked by a passing offline test and incorrectly mark the merge pipeline green.


@yenuo26 (Collaborator) commented Mar 31, 2026

I resolved the issue of increased runtime for the omni model in #2354. I think this may help alleviate the timeout errors in the amd-merge-ci.

@tjtanaa (Contributor, Author) commented Apr 1, 2026

I resolved the issue of increased runtime for the omni model in #2354. I think this may help alleviate the timeout errors in the amd-merge-ci.

Thanks. I saw that the CI is passing https://buildkite.com/vllm/vllm-omni-amd-ci/builds/4419/steps/canvas

tjtanaa added 12 commits April 8, 2026 09:05
@tjtanaa tjtanaa changed the title [ROCm] [CI] Sync with cuda test cases and fixed test case timeout issue [ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync with cuda tests Apr 10, 2026
@tjtanaa tjtanaa marked this pull request as draft April 10, 2026 10:31
tjtanaa added 5 commits April 10, 2026 10:42
@tjtanaa tjtanaa marked this pull request as ready for review April 10, 2026 16:03

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Apr 11, 2026
NOTE: AR Attention Backend Overriding Logic:
------------------------------------------
Since vLLM v0.19.0, the default attention backend is ROCM_ATTN for ROCm.
However, the compatibility of ROCM_ATTN with Omni is not guaranteed.
@gcanlin (Collaborator) commented Apr 11, 2026
Could you please explain why ROCM_ATTN is not compatible with vllm-omni? If it were incompatible, how could the Qwen3-Omni/Qwen2.5-Omni thinker work in vllm? Currently, vllm-omni should mainly reuse the vllm attention for the AR parts of these two models.

@gcanlin (Collaborator)

Or is it just due to some environment issue?

@tjtanaa (Contributor, Author) Apr 11, 2026

Upstream did not explicitly evaluate these two models when updating the default behavior, and most unit tests restrict model_len. The attention backend encounters issues when the model's context length is too large.

@tjtanaa (Contributor, Author) Apr 11, 2026

In general, ROCM_ATTN is not as generalized as TRITON_ATTN. The current attention backend fallback condition is not broadly tested, so I would prefer to keep using TRITON_ATTN for vllm-omni models for now; otherwise the model crashes and cannot even launch.

@gcanlin (Collaborator) left a comment

LGTM. Sorry for the late review

# However, the compatibility of ROCM_ATTN with Omni is not guaranteed.
# Therefore, we still use TRITON_ATTN as the default attention backend,
# when the selected_backend is not specified.
engine_args["attention_backend"] = "TRITON_ATTN"
@gcanlin (Collaborator)
Should we add a log to remind users here?

@tjtanaa (Contributor, Author) commented Apr 11, 2026

LGTM. Sorry for the late review

Thank you. And no need to apologize; there are many PRs. Thank you for your hard work reviewing all of them. Appreciate it. :)

@gcanlin gcanlin merged commit 25c0566 into vllm-project:main Apr 11, 2026
8 checks passed
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
…with cuda tests (vllm-project#2340)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
