
[ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync with cuda tests#2340

Merged
gcanlin merged 27 commits into vllm-project:main from EmbeddedLLM:fix-ci20260330 on Apr 11, 2026

Conversation

@tjtanaa (Contributor) commented Mar 30, 2026


Purpose

This PR addresses three issues:

  1. After updating the PyTorch version, there appear to be issues between pytest and the host system: the pytest exit signal was not captured and surfaced, so the CI status did not reflect failures and everything appeared green. [ROCm][CI/Build] ROCm 7.2.1 release version; torch 2.10; triton 3.6 vllm#38252 (comment)
  2. Since vLLM v0.19.0, the default attention backend on ROCm is ROCM_ATTN. However, the compatibility of ROCM_ATTN with Omni is not guaranteed, so we keep TRITON_ATTN as the default attention backend when selected_backend is not specified.
  3. Sync the ROCm test cases with the CUDA test cases. Some tests are split into multiple sub-tests to keep each test job under 30 minutes.

This PR also disables the "Bagel Tests" pending further investigation.

Test Plan

Evaluated the newly added test groups locally: Voxtral-TTS, Qwen3-TTS, MIMO, and CosyVoice3-TTS E2E Test.

Test Result

Local test results

pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
================================== 5 passed, 1 deselected, 2 warnings in 187.03s (0:03:07) ==================================

pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
2 passed, 2 warnings in 123.21s (0:02:03)

pytest -s -v tests/model_executor/models/voxtral_tts/test_cuda_graph_acoustic_transformer.py
27 passed, 2 warnings in 6.57s

CUDA_VISIBLE_DEVICES=0 pytest -s -v tests/e2e/offline_inference/test_t2i_model.py -m "core_model and diffusion" --run-level "core_model" -k "rocm"
1 passed, 1 skipped, 3 warnings in 192.47s (0:03:12)

pytest -s -v tests/e2e/offline_inference/test_qwen_image_diffusion_batching.py -m "core_model and diffusion" --run-level "core_model"
 6 passed, 3 warnings in 349.40s (0:05:49)
 
 pytest -s -v tests/e2e/online_serving/test_mimo_audio.py -m "core_model" --run-level "core_model"
1 passed, 1 deselected, 3 warnings in 601.87s (0:10:01)

pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "core_model" --run-level "core_model"
================== 3 passed, 16 warnings in 549.13s (0:09:09) ==================

CI

test-amd-ready.yaml: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/5042/steps/canvas (all green)

test-amd-merge.yaml: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/5040/steps/canvas (all green)


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


tjtanaa added 3 commits March 30, 2026 16:24
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
@tjtanaa tjtanaa marked this pull request as ready for review March 31, 2026 03:13
@tjtanaa tjtanaa requested a review from hsliuustc0106 as a code owner March 31, 2026 03:13
@tjtanaa tjtanaa marked this pull request as draft March 31, 2026 03:15
tjtanaa added 5 commits March 31, 2026 04:36
@tjtanaa tjtanaa marked this pull request as ready for review March 31, 2026 12:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f050fb91a0


Comment on lines +172 to +173
pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"

P1: Preserve failing exit codes in Voxtral ready CI step

This block runs two pytest commands inside bash -c without set -e or &&, so the step exits with only the second command’s status. If test_voxtral_tts.py (online) fails but test_voxtral_tts.py (offline) passes, the Buildkite step reports success and hides a real regression.

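The masking behavior this review comment describes is easy to reproduce with toy commands (`false`/`true` stand in for the two pytest runs; this is a sketch, not the actual pipeline script):

```shell
# A step like the one flagged above effectively runs:
#   bash -c 'pytest ... online ...; pytest ... offline ...'
# bash returns only the LAST command's exit status, so a failure in the
# first invocation is masked:
bash -c 'false; true'
echo "no fail-fast: exit=$?"     # prints exit=0 -- the failure is hidden

# Fix 1: chain with && so the script stops, and fails, at the first error.
bash -c 'false && true'
echo "with &&:     exit=$?"      # prints exit=1

# Fix 2: set -e aborts the script on the first failing command.
bash -c 'set -e; false; true'
echo "with set -e: exit=$?"      # prints exit=1
```

Either fix makes the Buildkite step inherit the first failing pytest run's non-zero status.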

Comment thread on .buildkite/test-amd-merge.yml (outdated)
Comment on lines +251 to +252
pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"

P1: Preserve failing exit codes in Voxtral merge CI step

Like the ready pipeline, this bash -c script runs two tests sequentially without fail-fast behavior. In bash, the script returns the last command’s exit code, so a failure in the online Voxtral test can be masked by a passing offline test and incorrectly mark the merge pipeline green.


@yenuo26 (Collaborator) commented Mar 31, 2026

I resolved the issue of increased runtime for the omni model in #2354. I think this may help alleviate the timeout errors in the amd-merge-ci.

@tjtanaa (Contributor, Author) commented Apr 1, 2026

I resolved the issue of increased runtime for the omni model in #2354. I think this may help alleviate the timeout errors in the amd-merge-ci.

Thanks. I saw that the CI is passing https://buildkite.com/vllm/vllm-omni-amd-ci/builds/4419/steps/canvas

tjtanaa added 12 commits April 8, 2026 09:05
@tjtanaa tjtanaa changed the title [ROCm] [CI] Sync with cuda test cases and fixed test case timeout issue [ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync with cuda tests Apr 10, 2026
@tjtanaa tjtanaa marked this pull request as draft April 10, 2026 10:31
tjtanaa added 5 commits April 10, 2026 10:42
@tjtanaa tjtanaa marked this pull request as ready for review April 10, 2026 16:03

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Apr 11, 2026
NOTE: AR Attention Backend Overriding Logic:
------------------------------------------
Since vLLM v0.19.0, the default attention backend is ROCM_ATTN for ROCm.
However, the compatibility of ROCM_ATTN with Omni is not guaranteed.
@gcanlin (Collaborator) commented Apr 11, 2026
Could you please explain why ROCM_ATTN is not compatible with vllm-omni? If it were incompatible, how could the Qwen3-Omni/Qwen2.5-Omni thinker work in vllm? Currently, vllm-omni should mainly reuse the vllm attention for the AR parts of these two models.

@gcanlin (Collaborator)

Or is it just due to some environment issue?

@tjtanaa (Contributor, Author) Apr 11, 2026

Upstream did not explicitly evaluate these two models when updating the default behavior, and most unit tests restrict model_len. The attention backend encounters issues when the model's context length is too large.

@tjtanaa (Contributor, Author) Apr 11, 2026

In general, ROCM_ATTN is not as generalized as TRITON_ATTN. The current attention backend fallback condition is not broadly tested, so I would prefer to keep using TRITON_ATTN for vllm-omni models for now; otherwise the model crashes and cannot even launch.

@gcanlin (Collaborator) left a comment

LGTM. Sorry for the late review

# However, the compatibility of ROCM_ATTN with Omni is not guaranteed.
# Therefore, we still use TRITON_ATTN as the default attention backend,
# when the selected_backend is not specified.
engine_args["attention_backend"] = "TRITON_ATTN"
@gcanlin (Collaborator)
Should we add a log to remind users here?

@tjtanaa (Contributor, Author) commented Apr 11, 2026

LGTM. Sorry for the late review

Thank you. And no need to apologize; there are many PRs. Thank you for your hard work reviewing all of them. Appreciate it. :)

@gcanlin gcanlin merged commit 25c0566 into vllm-project:main Apr 11, 2026
8 checks passed
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
…with cuda tests (vllm-project#2340)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
