[ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync with cuda tests#2340
Conversation
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f050fb91a0
```shell
pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
```
Preserve failing exit codes in Voxtral ready CI step
This block runs two pytest commands inside bash -c without set -e or &&, so the step exits with only the second command’s status. If test_voxtral_tts.py (online) fails but test_voxtral_tts.py (offline) passes, the Buildkite step reports success and hides a real regression.
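A minimal demonstration of the exit-code behavior described above (the commands are placeholders, not the actual CI step):

```shell
# Without fail-fast, bash -c returns only the LAST command's exit status,
# so an earlier failure is silently swallowed.
bash -c 'false; true'
echo "no fail-fast: $?"   # 0 -- the failure of `false` is hidden

# Chaining with && (or enabling `set -e`) propagates the first failure.
bash -c 'false && true'
echo "with &&:      $?"   # 1 -- the step now reports the failure

bash -c 'set -euo pipefail; false; true'
echo "with set -e:  $?"   # 1
```

Either joining the two pytest invocations with `&&` or prefixing the script with `set -euo pipefail` would make the step fail when the first test fails.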
```shell
pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
```
Preserve failing exit codes in Voxtral merge CI step
Like the ready pipeline, this bash -c script runs two tests sequentially without fail-fast behavior. In bash, the script returns the last command’s exit code, so a failure in the online Voxtral test can be masked by a passing offline test and incorrectly mark the merge pipeline green.
I resolved the issue of increased runtime for the omni model in #2354. I think this may help alleviate the timeout errors in the amd-merge-ci.
Thanks. I saw that the CI is passing https://buildkite.com/vllm/vllm-omni-amd-ci/builds/4419/steps/canvas |
```
NOTE: AR Attention Backend Overriding Logic:
------------------------------------------
Since vLLM v0.19.0, the default attention backend is ROCM_ATTN for ROCm.
However, the compatibility of ROCM_ATTN with Omni is not guaranteed.
```
Could you please explain why ROCM_ATTN is not compatible with vllm-omni? If it is compatible, how could the Qwen3-Omni/Qwen2.5-Omni thinker work in vLLM? Currently, vllm-omni mainly reuses the vLLM attention for the AR parts of these two models.
Or is this just an environment issue?
When upstream changed the default behavior, these two models were not evaluated explicitly, and most unit tests restrict model_len. The attention backend runs into issues when the model's context length is too large.
In general, ROCM_ATTN is not as generalized as TRITON_ATTN. The current attention-backend fallback condition is not broadly tested, so I would prefer to keep using TRITON_ATTN for vllm-omni models for now; otherwise the model crashes and cannot even launch.
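The fallback discussed above can be sketched as a small Python function (the function and argument names are illustrative, not the actual vllm-omni code):

```python
def choose_attention_backend(selected_backend=None):
    """Pick the AR attention backend on ROCm (illustrative sketch)."""
    # Honor an explicit user choice.
    if selected_backend is not None:
        return selected_backend
    # Since vLLM v0.19.0 the ROCm default is ROCM_ATTN, but its
    # compatibility with Omni models (especially at long context
    # lengths) is not guaranteed, so fall back to TRITON_ATTN.
    return "TRITON_ATTN"

print(choose_attention_backend())             # TRITON_ATTN
print(choose_attention_backend("ROCM_ATTN"))  # ROCM_ATTN
```

The key design choice is that the override only applies when no backend was explicitly selected, so users who deliberately pick ROCM_ATTN are not second-guessed.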
gcanlin left a comment
LGTM. Sorry for the late review
```python
# However, the compatibility of ROCM_ATTN with Omni is not guaranteed.
# Therefore, we still use TRITON_ATTN as the default attention backend,
# when the selected_backend is not specified.
engine_args["attention_backend"] = "TRITON_ATTN"
```
Should we add a log to remind users here?
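One way such a reminder could look, using the standard logging module (the logger name, helper function, and message text are illustrative, not the actual vllm-omni code):

```python
import logging

logger = logging.getLogger(__name__)

def override_attention_backend(engine_args):
    """Apply the TRITON_ATTN default and warn the user (illustrative sketch)."""
    if engine_args.get("attention_backend") is None:
        # Make the silent override visible in the logs.
        logger.warning(
            "ROCM_ATTN compatibility with Omni is not guaranteed; "
            "defaulting attention_backend to TRITON_ATTN."
        )
        engine_args["attention_backend"] = "TRITON_ATTN"
    return engine_args

args = override_attention_backend({})
print(args["attention_backend"])  # TRITON_ATTN
```

A `logger.warning` (rather than `info`) keeps the message visible at default log levels without failing the run.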
Thank you, and there is no need to apologize; there are many PRs. Thank you for your hard work in reviewing all of them. Appreciate it. :)
…with cuda tests (vllm-project#2340) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Purpose
This PR addresses two issues:

1. Since vLLM v0.19.0, the default attention backend for ROCm is `ROCM_ATTN`. However, the compatibility of `ROCM_ATTN` with Omni is not guaranteed, so we still use `TRITON_ATTN` as the default attention backend when `selected_backend` is not specified.
2. Disable "Bagel Tests" pending further investigation.
Test Plan
Evaluated the newly added test groups locally: Voxtral-TTS, Qwen3-TTS, MIMO, and CosyVoice3-TTS E2E tests.
Test Result
Local test results
CI
test-amd-ready.yaml: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/5042/steps/canvas (all green)
test-amd-merge.yaml: https://buildkite.com/vllm/vllm-omni-amd-ci/builds/5040/steps/canvas (all green)
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation edits to `./docs`.