[ROCm][CI] Fix test_cudagraph_mode.py Failure For AMD CI#29808

Merged
gshtras merged 3 commits into vllm-project:main from ROCm:micah/cudagraph_test_20251201 on Dec 2, 2025

@micah-wil micah-wil commented Dec 1, 2025

After #26847 was merged, AMD CI started failing in the v1/cudagraph/test_cudagraph_mode.py test with errors like DID NOT RAISE <class 'Exception'>:

=========================== short test summary info ============================
FAILED v1/cudagraph/test_cudagraph_mode.py::test_cudagraph_compilation_combo[RocmAttn-PIECEWISE-0-False]
FAILED v1/cudagraph/test_cudagraph_mode.py::test_cudagraph_compilation_combo[RocmAttn-FULL_AND_PIECEWISE-0-False]
============= 2 failed, 12 passed, 3 warnings in 289.02s (0:04:49) =============
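For context, DID NOT RAISE is the message pytest.raises produces when the guarded block finishes without an exception. The sketch below shows how a stale supported flag leads to it. It assumes the test wraps each case in pytest.raises(Exception) when supported is False; run_case and actually_works are hypothetical stand-ins, not code from the vLLM test.

```python
from contextlib import nullcontext

import pytest


def run_case(supported: bool, actually_works: bool) -> None:
    # Hypothetical stand-in for launching the engine with one backend/mode combo.
    # Assumption: unsupported combos are expected to raise, supported ones are not.
    ctx = nullcontext() if supported else pytest.raises(Exception)
    with ctx:
        if not actually_works:
            raise RuntimeError("unsupported combination")


def stale_flag_message() -> str:
    # The combo now works, but the test case still says supported=False,
    # so pytest.raises fails the test because nothing was raised.
    try:
        run_case(supported=False, actually_works=True)
    except pytest.fail.Exception as exc:
        return str(exc)
    return "no failure"


print(stale_flag_message())
```

When the flag and the real behavior agree, in either direction, run_case returns normally.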

The cause is simple: in the combo_cases_2 test case, the supported field was updated for the CUDA backend but not for the ROCm backend. That mismatch was possible because a change I made last week in #29367 introduced some code duplication. This PR removes the duplication so that we can avoid this type of error in the future.
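A minimal sketch of the de-duplication idea follows: keep one shared table of cases and substitute only the backend name per platform, so the supported flag cannot differ between the CUDA and ROCm lists. Everything here is illustrative. SHARED_COMBO_ROWS, build_combo_cases, and the "CudaBackend" name are hypothetical, and the rows are placeholders rather than the real test matrix.

```python
# Placeholder rows only, NOT the real test matrix.
# Each row: (cudagraph_mode, compilation_mode, supported)
SHARED_COMBO_ROWS = [
    ("FULL", 0, True),
    ("PIECEWISE", 0, False),
    ("FULL_AND_PIECEWISE", 3, True),
]


def backend_for_platform(is_rocm: bool) -> str:
    # "RocmAttn" appears in the failing test IDs; "CudaBackend" is a made-up name.
    return "RocmAttn" if is_rocm else "CudaBackend"


def build_combo_cases(is_rocm: bool) -> list[tuple[str, str, int, bool]]:
    # Only the backend name varies by platform; the flags come from one table.
    backend = backend_for_platform(is_rocm)
    return [
        (backend, cudagraph_mode, compilation_mode, supported)
        for cudagraph_mode, compilation_mode, supported in SHARED_COMBO_ROWS
    ]


rocm_cases = build_combo_cases(is_rocm=True)
cuda_cases = build_combo_cases(is_rocm=False)
```

With this layout, changing a supported value in the shared table updates both platforms at once.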

With this PR, I am again seeing the following when running pytest -v -s v1/cudagraph/test_cudagraph_mode.py:

================================================================ warnings summary ================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================== 14 passed, 2 warnings in 211.22s (0:03:31) ===================================================

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify mergify bot added the nvidia, rocm, and v1 labels Dec 1, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a CI failure on ROCm for test_cudagraph_mode.py by correcting the supported flag in test cases. The main improvement is the refactoring that removes duplicated code for ROCm and CUDA test configurations, replacing it with a unified setup. This not only fixes the immediate issue but also enhances maintainability and reduces the likelihood of similar bugs in the future. The change is clean, correct, and a good improvement to the test suite.

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Dec 2, 2025
@gshtras gshtras enabled auto-merge (squash) December 2, 2025 17:54
@github-actions github-actions bot added the ready label Dec 2, 2025
@gshtras gshtras merged commit c014de1 into vllm-project:main Dec 2, 2025
19 of 20 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Dec 2, 2025
@micah-wil micah-wil deleted the micah/cudagraph_test_20251201 branch December 2, 2025 23:00
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…t#29808)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>