[ROCm][CI] Fix TP size issue for test_gpt_oss #35887

Merged
gshtras merged 1 commit into vllm-project:main from ROCm:micah/gpt-oss-tp-size on Mar 3, 2026

Conversation

@micah-wil (Contributor) commented on Mar 3, 2026

The Quantized Models Test is allocated to a 1-GPU agent pool in CI, but it tries to run multi-GPU tests (example: https://buildkite.com/vllm/amd-ci/builds/5699/steps/canvas?sid=019cb28b-7107-44a7-adde-1af22fb4f7b7&tab=output#019cb28b-71fb-4bda-bc58-43ef57384abc/L1654)

This PR skips the multi-GPU test cases when there are not enough GPUs available (see the sketch below).

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
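
For reference, a minimal sketch of the kind of guard this PR adds, assuming a hypothetical test function and TP size; the actual parameterization in test_gpt_oss may differ:

import pytest

from vllm.utils.torch_utils import cuda_device_count_stateless

# Hypothetical tensor-parallel size, used only for illustration.
TP_SIZE = 2

@pytest.mark.skipif(
    cuda_device_count_stateless() < TP_SIZE,
    reason="Not enough GPUs available for this tensor-parallel test case.",
)
def test_gpt_oss_multi_gpu():
    ...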
@mergify bot added the gpt-oss (Related to GPT-OSS models) and rocm (Related to AMD ROCm) labels on Mar 3, 2026
@github-project-automation bot moved this to Todo in AMD on Mar 3, 2026

@gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request effectively addresses the issue of multi-GPU tests failing in single-GPU CI environments by conditionally skipping these tests when insufficient GPUs are available. This is a practical fix that improves CI stability and efficiency. The implementation correctly uses cuda_device_count_stateless to determine available resources.

import pytest
from packaging import version

from vllm.utils.torch_utils import cuda_device_count_stateless

Severity: high

According to PEP 8, imports should generally be grouped in the following order: standard library imports, third-party imports, and then local application/library specific imports. The vllm.utils.torch_utils import is a local application import and should be placed after packaging.version to maintain consistency with common Python style guidelines.

import pytest
from packaging import version

from vllm.utils.torch_utils import cuda_device_count_stateless
References
  1. Imports should be grouped in the following order: standard library, third-party, and local application/library specific imports. Each group should be separated by a blank line. (link)
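
For illustration, the three groups the guideline describes, with a hypothetical standard-library import (os) added so that all three tiers appear, each separated by a blank line:

import os  # standard library

import pytest  # third-party
from packaging import version

from vllm.utils.torch_utils import cuda_device_count_stateless  # local application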

@github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements on Mar 3, 2026
@gshtras enabled auto-merge (squash) on March 3, 2026 at 20:10
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Mar 3, 2026
@gshtras merged commit e721300 into vllm-project:main on Mar 3, 2026
16 of 17 checks passed
@github-project-automation bot moved this from Todo to Done in AMD on Mar 3, 2026
@micah-wil deleted the micah/gpt-oss-tp-size branch on March 4, 2026 at 16:23
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Mar 12, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
Signed-off-by: Micah Williamson <micah.williamson@amd.com>

Labels

gpt-oss: Related to GPT-OSS models
ready: ONLY add when PR is ready to merge/full CI is needed
rocm: Related to AMD ROCm

Projects

Status: Done


2 participants