[ROCm][CI] Fix TP size issue for test_gpt_oss #35887

gshtras merged 1 commit into vllm-project:main

Conversation
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Code Review
The pull request effectively addresses the issue of multi-GPU tests failing in single-GPU CI environments by conditionally skipping these tests when insufficient GPUs are available. This is a practical fix that improves CI stability and efficiency. The implementation correctly uses cuda_device_count_stateless to determine how many GPUs are available.
```diff
 import pytest
 from packaging import version

+from vllm.utils.torch_utils import cuda_device_count_stateless
```
According to PEP 8, imports should generally be grouped in the following order: standard library imports, third-party imports, and then local application/library specific imports. The `vllm.utils.torch_utils` import is a local application import and should be placed after `packaging.version`, separated by a blank line, to maintain consistency with common Python style guidelines.
```python
import pytest
from packaging import version

from vllm.utils.torch_utils import cuda_device_count_stateless
```

References
- Imports should be grouped in the following order: standard library, third-party, and local application/library specific imports. Each group should be separated by a blank line. (PEP 8)
Quantized Models Test is allocated to a 1-GPU agent pool in CI, but tries to run multi-GPU tests (example: https://buildkite.com/vllm/amd-ci/builds/5699/steps/canvas?sid=019cb28b-7107-44a7-adde-1af22fb4f7b7&tab=output#019cb28b-71fb-4bda-bc58-43ef57384abc/L1654). This PR skips the multi-GPU test cases when there are not enough GPUs available.
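For context, here is a minimal sketch of the skip pattern this PR describes, assuming a hypothetical test parametrized over tensor-parallel size; the test name, TP values, and skip message are illustrative placeholders, not the exact code from this PR:

```python
import pytest

from vllm.utils.torch_utils import cuda_device_count_stateless


# Hypothetical parametrization over tensor-parallel (TP) sizes; the real
# tests in the PR may use different names and values.
@pytest.mark.parametrize("tp_size", [1, 2])
def test_gpt_oss(tp_size: int):
    # Skip multi-GPU cases on single-GPU CI agents instead of failing
    # when the TP group cannot be formed.
    if cuda_device_count_stateless() < tp_size:
        pytest.skip(
            f"Need {tp_size} GPUs, found {cuda_device_count_stateless()}")
    # ... run the model with tensor_parallel_size=tp_size ...
```

Because the check runs at test time rather than collection time, the same test file works unchanged on both single-GPU and multi-GPU agent pools.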