[ROCm][CI] Remove deepep DBO tests on gfx90a #37614
DarkLight1337 merged 1 commit into vllm-project:main from
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request removes the DeepEP DBO tests from the gfx90a architecture, which is correct because DeepEP is not compatible with gfx90a. It does this by moving a CI job from mi250 to mi325 hardware and removing the DBO test from that job. While this works, it introduces inconsistencies in the CI configuration: the job's label and its mirror_hardwares list no longer match the hardware the job runs on, which can be misleading. I've added a comment suggesting that both be updated for clarity and maintainability.

      timeout_in_minutes: 180
      mirror_hardwares: [amdexperimental, amdproduction, amdgfx90anightly, amdmi250]
    - agent_pool: mi250_2
    + agent_pool: mi325_2

With the agent_pool changed to mi325_2, the job label on line 1402 ("Distributed Tests (2 GPUs)(H100-MI250) # TBD") is now misleading. Please update it to reflect the new hardware (e.g., MI325).
Additionally, the mirror_hardwares on line 1404 might need to be updated. Another job on mi325_2 (starting on line 2594) uses [amdexperimental, amdproduction, amdgfx942nightly, amdmi325]. Consider aligning this for consistency.
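To make the reviewer's point concrete, here is a minimal Python sketch of the mismatch being described: the job now runs on an mi325_2 pool (a gfx942 part) while its mirror_hardwares still lists gfx90a/MI250 tags. The `POOL_ARCH` and `TAG_ARCH` tables and the `mismatched_tags` helper are hypothetical, written only for illustration; they are not part of the vLLM or Buildkite tooling, and the job is modeled as a plain dict rather than parsed from the real pipeline file.

```python
# Hypothetical consistency check; these mappings are assumptions for
# illustration only, not part of the vLLM CI tooling.
POOL_ARCH = {
    "mi250_2": "gfx90a",  # MI250 is a gfx90a (CDNA2) part
    "mi325_2": "gfx942",  # MI325 is a gfx942 (CDNA3) part
}

# Mirror tags that imply a specific architecture (assumed mapping).
# Tags not listed here (e.g. amdexperimental) are treated as arch-neutral.
TAG_ARCH = {
    "amdgfx90anightly": "gfx90a",
    "amdmi250": "gfx90a",
    "amdgfx942nightly": "gfx942",
    "amdmi325": "gfx942",
}


def mismatched_tags(job: dict) -> list[str]:
    """Return mirror_hardwares tags whose implied arch differs from the agent pool's."""
    pool_arch = POOL_ARCH.get(job["agent_pool"])
    if pool_arch is None:
        return []  # unknown pool: nothing to compare against
    return [
        tag
        for tag in job.get("mirror_hardwares", [])
        # Arch-neutral tags default to the pool's arch, so they never mismatch.
        if TAG_ARCH.get(tag, pool_arch) != pool_arch
    ]


# The job as it stands after this PR (label and tags unchanged, pool moved).
job = {
    "label": "Distributed Tests (2 GPUs)(H100-MI250)",
    "timeout_in_minutes": 180,
    "mirror_hardwares": ["amdexperimental", "amdproduction", "amdgfx90anightly", "amdmi250"],
    "agent_pool": "mi325_2",
}
print(mismatched_tags(job))  # ['amdgfx90anightly', 'amdmi250']
```

Whether such a mismatch actually matters depends on how mirror_hardwares is consumed by the CI pipeline, which this sketch does not model.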
Not applicable. The goal here is to deduplicate tests and keep only hardware-specific tests on each platform, to save CI infrastructure time.
Testing on MI250 to see if the issue is resolved (added
Test group confirmed passing: https://buildkite.com/vllm/amd-ci/builds/6710/steps/canvas?sid=019d087f-fe73-4e87-963c-cbb9d21cb8e3&tab=output
Follow-up for:
Removes the DBO test from gfx90a, since DeepEP is not compatible with the gfx90a architecture. Addresses the failure in mi250_2: Distributed Tests (2 GPUs)(H100-MI250).
Motivation: https://buildkite.com/vllm/amd-ci/builds/6701/steps/canvas?sid=019d07a7-1a2e-4d29-91e7-9eb765bc4904&tab=output
cc @kenroche