[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 by AndreasKaratzas · Pull Request #37610 · vllm-project/vllm

AndreasKaratzas · 2026-03-19T22:50:38Z

Follow-up for:

[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline #34839

Fixes OOM in mi250_1: Multi-Modal Models (Standard) 2: qwen3 + gemma

Motivation: https://buildkite.com/vllm/amd-ci/builds/6701/steps/canvas?sid=019d07a7-1a19-4174-b4a1-c9bbfff0c164&tab=output

@kenroche

AndreasKaratzas · 2026-03-19T22:51:30Z

Testing MI250 to see if issue is resolved (added rocm and ready labels).

gemini-code-assist

Code Review

This pull request addresses an Out-Of-Memory error on MI250 for gemma3 tests under ROCm by skipping the Scaled Dot-Product Attention (SDP) override. The change is simple and effective. I've added one suggestion to improve code maintainability by documenting the reason for this skip directly in the code.

tests/models/multimodal/conftest.py

AndreasKaratzas · 2026-03-20T21:44:27Z

I have just added a large GPU mark that applies to ROCm only. It skips the test when the platform is MI250, which resolves the OOM there.
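A minimal sketch of the skip decision such a ROCm-only mark could make, assuming the mark compares device memory against a minimum requirement. The function name, the 80 GiB threshold, and the 64 GiB figure for one MI250 GCD are illustrative assumptions, not values taken from this PR.

```python
def should_skip_large_gpu_test(
    is_rocm: bool, device_memory_gib: float, min_gib: float
) -> bool:
    """Decide whether a ROCm-only "large GPU" mark should skip a test.

    Hypothetical helper for illustration; it is not the helper used in vLLM.
    """
    if not is_rocm:
        # The mark is ROCm-only, so other platforms keep running the test.
        return False
    # On ROCm, skip when the device has less memory than the test needs.
    return device_memory_gib < min_gib


# Assumed numbers: 64 GiB for one MI250 GCD, 80 GiB as the required minimum.
print(should_skip_large_gpu_test(True, 64, 80))   # MI250-like device
print(should_skip_large_gpu_test(False, 24, 80))  # non-ROCm platform
```

In a test suite, the boolean would typically feed `pytest.mark.skipif(condition, reason=...)`, so the test is reported as skipped instead of failing with an OOM.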

AndreasKaratzas · 2026-03-20T23:23:09Z

Test group has been confirmed green: https://buildkite.com/vllm/amd-ci/builds/6743/steps/canvas?sid=019d0d33-9d6c-4d3e-a0b7-bd741edf4239&tab=output

DarkLight1337 · 2026-03-21T03:35:12Z

Actually, how can a 4B model cause OOM?

AndreasKaratzas · 2026-03-21T04:06:57Z

I think the profiling stage generates a tensor that is large enough to cause the OOM. It happens during the SDPA stage.
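As a rough illustration of how an attention step can allocate a large tensor regardless of model size, the sketch below estimates the memory of a fully materialized attention score matrix. It assumes a non-fused SDPA path that stores a `batch x heads x q_len x kv_len` tensor; the shapes used are hypothetical and are not measurements from the failing CI run.

```python
def sdpa_scores_bytes(
    batch: int, heads: int, q_len: int, kv_len: int, dtype_bytes: int = 2
) -> int:
    """Bytes needed to materialize the attention score tensor.

    Assumes one score per (batch, head, query, key) position;
    dtype_bytes=2 corresponds to fp16 or bf16.
    """
    return batch * heads * q_len * kv_len * dtype_bytes


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


# Hypothetical shapes: the cost grows with q_len * kv_len, so doubling
# the sequence length quadruples the allocation.
for seq_len in (4096, 8192, 16384):
    size = sdpa_scores_bytes(batch=1, heads=8, q_len=seq_len, kv_len=seq_len)
    print(seq_len, to_gib(size), "GiB")
```

Because the estimate depends on sequence length rather than parameter count, a long profiling input could plausibly dominate memory use even for a small model. Whether that is what happens here remains unconfirmed in the thread.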

DarkLight1337 · 2026-03-21T04:57:14Z

Hmm ok, maybe you should investigate this further, as it's quite unexpected. Let's get the CI to pass first, though.

AndreasKaratzas added the rocm Related to AMD ROCm label Mar 19, 2026

github-project-automation bot added this to AMD Mar 19, 2026

github-project-automation bot moved this to Todo in AMD Mar 19, 2026

AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 19, 2026

mergify bot added the multi-modality Related to multi-modality (#4194) label Mar 19, 2026

gemini-code-assist bot reviewed Mar 19, 2026

tests/models/multimodal/conftest.py (outdated)

AndreasKaratzas force-pushed the akaratza_gfx90a_multi_mod_gemma branch from 3e44e63 to 879b58b Compare March 20, 2026 00:46

AndreasKaratzas changed the title ~~[ROCm][CI] Skip SDP override for gemma3 to avoid OOM on MI250 GCDs~~ [ROCm][CI] Reduce image resolution for gemma3 to avoid OOM on MI250 Mar 20, 2026

[ROCm][CI] Fix gemma3-transformers OOM on MI250 during profiling

cdcea97

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas force-pushed the akaratza_gfx90a_multi_mod_gemma branch from 700d164 to cdcea97 Compare March 20, 2026 06:06

AndreasKaratzas added 2 commits March 20, 2026 10:25

Merge remote-tracking branch 'origin/main' into akaratza_gfx90a_multi…

2f9496f

…_mod_gemma

[ROCm][CI] Fix gemma3-transformers OOM on MI250 during profiling

be5bced

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas changed the title ~~[ROCm][CI] Reduce image resolution for gemma3 to avoid OOM on MI250~~ [ROCm][CI] Split test step for gemma3 to avoid OOM on MI250 Mar 20, 2026

mergify bot added the ci/build label Mar 20, 2026

AndreasKaratzas added 2 commits March 20, 2026 16:42

[ROCm][CI] Fix gemma3-transformers OOM on MI250

af2e634

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Merge remote-tracking branch 'origin/main' into akaratza_gfx90a_multi…

0094c21

…_mod_gemma

AndreasKaratzas marked this pull request as ready for review March 20, 2026 21:45

AndreasKaratzas requested review from DarkLight1337 and ywang96 as code owners March 20, 2026 21:45

AndreasKaratzas changed the title ~~[ROCm][CI] Split test step for gemma3 to avoid OOM on MI250~~ [ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 Mar 20, 2026

DarkLight1337 approved these changes Mar 21, 2026

DarkLight1337 merged commit 0d50fa1 into vllm-project:main Mar 21, 2026
22 checks passed

github-project-automation bot moved this from Todo to Done in AMD Mar 21, 2026

AndreasKaratzas deleted the akaratza_gfx90a_multi_mod_gemma branch March 21, 2026 05:24

AndreasKaratzas mentioned this pull request Mar 21, 2026

[CI Failure]: Gemma3 OOMs with transformers backend #37736

Open

JartX pushed a commit to JartX/vllm that referenced this pull request Mar 21, 2026

[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (vllm-…

65d0d1f

…project#37610) Signed-off-by: sagformas <sagformas@epdcenter.es>

DarkLight1337 merged 5 commits into vllm-project:main from ROCm:akaratza_gfx90a_multi_mod_gemma
