Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests#5767
The CI errors are unrelated. See:
Is this PR still needed? With only two completions per group, no matter how big or small the actual reward gap is, the advantages are always exactly +0.707 and -0.707. The magnitude of the reward difference gets completely erased. You're left with just "which completion was better", i.e. a pairwise sign, not a group-relative advantage.
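The ±0.707 figure can be checked with a small sketch. It assumes advantages are computed as (reward - group mean) / sample standard deviation (n - 1 denominator) and omits any epsilon added to the denominator; `group_advantages` is a hypothetical helper written for illustration, not a TRL function.

```python
import statistics


def group_advantages(rewards):
    """Standardize rewards within one group: (r - mean) / sample std.

    Assumption: sample standard deviation (n - 1 denominator), no epsilon.
    Identical rewards would give a zero std, so this sketch avoids ties.
    """
    mean = statistics.mean(rewards)
    std = statistics.stdev(rewards)
    return [(r - mean) / std for r in rewards]


# Two completions per group: a tiny reward gap and a huge one.
small_gap = group_advantages([0.50, 0.51])
large_gap = group_advantages([0.0, 100.0])

print([round(a, 3) for a in small_gap])  # [-0.707, 0.707]
print([round(a, 3) for a in large_gap])  # [-0.707, 0.707]

# Three completions per group: the spacing of the rewards still matters.
skew_low = group_advantages([0.0, 1.0, 10.0])
skew_high = group_advantages([0.0, 9.0, 10.0])
print([round(a, 3) for a in skew_low])
print([round(a, 3) for a in skew_high])
```

With two rewards a and b, the deviations from the mean are ±(a - b)/2 and the sample standard deviation is |a - b|/√2, so the ratio is always ±1/√2 ≈ ±0.707 regardless of the gap. With three rewards, the middle completion's position changes the result.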
…e_train_batch_size
Yes, this PR is still needed. The other OOM mitigations (reducing the tiny Gemma4 model footprint and clearing chained exception tracebacks) reduce accumulated memory between test reruns; they don't reduce the peak memory during a single VLM test run. Reducing On It is also worth noting that trl/tests/test_grpo_trainer.py Line 3069 in ed6055e I'm adding a note to make this distinction explicit, clarifying that
ok sounds good |

Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests.
This PR updates the test configurations for GRPO and RLOO trainers to further reduce memory usage during VLM (Vision-Language Model) training. The primary change is lowering the per_device_train_batch_size from 3 to 1 in various test cases, with updated comments to clarify that this is to avoid out-of-memory (OOM) errors due to the memory-intensive nature of VLM training.
Partial fix for:
Aligned with:
Related to:
Motivation
CI was OOMing because -n auto runs 4 xdist workers sharing one 14.74 GiB GPU. When test_train_vlm[tiny-Gemma4ForConditionalGeneration] ran in one worker (~9.17 GiB), a concurrent worker attempting to allocate 2.95 GiB found only 2.87 GiB free. Two fixes address this at different levels:
Changes
Test configuration updates for memory optimization:
Reduced per_device_train_batch_size from 3 to 1 in all relevant test cases within tests/test_grpo_trainer.py to prevent OOM errors during VLM training.
Applied the same reduction in tests/test_rloo_trainer.py for consistency and to address VLM memory constraints.
These changes ensure that the test suite can run reliably on machines with limited memory resources when training VLMs.
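The memory figures quoted under Motivation can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes that everything not free and not held by the VLM worker belongs to the other xdist workers, and it ignores CUDA context overhead and allocator fragmentation. The variable names are illustrative.

```python
# Figures taken from the Motivation section (all in GiB).
GPU_TOTAL_GIB = 14.74
VLM_WORKER_GIB = 9.17
FREE_AT_FAILURE_GIB = 2.87
REQUESTED_GIB = 2.95

# Memory left for the remaining workers once the VLM test is resident.
left_for_others = GPU_TOTAL_GIB - VLM_WORKER_GIB

# Assumption: whatever is neither free nor in the VLM worker is held
# by the other workers.
used_by_others = left_for_others - FREE_AT_FAILURE_GIB

# How far the failing allocation missed, in MiB.
shortfall_mib = (REQUESTED_GIB - FREE_AT_FAILURE_GIB) * 1024

print(f"left for other workers: {left_for_others:.2f} GiB")  # 5.57 GiB
print(f"already used by them:   {used_by_others:.2f} GiB")   # 2.70 GiB
print(f"allocation shortfall:   {shortfall_mib:.0f} MiB")    # 82 MiB
```

Under these assumptions the failing allocation missed by roughly 82 MiB, which is why a modest cut in the peak memory of a single VLM test can be enough to let concurrent workers fit.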
Note
Low Risk
Test-only config tweaks to lower GPU memory usage in CI; low functional risk beyond potentially reducing VLM training coverage in tests.
Overview
Reduces GPU memory pressure in GRPO and RLOO vision-language training tests by lowering per_device_train_batch_size and num_generations from 3 to 2, with updated comments clarifying this is CI-only to avoid OOM.
Updates the GRPO multimodal tools test to reflect the new num_generations=2 behavior by adjusting the mocked generate batch shapes and the expected tools/call_frequency assertion (from 2/3 to 1/2).