Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests by albertvillanova · Pull Request #5767 · huggingface/trl

albertvillanova · 2026-05-13T19:40:17Z

Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests.

This PR updates the test configurations for GRPO and RLOO trainers to further reduce memory usage during VLM (Vision-Language Model) training. The primary change is lowering the per_device_train_batch_size from 3 to 1 in various test cases, with updated comments to clarify that this is to avoid out-of-memory (OOM) errors due to the memory-intensive nature of VLM training.

Partial fix for:

CI again often fails with torch.OutOfMemoryError: CUDA out of memory #5750

Aligned with:

Fix OOM in CI by reducing batch size in VLM SFT tests #5687

Related to:

Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma4 #5760

Motivation

CI was OOMing because -n auto runs 4 xdist workers sharing one 14.74 GiB GPU. When test_train_vlm[tiny-Gemma4ForConditionalGeneration] ran in one worker (~9.17 GiB), a concurrent worker attempting to allocate 2.95 GiB found only 2.87 GiB free. Two fixes address this at different levels:

- Shrink the Gemma4 model itself: Fix OOM in CI by reducing intermediate_size and image token budget for tiny Gemma4 #5760
- Reduce the batch size in all VLM trainer tests: this PR
  - Dropping to per_device_train_batch_size=1 cuts the batch-dependent part of training-phase memory (mainly activations) roughly 3×; model weights and optimizer state do not shrink with batch size.
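A back-of-the-envelope check of the figures quoted above, written as a minimal Python sketch. It assumes the GPU is shared only by the xdist workers, so memory that is neither free nor held by the Gemma4 worker is attributed to the other workers. The scaling helper assumes batch-dependent memory grows linearly with per_device_train_batch_size. The constant and function names are illustrative, not taken from the TRL test suite.

```python
# Figures quoted in the Motivation section, in GiB.
GPU_TOTAL_GIB = 14.74
GEMMA4_WORKER_GIB = 9.17      # worker running test_train_vlm[tiny-Gemma4ForConditionalGeneration]
FREE_AT_FAILURE_GIB = 2.87    # free memory when the concurrent allocation failed
REQUESTED_GIB = 2.95          # allocation attempted by the concurrent worker

# Assumption: whatever is neither free nor held by the Gemma4 worker belongs to the other workers.
other_workers_gib = GPU_TOTAL_GIB - GEMMA4_WORKER_GIB - FREE_AT_FAILURE_GIB
shortfall_gib = REQUESTED_GIB - FREE_AT_FAILURE_GIB


def scale_batch_dependent_memory(gib: float, old_batch_size: int, new_batch_size: int) -> float:
    """Hypothetical linear model: batch-dependent memory is proportional to batch size."""
    if old_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return gib * new_batch_size / old_batch_size


if __name__ == "__main__":
    print(f"held by other workers: {other_workers_gib:.2f} GiB")  # prints 2.70
    print(f"shortfall: {shortfall_gib:.2f} GiB")                   # prints 0.08
    # Going from batch size 3 to 1 keeps one third of the batch-dependent memory.
    print(scale_batch_dependent_memory(3.0, old_batch_size=3, new_batch_size=1))  # prints 1.0
```

The shortfall is small, which fits the approach taken here of trimming per-test peak memory rather than changing the number of xdist workers.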

Changes

Test configuration updates for memory optimization:

- Reduced per_device_train_batch_size from 3 to 1 in all relevant test cases within tests/test_grpo_trainer.py to prevent OOM errors during VLM training.
- Made the same batch size reduction in all relevant test cases within tests/test_rloo_trainer.py, for consistency and to address VLM memory constraints.

These changes ensure that the test suite can run reliably on machines with limited memory resources when training VLMs.

Note

Low Risk
Test-only config tweaks to lower GPU memory usage in CI; low functional risk beyond potentially reducing VLM training coverage in tests.

Overview
Reduces GPU memory pressure in GRPO and RLOO vision-language training tests by lowering per_device_train_batch_size and num_generations from 3 to 2, with updated comments clarifying this is CI-only to avoid OOM.

Updates the GRPO multimodal tools test to reflect the new num_generations=2 behavior by adjusting the mocked generate batch shapes and the expected tools/call_frequency assertion (from 2/3 to 1/2).
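The Overview notes that per_device_train_batch_size and num_generations were lowered together. A minimal sketch of why the two have to be chosen jointly, assuming the trainer forms a generation batch of per_device_train_batch_size × num_processes × steps_per_generation completions and requires it to split evenly into groups of num_generations completions per prompt. The helper name and its defaults are hypothetical; this is not TRL's actual validation code.

```python
def generation_batch_is_compatible(
    per_device_train_batch_size: int,
    num_generations: int,
    num_processes: int = 1,
    steps_per_generation: int = 1,
) -> bool:
    """Hypothetical check: does the generation batch split into whole groups of num_generations?"""
    if num_generations < 2:
        # Group-relative advantages need at least two completions per prompt.
        return False
    generation_batch_size = per_device_train_batch_size * num_processes * steps_per_generation
    return generation_batch_size % num_generations == 0


if __name__ == "__main__":
    print(generation_batch_is_compatible(3, 3))  # True: previous test setting
    print(generation_batch_is_compatible(1, 3))  # False: 1 completion cannot fill a group of 3
    print(generation_batch_is_compatible(2, 2))  # True: setting described in the Overview
```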

Reviewed by Cursor Bugbot for commit 6184460.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit f25ebbc.

HuggingFaceDocBuilderDev · 2026-05-13T19:43:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

albertvillanova · 2026-05-14T05:08:06Z

The CI errors are unrelated. See:

CI fails: AssertionError: Param model.visual.blocks.0.norm1.weight is not updated #5768


qgallouedec · 2026-05-15T16:59:06Z

Is this PR still needed?
Technically it would work, but I feel like num_generations=2 is ideally not something we want, because with a group of 2 rewards the standardized advantages are (r1-μ)/σ = sign(r1-r2)/√2 ≈ ±0.707 (σ being the sample standard deviation), and the other is ∓0.707

so no matter how big or small the actual reward gap is, the advantages are always exactly +0.707 and -0.707. The magnitude of the reward difference gets completely erased. You're left with just "which completion was better," i.e. a pairwise sign, not a group-relative advantage.
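To make this concrete, here is a minimal sketch of group-relative advantage normalization, (r - mean) / (std + eps), applied to one group of num_generations rewards. It assumes the sample (n - 1) standard deviation, which is what reproduces the ±0.707 figure. The function name and the eps default are illustrative assumptions, not TRL's code.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 0.0) -> list[float]:
    """Standardize one group's rewards: (r - mean) / (sample std + eps).

    Pass a small eps to avoid dividing by zero when all rewards in the group tie.
    """
    mu = statistics.mean(rewards)
    sigma = statistics.stdev(rewards)  # sample standard deviation (divides by n - 1)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # With two completions, a tiny gap and a huge gap give the same advantages.
    print([round(a, 4) for a in group_advantages([0.0, 0.01])])   # prints [-0.7071, 0.7071]
    print([round(a, 4) for a in group_advantages([0.0, 100.0])])  # prints [-0.7071, 0.7071]
```

With a larger group, for example four rewards, the standardized values differ from one another, so some information about relative reward magnitude survives normalization.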


albertvillanova · 2026-05-17T06:55:55Z

@qgallouedec,

Yes, this PR is still needed. The other OOM mitigations (reducing the tiny Gemma4 model footprint and clearing chained exception tracebacks) reduce accumulated memory between test reruns; they don't reduce the peak memory during a single VLM test run. Reducing per_device_train_batch_size and num_generations directly lowers peak GPU memory in both the generation phase and the training phase. These are independent, complementary fixes.

On num_generations=2, you're right that with only 2 completions per prompt, standardized advantages collapse to ±1/√2 ≈ ±0.707, losing reward magnitude information. This is a valid concern for production training, where num_generations should be larger (typically ≥ 4–8) to get meaningful advantage estimates. In this context, however, num_generations=2 is purely a CI test parameter: the goal is to verify that the code runs end-to-end without OOM errors, not to validate training quality or convergence.

It is also worth noting that num_generations=2 is already used across other VLM tests in our CI, e.g.:

trl/tests/test_grpo_trainer.py, line 3069 in ed6055e:

num_generations=2,

I'm adding a note to make this distinction explicit, clarifying that num_generations=2 is a CI-only concession and production training should use more generations.

qgallouedec · 2026-05-21T03:29:19Z

ok sounds good

albertvillanova added 2 commits May 13, 2026 21:28

Reduce per_device_train_batch_size in GRPO VLM CI tests

68e1c64

Reduce per_device_train_batch_size in RLOO VLM CI tests

f25ebbc

albertvillanova mentioned this pull request May 13, 2026

CI again often fails with torch.OutOfMemoryError: CUDA out of memory #5750

Closed

cursor Bot reviewed May 13, 2026


Comment thread on tests/test_grpo_trainer.py (outdated)

Set compatible per_device_train_batch_size and num_generations

169d8d1

albertvillanova added 3 commits May 15, 2026 06:26

Merge branch 'main' into pfix-5750-per_device_train_batch_size

718fec6

Update test_train_with_tools_multimodal_response with num_generations=2

c9b6bd9

Merge remote-tracking branch 'upstream/main' into pfix-5750-per_device_train_batch_size

7bb07fe

Merge remote-tracking branch 'upstream/main' into pfix-5750-per_device_train_batch_size

f27792c

Add comment about num_generations=2 only for CI

6184460

qgallouedec approved these changes May 21, 2026


albertvillanova merged commit bbb3976 into main May 21, 2026
13 checks passed

albertvillanova deleted the pfix-5750-per_device_train_batch_size branch May 21, 2026 04:59

albertvillanova merged 8 commits into main from pfix-5750-per_device_train_batch_size


3 participants
