Fix OOM in CI by reducing batch size in VLM SFT tests by albertvillanova · Pull Request #5687 · huggingface/trl

albertvillanova · 2026-04-30T11:41:30Z

Fix OOM in CI by reducing batch size in VLM SFT tests.

Partial fix for:

CI often fails with torch.OutOfMemoryError: CUDA out of memory #5207

Motivation

VLM training tests in test_sft_trainer.py were running with the default per_device_train_batch_size=8. For Gemma3, with vocab_size=262208 (production-scale, never reduced for tiny models) and mm_tokens_per_image=256, each training step computes logits of shape [8, 279, 262208].

PyTorch needs several float32 copies of this tensor for log-softmax and its gradient, pushing peak GPU memory to ~9 GiB per worker. With 4 parallel pytest-xdist workers, this caused CUDA out-of-memory errors in other tests running concurrently.
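The figures above can be roughly reproduced with a back-of-the-envelope calculation. This is only a sketch: the shape and vocabulary size come from the description, but the count of 4 live float32 copies is an assumption chosen because it matches the reported numbers (the description only says "several"), and the helper names are illustrative.

```python
# Rough estimate of logits memory for one training step.
# Shape values come from the PR description: [batch, 279, 262208].
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def logits_gib(batch_size, seq_len=279, vocab_size=262208):
    """Size in GiB of one float32 logits tensor of shape [batch, seq, vocab]."""
    return batch_size * seq_len * vocab_size * BYTES_PER_FLOAT32 / GIB


def peak_gib(batch_size, copies=4):
    """Estimated peak in GiB.

    ASSUMPTION: `copies` logits-sized float32 tensors are alive at once
    (log-softmax and its gradient). The value 4 is a guess, not a measurement.
    """
    return copies * logits_gib(batch_size)


for bs in (8, 1):
    print(
        f"batch_size={bs}: one logits tensor is {logits_gib(bs):.2f} GiB, "
        f"about {peak_gib(bs):.1f} GiB with 4 copies"
    )
```

Under that assumption the estimate comes out near 8.7 GiB for a batch size of 8 and near 1.1 GiB for a batch size of 1, in line with the ~9 GiB and ~1.1 GiB quoted in this description. It ignores model weights, activations, and optimizer state, which are small for tiny test models.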

Solution

Set per_device_train_batch_size=1 in test_train_vlm, test_train_vlm_multi_image, and test_train_vlm_prompt_completion, following the pattern already used in test_train_vlm_gemma_3n. This drops peak GPU memory to ~1.1 GiB per worker for Gemma3, leaving ample headroom for parallel execution.
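For illustration, the configuration in each affected test would look roughly like the fragment below. This is a sketch rather than the exact diff: the `output_dir` value is a hypothetical placeholder, and `report_to="none"` is an assumption about how the tests silence logging.

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="tmp-vlm-sft-test",  # hypothetical path; the real tests use their own temporary directory
    per_device_train_batch_size=1,  # set explicitly instead of relying on the default of 8
    max_length=None,  # VLM-specific setting, left unchanged by this PR
    report_to="none",  # assumption: logging integrations disabled in tests
)
```

Only the `per_device_train_batch_size` argument is new; the rest of the test setup stays as it was.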

Note

Low Risk
Test-only change that lowers batch size to avoid CI OOMs; no production code paths are modified.

Overview
Reduces GPU memory pressure in vision-language SFT integration tests by explicitly setting per_device_train_batch_size=1 in test_train_vlm, test_train_vlm_multi_image, and test_train_vlm_prompt_completion.

This prevents CI CUDA OOMs during parallel test execution while keeping the VLM-specific max_length=None behavior unchanged.

Reviewed by Cursor Bugbot for commit 0480d77.

HuggingFaceDocBuilderDev · 2026-04-30T11:44:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-04-30T13:07:46Z

It seems to work!

Reduce per_device_train_batch_size in SFT VLM CI tests

0480d77

albertvillanova mentioned this pull request Apr 30, 2026

Fix OOM in CI by reducing image size of tiny Gemma3 model #5680

Merged

qgallouedec approved these changes Apr 30, 2026

albertvillanova merged commit 32bec88 into main Apr 30, 2026
13 of 14 checks passed

albertvillanova deleted the pfix-5207-per_device_train_batch_size branch April 30, 2026 13:59

albertvillanova mentioned this pull request Apr 30, 2026

CI often fails with torch.OutOfMemoryError: CUDA out of memory #5207

Closed

qgallouedec pushed a commit that referenced this pull request May 3, 2026

Fix OOM in CI by reducing batch size in VLM SFT tests (#5687)

abb98ac

albertvillanova mentioned this pull request May 13, 2026

Fix OOM in CI by reducing batch size in GRPO/RLOO VLM tests #5767

Merged
