Refactor vLLM generation [2/N]: Decouple rollout_func and vLLM functionalities by albertvillanova · Pull Request #4712 · huggingface/trl

albertvillanova · 2025-12-17T19:07:58Z

Refactor vLLM generation [2/N]: Decouple rollout_func and vLLM functionalities.

This PR refactors the _generate_single_turn method in GRPOTrainer to separate rollout_func and vllm functionalities into independent, top-level concerns. This makes the code more modular and makes it easier to extract the vLLM functionality into a separate module. See:

Refactor vLLM generation [1/N]: Extract vLLM generation #4700

Motivation

Previously, the rollout_func logic was nested within the use_vllm conditional blocks, creating unnecessary coupling between two orthogonal features:

rollout_func: A general hook for custom generation logic (can work with or without vLLM)
vllm: An inference acceleration backend (can work with or without rollout_func)

This coupling makes the code harder to understand and makes it difficult to extract either feature into its own module.

Changes

Restructured _generate_single_turn into two clear phases:

Phase 1: vLLM Preparation

Runs when use_vllm=True, regardless of whether rollout_func is provided
Handles weight updates, sleep mode management, multimodal message preparation, and tool call formatting
Ensures vLLM is properly configured before any generation happens
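
As a rough illustration of the Phase 1 gating, the sketch below runs the preparation steps whenever use_vllm is true and never consults rollout_func. The function name `run_vllm_preparation`, the step labels, and their order are hypothetical stand-ins chosen for this example; they are not the trainer's actual internals.

```python
from typing import Callable, List, Optional


def run_vllm_preparation(
    use_vllm: bool,
    rollout_func: Optional[Callable],
    log: List[str],
) -> None:
    """Hypothetical stand-in for Phase 1.

    Only the gating condition (use_vllm alone decides) comes from the PR
    description; the step names and their order are assumptions.
    """
    if not use_vllm:
        return  # nothing to prepare without the vLLM backend

    # rollout_func is deliberately ignored here: preparation happens
    # whether or not a custom rollout is supplied.
    log.append("update_weights")
    log.append("manage_sleep_mode")
    log.append("prepare_multimodal_messages")
    log.append("format_tool_calls")


# Preparation runs even when a custom rollout function is present.
steps: List[str] = []
run_vllm_preparation(use_vllm=True, rollout_func=lambda prompts: prompts, log=steps)
print(steps)
```

The point of the sketch is the condition on the first line of the body: a custom rollout that uses vLLM internally would still see up-to-date weights.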

Phase 2: Generation Dispatch

Independent top-level conditionals for each generation path:

if rollout_func is not None: → Custom rollout (may use vLLM internally)
elif use_vllm: → Trainer-managed vLLM generation
elif use_transformers_paged: → Paged transformers generation
else: → Regular transformers generation
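
The dispatch order above can be sketched as a small standalone function. This is a minimal illustration, not the code in the PR: `select_generation_path` and the returned labels are hypothetical, and only the order of the conditionals is taken from the description.

```python
from typing import Callable, Optional


def select_generation_path(
    rollout_func: Optional[Callable],
    use_vllm: bool,
    use_transformers_paged: bool,
) -> str:
    """Return a label for the generation path that would be taken.

    Hypothetical helper: mirrors the if/elif/else order from the PR
    description, with rollout_func checked first.
    """
    if rollout_func is not None:
        return "custom_rollout"  # may use vLLM internally
    elif use_vllm:
        return "vllm"  # trainer-managed vLLM generation
    elif use_transformers_paged:
        return "transformers_paged"
    else:
        return "transformers"


# A custom rollout takes precedence even when use_vllm is enabled.
print(select_generation_path(lambda prompts: prompts, True, False))
```

Because rollout_func is checked first, enabling vLLM no longer changes whether the custom rollout is used; it only affects the preparation phase.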

Benefits

Clearer separation of concerns: vLLM preparation vs. generation dispatch
Correct behavior preserved: vLLM weights are updated even when using rollout_func
Better modularity: Each feature can be understood and modified independently
Future-proof: Easier to extract either feature into separate modules
No functional changes: Pure refactoring, all existing behavior preserved

HuggingFaceDocBuilderDev · 2025-12-17T19:10:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

rycerzes · 2026-02-02T19:50:20Z

Hi @albertvillanova! I'm working on GRPO training with custom rollout functions using UnslothGRPOTrainer and OpenEnv, and I'm currently blocked by the vLLM coupling issue that your PR addresses. I'd love to help move this forward if useful!

albertvillanova added 2 commits December 17, 2025 19:58

Run CI tests for draft PR

35cb6e4

Decouple rollout_func and vLLM functionalities

0299f6b

albertvillanova mentioned this pull request Dec 17, 2025

Refactor vLLM generation [1/N]: Extract vLLM generation #4700

Merged

albertvillanova marked this pull request as draft December 19, 2025 10:46

This was referenced Feb 18, 2026

Decouple inference backend from rollout & agent logic #5119

Open

Decouple rollout dispatch from vLLM backend in GRPO _generate_single_turn #5122

Open

albertvillanova wants to merge 2 commits into huggingface:main from albertvillanova:factorize-rollout-func
