Decouple rollout dispatch from vLLM backend in GRPO _generate_single_turn#5122

Open
albertvillanova wants to merge 4 commits into huggingface:main from albertvillanova:fix-5119-5121

Conversation


@albertvillanova albertvillanova commented Feb 18, 2026

This PR implements step 1 of #5119 by:

  • moving rollout_func dispatch to a top-level branch in GRPOTrainer._generate_single_turn, independent of vllm_generation.

The goal is to decouple rollout/agent logic from inference backend internals.

Fix #5121.
Supersedes and closes #4712.

Part of:

What changed

  • GRPO trainer now owns rollout dispatch
    • Added a top-level rollout_func branch in _generate_single_turn:
      • Calls self.rollout_func(prompts, self) directly
      • Validates required keys: prompt_ids, completion_ids, logprobs
      • Splits additional fields into extra_fields
    • Kept vLLM weight sync in rollout path when use_vllm=True and step changed
  • Removed rollout handling from VLLMGeneration
    • Removed rollout_func parameter/attribute from VLLMGeneration
    • Removed rollout-specific branching from server/colocate generation paths
    • Simplified server prompt-id duplication logic (always duplicate for n=num_generations alignment)
  • Added tests to new ownership boundaries
    • Added GRPO rollout-dispatch tests:
      • rollout takes precedence
      • vLLM sync still runs when needed
      • missing required rollout keys raises
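The dispatch and validation steps above can be sketched roughly as follows. This is a hypothetical illustration based on the PR description, not the actual implementation: the helper name dispatch_rollout and its exact return shape are assumptions, but the required-key set and the extra_fields split match what the PR states.

```python
# Hypothetical sketch of the top-level rollout dispatch described in this PR.
# Keys required from rollout_func output, per the PR description.
REQUIRED_KEYS = {"prompt_ids", "completion_ids", "logprobs"}


def dispatch_rollout(rollout_func, prompts, trainer):
    """Call the user-provided rollout_func and validate/split its output."""
    output = rollout_func(prompts, trainer)

    # Validate that all required keys are present; raise otherwise.
    missing = REQUIRED_KEYS - output.keys()
    if missing:
        raise ValueError(f"rollout_func output is missing required keys: {sorted(missing)}")

    # Any additional fields are split off into extra_fields.
    extra_fields = {k: v for k, v in output.items() if k not in REQUIRED_KEYS}
    return output["prompt_ids"], output["completion_ids"], output["logprobs"], extra_fields
```

Because this branch sits above the backend-specific paths, the same contract applies regardless of whether vLLM (server or colocate) or any other backend is in use.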

Motivation

Previously, rollout_func lived inside VLLMGeneration, coupling rollout/agent logic to one inference backend. This change makes rollout dispatch backend-agnostic at the trainer layer and narrows VLLMGeneration to inference concerns.

Related

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec

Can you confirm that after this one, rollout_func no longer receives gathered prompts in server mode?


qgallouedec commented Feb 18, 2026

Another important change is that rollout_func will now receive the unprocessed prompts (before apply_chat_template), which, AFAICT, should be fine, except in one case:

if as_chat:
    # For chat mode, we need to pass messages format
    # Since prompts are already formatted strings, we use generate instead
    output = trainer.vllm_generation.vllm_client.generate(prompts=prompts, **generation_kwargs)
else:
    output = trainer.vllm_generation.vllm_client.generate(prompts=prompts, **generation_kwargs)

the line 132 should be updated
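One way a custom rollout_func could handle this change is to apply the chat formatting itself before calling the backend. The sketch below is only illustrative: make_rollout_func, format_chat, and generate are hypothetical stand-ins (format_chat for something like the tokenizer's apply_chat_template, generate for the vLLM client call), not part of TRL's API.

```python
# Hypothetical: since rollout_func now receives raw prompts (before
# apply_chat_template), a chat-mode rollout_func may need to format
# conversational prompts itself before generation.
def make_rollout_func(format_chat, generate):
    def rollout_func(prompts, trainer):
        texts = [
            # A list of messages indicates a chat prompt; format it.
            format_chat(p) if isinstance(p, list) else p
            for p in prompts
        ]
        return generate(texts)

    return rollout_func
```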



Successfully merging this pull request may close these issues.

Decouple rollout from vLLM [1/N]: Move rollout_func dispatch to top-level _generate_single_turn
