Decouple rollout dispatch from vLLM backend in GRPO _generate_single_turn#5122

Open
albertvillanova wants to merge 4 commits into huggingface:main from albertvillanova:fix-5119-5121

Conversation


@albertvillanova albertvillanova commented Feb 18, 2026

This PR implements step 1 of #5119 by:

  • moving rollout_func dispatch to a top-level branch in GRPOTrainer._generate_single_turn, independent of vllm_generation.

The goal is to decouple rollout/agent logic from inference backend internals.

Fix #5121.
Supersedes and closes #4712.

Part of:

What changed

  • GRPO trainer now owns rollout dispatch
    • Added a top-level rollout_func branch in _generate_single_turn:
      • Calls self.rollout_func(prompts, self) directly
      • Validates required keys: prompt_ids, completion_ids, logprobs
      • Splits additional fields into extra_fields
    • Kept vLLM weight sync in rollout path when use_vllm=True and step changed
  • Removed rollout handling from VLLMGeneration
    • Removed rollout_func parameter/attribute from VLLMGeneration
    • Removed rollout-specific branching from server/colocate generation paths
    • Simplified server prompt-id duplication logic (always duplicate for n=num_generations alignment)
  • Added tests to new ownership boundaries
    • Added GRPO rollout-dispatch tests:
      • rollout takes precedence
      • vLLM sync still runs when needed
      • missing required rollout keys raises
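The dispatch and validation steps above can be sketched roughly as follows. This is a hypothetical illustration based on the PR description, not the actual implementation: the helper name dispatch_rollout and its exact return shape are assumptions, but the required-key set and the extra_fields split match what the PR states.

```python
# Hypothetical sketch of the top-level rollout dispatch described in this PR.
# Keys required from rollout_func output, per the PR description.
REQUIRED_KEYS = {"prompt_ids", "completion_ids", "logprobs"}


def dispatch_rollout(rollout_func, prompts, trainer):
    """Call the user-provided rollout_func and validate/split its output."""
    output = rollout_func(prompts, trainer)

    # Validate that all required keys are present; raise otherwise.
    missing = REQUIRED_KEYS - output.keys()
    if missing:
        raise ValueError(f"rollout_func output is missing required keys: {sorted(missing)}")

    # Any additional fields are split off into extra_fields.
    extra_fields = {k: v for k, v in output.items() if k not in REQUIRED_KEYS}
    return output["prompt_ids"], output["completion_ids"], output["logprobs"], extra_fields
```

Because this branch sits above the backend-specific paths, the same contract applies regardless of whether vLLM (server or colocate) or any other backend is in use.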

Motivation

Previously, rollout_func lived inside VLLMGeneration, coupling rollout/agent logic to one inference backend. This change makes rollout dispatch backend-agnostic at the trainer layer and narrows VLLMGeneration to inference concerns.

Related

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec

Can you confirm that after this one, rollout_func no longer receives gathered prompts in server mode?


qgallouedec commented Feb 18, 2026

Another important change is that rollout_func will now receive the unprocessed prompts (before apply_chat_template), which, AFAICT, should be fine, except in one case:

if as_chat:
    # For chat mode, we need to pass messages format
    # Since prompts are already formatted strings, we use generate instead
    output = trainer.vllm_generation.vllm_client.generate(prompts=prompts, **generation_kwargs)
else:
    output = trainer.vllm_generation.vllm_client.generate(prompts=prompts, **generation_kwargs)

the line 132 should be updated
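One way a custom rollout_func could handle this change is to apply the chat formatting itself before calling the backend. The sketch below is only illustrative: make_rollout_func, format_chat, and generate are hypothetical stand-ins (format_chat for something like the tokenizer's apply_chat_template, generate for the vLLM client call), not part of TRL's API.

```python
# Hypothetical: since rollout_func now receives raw prompts (before
# apply_chat_template), a chat-mode rollout_func may need to format
# conversational prompts itself before generation.
def make_rollout_func(format_chat, generate):
    def rollout_func(prompts, trainer):
        texts = [
            # A list of messages indicates a chat prompt; format it.
            format_chat(p) if isinstance(p, list) else p
            for p in prompts
        ]
        return generate(texts)

    return rollout_func
```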



Successfully merging this pull request may close these issues.

Decouple rollout from vLLM [1/N]: Move rollout_func dispatch to top-level _generate_single_turn
