Refactor vLLM generation [2/N]: Decouple rollout_func and vLLM functionalities #4712

Draft
albertvillanova wants to merge 2 commits into huggingface:main from albertvillanova:factorize-rollout-func
@albertvillanova
Member

Refactor vLLM generation [2/N]: Decouple rollout_func and vLLM functionalities.

This PR refactors the _generate_single_turn method in GRPOTrainer so that rollout_func and vLLM are handled as independent, top-level concerns. This makes the code more modular and makes it easier to extract the vLLM functionality into a separate module. See:

Motivation

Previously, the rollout_func logic was nested within the use_vllm conditional blocks, creating unnecessary coupling between two orthogonal features:

  • rollout_func: A general hook for custom generation logic (can work with or without vLLM)
  • vllm: An inference acceleration backend (can work with or without rollout_func)

This coupling makes the code harder to understand and makes it difficult to extract either feature into its own module.

Changes

Restructured _generate_single_turn into two clear phases:

Phase 1: vLLM Preparation

  • Runs when use_vllm=True, regardless of whether rollout_func is provided
  • Handles weight updates, sleep mode management, multimodal message preparation, and tool call formatting
  • Ensures vLLM is properly configured before any generation happens

Phase 2: Generation Dispatch

Independent top-level conditionals for each generation path:

  1. if rollout_func is not None: → Custom rollout (may use vLLM internally)
  2. elif use_vllm: → Trainer-managed vLLM generation
  3. elif use_transformers_paged: → Paged transformers generation
  4. else: → Regular transformers generation
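
The two phases above can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the actual GRPOTrainer code: the class `SketchTrainer`, the helper `_prepare_vllm`, the `log` attribute, and the placeholder return values are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchTrainer:
    """Hypothetical stand-in for the trainer; not the TRL implementation."""

    use_vllm: bool = False
    use_transformers_paged: bool = False
    rollout_func: Optional[Callable[[list], list]] = None
    log: list = field(default_factory=list)  # records which steps ran

    def _prepare_vllm(self) -> None:
        # Placeholder for Phase 1 work (weight updates, sleep mode, etc.).
        self.log.append("prepare_vllm")

    def generate_single_turn(self, prompts: list) -> list:
        # Phase 1: vLLM preparation runs whenever use_vllm is set,
        # regardless of whether a rollout_func is provided.
        if self.use_vllm:
            self._prepare_vllm()

        # Phase 2: independent top-level dispatch.
        if self.rollout_func is not None:
            self.log.append("rollout_func")
            return self.rollout_func(prompts)
        elif self.use_vllm:
            self.log.append("vllm")
            return [f"<vllm:{p}>" for p in prompts]
        elif self.use_transformers_paged:
            self.log.append("transformers_paged")
            return [f"<paged:{p}>" for p in prompts]
        else:
            self.log.append("transformers")
            return [f"<hf:{p}>" for p in prompts]


# A custom rollout combined with vLLM: preparation still runs first.
trainer = SketchTrainer(use_vllm=True, rollout_func=lambda ps: [p.upper() for p in ps])
completions = trainer.generate_single_turn(["hello"])
```

In this sketch, setting both `use_vllm=True` and a `rollout_func` runs the preparation step and then the custom rollout, which mirrors the behavior the description says is preserved.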

Benefits

  • Clearer separation of concerns: vLLM preparation vs. generation dispatch
  • Correct behavior preserved: vLLM weights are updated even when using rollout_func
  • Better modularity: Each feature can be understood and modified independently
  • Future-proof: Easier to extract either feature into its own module
  • No functional changes: Pure refactoring, all existing behavior preserved

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova albertvillanova marked this pull request as draft December 19, 2025 10:46
@rycerzes
rycerzes commented Feb 2, 2026

Hi @albertvillanova! I'm working on GRPO training with custom rollout functions using UnslothGRPOTrainer and OpenEnv, and I'm currently blocked by the vLLM coupling issue that your PR addresses. I'd love to help move this forward if useful!
