Change default `vllm_mode` to `"colocate"` and add v0→v1 migration guide by qgallouedec · Pull Request #5255 · huggingface/trl

qgallouedec · 2026-03-10T04:57:38Z

Summary

Default vllm_mode changed from "server" to "colocate" across all configs (GRPOConfig, RLOOConfig, GOLDConfig, OnlineDPOConfig). Colocate is a better default — it works out of the box without spinning up a separate vLLM server, which is the most common friction point for new users. Users who need server mode can still set vllm_mode="server" explicitly.
Updated docs and tests that implicitly assumed the old "server" default to explicitly set vllm_mode="server" where needed (vllm_integration.md, grpo_trainer.md, rloo_trainer.md, speeding_up_training.md, test_online_dpo_trainer.py).
Added a minimal MIGRATION.md at the repo root for v0 → v1. It should stay small — we have already done most of the breaking work in v0.29 (trainers moved to experimental, model classes removed, etc.), so the guide just highlights what is genuinely new and keeps the v0.29 changes in collapsible sections for anyone migrating from an older version.
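
For scripts that relied on the old implicit default, one way to keep the previous behavior is to pin `vllm_mode` explicitly before building the config. The sketch below is only an illustration: `pin_vllm_mode` is a hypothetical helper, not part of TRL, and it assumes the config is assembled from a plain dict of keyword arguments that includes `use_vllm`.

```python
from typing import Any


def pin_vllm_mode(config_kwargs: dict[str, Any], old_default: str = "server") -> dict[str, Any]:
    """Return a copy of config_kwargs with vllm_mode set explicitly.

    Hypothetical helper (not a TRL API): if vLLM is enabled and no mode was
    given, fall back to the pre-change default instead of the new "colocate".
    """
    pinned = dict(config_kwargs)  # copy so the caller's dict is left untouched
    if pinned.get("use_vllm") and "vllm_mode" not in pinned:
        pinned["vllm_mode"] = old_default
    return pinned


# A legacy script that never set vllm_mode and expected server mode.
legacy_kwargs = {"output_dir": "out", "use_vllm": True}
pinned_kwargs = pin_vllm_mode(legacy_kwargs)
print(pinned_kwargs)
```

With TRL installed, the resulting dict could then be passed on as `GRPOConfig(**pinned_kwargs)`; writing `vllm_mode="server"` directly in the config call is the simpler option when the call site is easy to edit.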

Note

Medium Risk
Changes the default vLLM execution mode from external server to in-process across multiple configs, which can alter runtime behavior and GPU memory requirements for existing training setups.

Overview
Switches the default vLLM integration mode to "colocate" (from "server") across GRPOConfig, RLOOConfig, OnlineDPOConfig, GOLDConfig, and the VLLMGeneration backend.

Docs and examples are updated to reflect the new default and to explicitly set vllm_mode="server" where server-based workflows are described (grpo_trainer.md, rloo_trainer.md, speeding_up_training.md, vllm_integration.md). Tests are adjusted to pin server-mode behavior and to assert the new default (test_online_dpo_trainer.py).

Adds a new top-level MIGRATION.md highlighting v0→v1 breaking changes, including the vllm_mode default change and an SFTConfig.packing rename.
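
A rename of this kind can be handled with a small lookup when old configs are loaded. The sketch below assumes the rename in question is the `bfd-requeue` to `bfd_split` change named in the title of #5189; the mapping table and `migrate_packing_value` are hypothetical names for illustration, not TRL APIs, and the exact `SFTConfig` field they apply to should be checked against MIGRATION.md.

```python
# Old value -> new value. Only the pair named in the #5189 commit title is
# listed; this table is an assumption, not copied from MIGRATION.md.
RENAMED_PACKING_VALUES = {"bfd-requeue": "bfd_split"}


def migrate_packing_value(value: str) -> str:
    """Map a pre-rename packing value to its new name; pass others through."""
    return RENAMED_PACKING_VALUES.get(value, value)


print(migrate_packing_value("bfd-requeue"))
print(migrate_packing_value("bfd"))
```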

Written by Cursor Bugbot for commit d8a0872. This will update automatically on new commits.

Colocate mode is a better default: it works out of the box without requiring a separate vLLM server process, which is the most common friction point for new users. Users who need server mode can still set `vllm_mode="server"`.

Also adds a minimal migration guide for v0 to v1. Most breaking changes (trainers moved to experimental, removed model classes, etc.) already shipped in v0.29, so the guide focuses on what's new and keeps v0.29 changes in a collapsible section for reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

HuggingFaceDocBuilderDev · 2026-03-10T05:00:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 424fc21035


- Add explicit vllm_mode="server" in docs/tests that rely on server mode
- Remove stale "default value, can be omitted" comments
- Move migration guide from docs to MIGRATION.md at repo root

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Colocate is now Option 1 (default), server is Option 2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…mments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

albertvillanova

Thanks. A couple of questions below.

albertvillanova

Thanks for your explanation. Just a comment below.

commit 52cd0cc
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Mar 17 15:31:26 2026 +0100
Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model (#5295)

commit 7b42fc4
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Mar 17 15:29:11 2026 +0100
Prevent corruption of DPO VLM training if "keep_end" truncation_mode (#5286)

commit 3acb8e8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Mar 17 15:27:10 2026 +0100
Support max_length in DPO VLM training (#5284)

commit ee339a0
Author: Carlos Miguel Patiño <carlos.patino@huggingface.co>
Date: Tue Mar 17 14:01:44 2026 +0100
[GKD] Buffer Implementation for Distillation Trainer (#5137)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

commit d46131f
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Mar 16 15:27:19 2026 +0100
Remove custom get_train/eval_dataloader from OnlineDPO (#5291)

commit 85cf8f4
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Mar 16 15:24:24 2026 +0100
Remove TrainingArguments import from experimental trainers (#5290)

commit 91e3da0
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Mar 16 07:19:51 2026 -0600
Fix `accuracy_reward` crash when called from non-main thread (#5281)

commit 4996631
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Mar 16 07:44:28 2026 +0100
Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string (#5274)

commit 5fceaa7
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Mar 16 07:43:34 2026 +0100
Simplify structured outputs logic across vLLM versions in scripts/vllm_serve (#5273)

commit 406d406
Author: casinca <47400729+casinca@users.noreply.github.com>
Date: Sat Mar 14 04:12:49 2026 +0100
feat(`grpo_trainer.py`): Variational Sequence-Level Soft Policy Optimization (VESPO) (#5199)

commit d0ac7ef
Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Date: Sat Mar 14 02:53:33 2026 +0100
Allow nullable logprobs in vLLM serve responses (#5203)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit c0eabc4
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Mar 13 18:19:15 2026 -0600
Change default `vllm_mode` to `"colocate"` and add v0→v1 migration guide (#5255)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

commit 6c0fccd
Author: Mario Šaško <mariosasko777@gmail.com>
Date: Sat Mar 14 00:19:38 2026 +0100
35% faster packing + rename `bfd-requeue` to `bfd_split` (#5189)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 3972d66
Author: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Date: Wed Mar 18 22:26:44 2026 +0100
Suggest the `Json()` type for tool calling dataset format (#5307)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 5c6e915
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Mar 18 14:55:19 2026 -0600
Update `RewardFunc` type annotation to allow `None` values in reward list (#5297)

commit ee96845
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Mar 18 17:03:54 2026 +0100
Fix DPOTrainer collators to truncate sequences before padding (#5305)

commit 435c2ae
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Mar 18 08:09:42 2026 -0600
Add guidance to avoid `hasattr` and `getattr` with defaults in `AGENTS.md` (#5294)

commit 26ce6a3
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Mar 18 00:44:12 2026 -0600
Apply docstyle (#5296)

chatgpt-codex-connector Bot reviewed Mar 10, 2026

Comment thread MIGRATION.md

qgallouedec and others added 3 commits March 10, 2026 05:03

Swap colocate/server order in docs to reflect new default

7c7d1b3

Colocate is now Option 1 (default), server is Option 2. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Swap colocate/server order in vllm_integration.md and remove stale co…

918635b

…mments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

qgallouedec requested review from AmineDiro and albertvillanova March 10, 2026 05:18

albertvillanova reviewed Mar 10, 2026

Comment thread trl/generation/vllm_generation.py

Comment thread MIGRATION.md

qgallouedec and others added 2 commits March 10, 2026 20:46

Remove already shipped changes in MIGRATION.md

7e7047d

Merge branch 'main' into default-vllm-colocate

40beb9d

albertvillanova approved these changes Mar 11, 2026

Comment thread MIGRATION.md

AmineDiro approved these changes Mar 11, 2026

qgallouedec and others added 2 commits March 13, 2026 17:35

Merge branch 'main' into default-vllm-colocate

2c1f6d4

Update migration guide to include renamed options for SFTConfig

d8a0872

qgallouedec merged commit c0eabc4 into main Mar 14, 2026
15 of 16 checks passed

qgallouedec deleted the default-vllm-colocate branch March 14, 2026 00:19

songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026

Change default vllm_mode to "colocate" and add v0→v1 migration gu…

0a1529d

…ide (huggingface#5255) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

qgallouedec merged 8 commits into main from default-vllm-colocate

4 participants
