Kd vllm generation by cmpatino · Pull Request #5351 · huggingface/trl

cmpatino · 2026-03-23T11:30:06Z

What does this PR do?

Addresses the comment from #5137 to use trl.generation.VLLMGeneration instead of the separate vLLM logic.

Before submitting

Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Note

Medium Risk
Moderate risk because it rewires GOLD on-policy generation, prompt/attention masking, and weight-sync behavior for vLLM, which can affect training correctness and stability in distributed setups.

Overview
GOLD’s vLLM path is refactored to use trl.generation.VLLMGeneration instead of trainer-local server/colocate logic, removing the custom callback/weight-update code and centralizing sync + generation behind sync_weights()/generate().

On-policy buffering is updated to pass prompt token IDs directly (respecting prompt_attention_mask), build concatenated attention_mask explicitly, and generate labels only on completion tokens that are not padded. GOLDConfig gains new vLLM knobs (vllm_server_base_url, vllm_group_port, vllm_max_model_length, vllm_model_impl), and tests are updated/added to lock in these behaviors (prompt masking for vLLM prompts, special-token reconstruction, and defaulting vllm_max_model_length to max_length).

^{Written by Cursor Bugbot for commit dcfce59. This will update automatically on new commits. Configure here.}

HuggingFaceDocBuilderDev · 2026-03-23T13:58:44Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-24T15:17:38Z

+            prompts=prompt_ids_list,
+            images=None,
+            num_generations=self.num_generations,
+        )


Incompatible prompt handling for multi-generation vLLM usage

Medium Severity

VLLMGeneration.generate assumes prompts are pre-duplicated num_generations times (as GRPO does via its RepeatSampler), but GOLD passes each prompt exactly once. In server mode, the method does all_prompts[::num_generations] to "deduplicate," which skips prompts when num_generations > 1. In colocate mode, n=1 is hardcoded, so extra generations are silently ignored. The old code had custom _generate_vllm_server_global / _generate_vllm_colocate methods that correctly handled n=num_generations without requiring pre-duplication. This is a regression when num_generations > 1 (default is 1, so typical usage is unaffected).

cmpatino added 2 commits March 18, 2026 15:32

Use VLLMGeneration in GOLDTrainer

ffde2d0

Update with precommit

1797fc1

kashif approved these changes Mar 23, 2026

View reviewed changes

cmpatino marked this pull request as ready for review March 23, 2026 12:15

cursor Bot reviewed Mar 23, 2026

View reviewed changes

Comment thread trl/experimental/gold/gold_trainer.py

Comment thread trl/experimental/gold/gold_trainer.py

Fix how we handle padding and special tokens

b629987

cursor Bot reviewed Mar 23, 2026

View reviewed changes

Comment thread trl/experimental/gold/gold_trainer.py

cmpatino added 2 commits March 23, 2026 15:19

Address concern about vllm weight sync

cdc3196

Run precommit

f4c193e

cursor Bot reviewed Mar 23, 2026

View reviewed changes

Comment thread trl/experimental/gold/gold_trainer.py Outdated

cmpatino added 4 commits March 23, 2026 15:32

Fix max len behavior for generation

2b41f84

Format docstring

91715cb

Remove decode -> re-tokenization roundtrip

b94fc1f

Run precommit

dcfce59

cursor Bot reviewed Mar 24, 2026

View reviewed changes

cmpatino mentioned this pull request Mar 26, 2026

[GKD] Buffer Implementation for Distillation Trainer #5137

Merged

3 tasks

cmpatino merged commit 05eac2c into huggingface:main Mar 26, 2026
4 checks passed

cmpatino deleted the kd-vllm-generation branch March 26, 2026 09:37

cmpatino restored the kd-vllm-generation branch March 26, 2026 09:58

cmpatino deleted the kd-vllm-generation branch March 26, 2026 10:03

behroozazarkhalili mentioned this pull request Jun 14, 2026

[GOLD][vLLM] On-policy generation decodes prompts with skip_special_tokens=True (chat template stripped) — intentional? #5241

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kd vllm generation#5351

Kd vllm generation#5351
cmpatino merged 9 commits into
huggingface:mainfrom
cmpatino:kd-vllm-generation

cmpatino commented Mar 23, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Uh oh!

HuggingFaceDocBuilderDev commented Mar 23, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Mar 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cmpatino commented Mar 23, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Before submitting

Who can review?

Uh oh!

Uh oh!

Uh oh!

HuggingFaceDocBuilderDev commented Mar 23, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Mar 24, 2026

Choose a reason for hiding this comment

Incompatible prompt handling for multi-generation vLLM usage

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cmpatino commented Mar 23, 2026 •

edited by cursor Bot

Loading