Fix GRPOTrainer attribute access for vLLM model config by falcondai · Pull Request #5302 · huggingface/trl

falcondai · 2026-03-18T05:06:42Z

it should be self.vllm_generation.llm...

What does this PR do?

Tested by running one of the example scripts.

python examples/scripts/grpo_agent.py \
    --model_name_or_path Qwen/Qwen3-1.7B \
    --output_dir grpo_biogrid_qwen_3g-1.7b \
    --push_to_hub True \
    --use_vllm True \
    --vllm_mode colocate \
    --max_completion_length 1024 \
    --report_to trackio \
    --log_completions True \
    --max_steps 400

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Note

Low Risk
Low risk: a one-line attribute-chain fix used only to read max_model_len for overlong prompt filtering in vLLM colocate mode.

Overview
Fixes a broken attribute chain in GRPOTrainer._tool_call_loop when running with vLLM in colocate mode by reading max_model_len from self.vllm_generation.llm... instead of a non-existent self.llm.... This prevents failures when filtering overlong prompt+tool-call sequences before regeneration.

^{Written by Cursor Bugbot for commit 5935d78. This will update automatically on new commits. Configure here.}

it should be `self.vllm_generation.llm...`

qgallouedec · 2026-03-18T05:08:56Z

thanks, I think there is the same issue in RLOO, do you mind fixing it as well?

falcondai · 2026-03-18T05:46:06Z

rloo trainer seems free of this issue. I looked through the implementation and tested it with an example script python examples/scripts/rloo.py --use-vllm True --vllm-mode colocate.

One related issue I saw when searching "self.llm.llm_engine" through the repo is that online_dpo_trainer.py may benefit from a refactor to use VLLMGeneration. As it stands, it seems to handle colocate mode with similar (duplicate?) code as VLLMGeneration.

HuggingFaceDocBuilderDev · 2026-03-18T09:57:47Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

albertvillanova

Thanks for the fix in GRPO and for validating RLOO. Very helpful!

In relation with OnlineDPOTrainer, that's a good observation. Just for context, OnlineDPOTrainer currently lives in the experimental submodule, so it hasn't been fully aligned with the abstractions used in the core trainers yet.

That said, I agree that this could benefit from a refactor to reuse VLLMGeneration, even if it is not a high priority right now.

)

Fix grpo trainer attribute chain

5935d78

it should be `self.vllm_generation.llm...`

qgallouedec requested a review from albertvillanova March 18, 2026 05:08

albertvillanova approved these changes Mar 18, 2026

View reviewed changes

albertvillanova changed the title ~~Fix grpo trainer attribute chain~~ Fix GRPOTrainer attribute access for vLLM model config Mar 18, 2026

qgallouedec added the 🩹 for patch label Mar 19, 2026

qgallouedec merged commit 116fc42 into huggingface:main Mar 19, 2026
12 checks passed

falcondai deleted the patch-1 branch March 19, 2026 03:37

qgallouedec pushed a commit that referenced this pull request Mar 20, 2026

Fix GRPOTrainer attribute access for vLLM model config (#5302)

0007dc2

songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026

Fix GRPOTrainer attribute access for vLLM model config (huggingface#5302

6b67657

)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix GRPOTrainer attribute access for vLLM model config#5302

Fix GRPOTrainer attribute access for vLLM model config#5302
qgallouedec merged 1 commit into
huggingface:mainfrom
falcondai:patch-1

falcondai commented Mar 18, 2026 •

edited by cursor Bot

Loading

Uh oh!

qgallouedec commented Mar 18, 2026

Uh oh!

falcondai commented Mar 18, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Mar 18, 2026

Uh oh!

albertvillanova left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

falcondai commented Mar 18, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Before submitting

Who can review?

Uh oh!

qgallouedec commented Mar 18, 2026

Uh oh!

falcondai commented Mar 18, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Mar 18, 2026

Uh oh!

albertvillanova left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

falcondai commented Mar 18, 2026 •

edited by cursor Bot

Loading