Skip to content

Fix GRPOTrainer attribute access for vLLM model config#5302

Merged
qgallouedec merged 1 commit into
huggingface:mainfrom
falcondai:patch-1
Mar 19, 2026
Merged

Fix GRPOTrainer attribute access for vLLM model config#5302
qgallouedec merged 1 commit into
huggingface:mainfrom
falcondai:patch-1

Conversation

@falcondai

@falcondai falcondai commented Mar 18, 2026

Copy link
Copy Markdown
Contributor

it should be self.vllm_generation.llm...

What does this PR do?

Fixes #5301

Tested by running one of the example scripts.

python examples/scripts/grpo_agent.py \
    --model_name_or_path Qwen/Qwen3-1.7B \
    --output_dir grpo_biogrid_qwen_3g-1.7b \
    --push_to_hub True \
    --use_vllm True \
    --vllm_mode colocate \
    --max_completion_length 1024 \
    --report_to trackio \
    --log_completions True \
    --max_steps 400

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Low Risk
Low risk: a one-line attribute-chain fix used only to read max_model_len for overlong prompt filtering in vLLM colocate mode.

Overview
Fixes a broken attribute chain in GRPOTrainer._tool_call_loop when running with vLLM in colocate mode by reading max_model_len from self.vllm_generation.llm... instead of a non-existent self.llm.... This prevents failures when filtering overlong prompt+tool-call sequences before regeneration.

Written by Cursor Bugbot for commit 5935d78. This will update automatically on new commits. Configure here.

it should be `self.vllm_generation.llm...`
@qgallouedec

Copy link
Copy Markdown
Member

thanks, I think there is the same issue in RLOO, do you mind fixing it as well?

@falcondai

Copy link
Copy Markdown
Contributor Author

rloo trainer seems free of this issue. I looked through the implementation and tested it with an example script python examples/scripts/rloo.py --use-vllm True --vllm-mode colocate.

One related issue I saw when searching "self.llm.llm_engine" through the repo is that online_dpo_trainer.py may benefit from a refactor to use VLLMGeneration. As it stands, it seems to handle colocate mode with similar (duplicate?) code as VLLMGeneration.

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova albertvillanova left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix in GRPO and for validating RLOO. Very helpful!

In relation with OnlineDPOTrainer, that's a good observation. Just for context, OnlineDPOTrainer currently lives in the experimental submodule, so it hasn't been fully aligned with the abstractions used in the core trainers yet.

That said, I agree that this could benefit from a refactor to reuse VLLMGeneration, even if it is not a high priority right now.

@albertvillanova albertvillanova changed the title Fix grpo trainer attribute chain Fix GRPOTrainer attribute access for vLLM model config Mar 18, 2026
@qgallouedec qgallouedec merged commit 116fc42 into huggingface:main Mar 19, 2026
12 checks passed
@falcondai falcondai deleted the patch-1 branch March 19, 2026 03:37
songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GRPO + vLLM colocate mode crashes due to missing attribute

4 participants