Fix GRPOTrainer attribute access for vLLM model config #5302
it should be `self.vllm_generation.llm...`
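Here is a minimal sketch of why the old chain fails. It uses `SimpleNamespace` stand-ins instead of a real vLLM engine. It assumes the elided tail of the chain is `.llm_engine.model_config.max_model_len`: the `llm_engine` and `max_model_len` segments appear elsewhere in this PR, but `model_config` is an assumption. `FakeTrainer` is hypothetical, not a TRL class.

```python
from types import SimpleNamespace


class FakeTrainer:
    """Hypothetical stand-in for a trainer running vLLM in colocate mode.

    The engine is only reachable through `vllm_generation.llm`; the trainer
    itself has no `llm` attribute.
    """

    def __init__(self, max_model_len: int) -> None:
        # Assumed nesting: llm -> llm_engine -> model_config -> max_model_len
        model_config = SimpleNamespace(max_model_len=max_model_len)
        llm_engine = SimpleNamespace(model_config=model_config)
        llm = SimpleNamespace(llm_engine=llm_engine)
        self.vllm_generation = SimpleNamespace(llm=llm)

    def max_model_len_broken(self) -> int:
        # Old chain: `self.llm` does not exist, so this raises AttributeError.
        return self.llm.llm_engine.model_config.max_model_len

    def max_model_len_fixed(self) -> int:
        # Fixed chain: go through the generation wrapper first.
        return self.vllm_generation.llm.llm_engine.model_config.max_model_len


trainer = FakeTrainer(max_model_len=4096)
print(trainer.max_model_len_fixed())  # 4096
try:
    trainer.max_model_len_broken()
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```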
Thanks, I think there is the same issue in RLOO. Do you mind fixing it as well?
The RLOO trainer seems free of this issue: I looked through the implementation and tested it with an example script. One related issue I saw when searching for `self.llm.llm_engine` in the repo is that `online_dpo_trainer.py` may benefit from a refactor to use `VLLMGeneration`. As it stands, it seems to handle colocate mode with code similar to (possibly duplicating) `VLLMGeneration`.
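The search described above can be repeated with a small Python script that walks a checkout and reports every matching line. This is a sketch. The `find_pattern` name and the `.py`-only filter are illustrative choices, and `git grep "self.llm.llm_engine"` does the same job from the shell.

```python
import os
from typing import Iterator, Tuple


def find_pattern(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each .py line under `root` containing `pattern`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git so the walk stays fast.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if pattern in line:
                        yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    for path, lineno, line in find_pattern(".", "self.llm.llm_engine"):
        print(f"{path}:{lineno}: {line.strip()}")
```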
albertvillanova left a comment:
Thanks for the fix in GRPO and for validating RLOO. Very helpful!
Regarding OnlineDPOTrainer, that's a good observation. For context, OnlineDPOTrainer currently lives in the experimental submodule, so it hasn't yet been fully aligned with the abstractions used in the core trainers.
That said, I agree that this could benefit from a refactor to reuse VLLMGeneration, even if it is not a high priority right now.
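The refactor discussed here amounts to composition. Each trainer holds one shared generation wrapper and reads engine details through it, so an attribute chain like the one fixed in this PR is written in only one place. The sketch below is an illustration of that idea only. `GenerationBackend`, its `max_model_len` property, and both trainer classes are hypothetical and are not the real `VLLMGeneration` API. The engine nesting is also assumed.

```python
from types import SimpleNamespace
from typing import Any


class GenerationBackend:
    """Hypothetical wrapper that owns the vLLM handle.

    Trainers ask this object for engine details instead of reaching into
    the engine themselves, so the attribute chain lives in one place.
    """

    def __init__(self, llm: Any) -> None:
        self.llm = llm

    @property
    def max_model_len(self) -> int:
        # The only place that knows the (assumed) engine nesting.
        return self.llm.llm_engine.model_config.max_model_len


class CoreTrainer:
    def __init__(self, generation: GenerationBackend) -> None:
        self.vllm_generation = generation


class ExperimentalTrainer:
    # Reuses the same wrapper instead of duplicating colocate setup code.
    def __init__(self, generation: GenerationBackend) -> None:
        self.vllm_generation = generation


# Stand-in for a vLLM `LLM` object; only the attributes read above are modeled.
fake_llm = SimpleNamespace(
    llm_engine=SimpleNamespace(model_config=SimpleNamespace(max_model_len=2048))
)
generation = GenerationBackend(fake_llm)
trainers = [CoreTrainer(generation), ExperimentalTrainer(generation)]
print([t.vllm_generation.max_model_len for t in trainers])  # [2048, 2048]
```

With this design, a change to how the engine is reached touches only the wrapper, not every trainer that needs the limit.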
What does this PR do?
Fixes #5301
Tested by running one of the example scripts.
```shell
python examples/scripts/grpo_agent.py \
    --model_name_or_path Qwen/Qwen3-1.7B \
    --output_dir grpo_biogrid_qwen_3g-1.7b \
    --push_to_hub True \
    --use_vllm True \
    --vllm_mode colocate \
    --max_completion_length 1024 \
    --report_to trackio \
    --log_completions True \
    --max_steps 400
```
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Low Risk
Low risk: a one-line attribute-chain fix used only to read `max_model_len` for overlong prompt filtering in vLLM colocate mode.

Overview
Fixes a broken attribute chain in `GRPOTrainer._tool_call_loop` when running with vLLM in `colocate` mode by reading `max_model_len` from `self.vllm_generation.llm...` instead of a non-existent `self.llm...`. This prevents failures when filtering overlong prompt+tool-call sequences before regeneration.

Written by Cursor Bugbot for commit 5935d78.
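The overlong filtering that consumes `max_model_len` can be pictured as a length check on each prompt+tool-call sequence before it is sent back for regeneration. This is a sketch under an assumed rule. A sequence is kept only if it is strictly shorter than `max_model_len`, leaving room for at least one generated token. The real trainer may use a different threshold. `filter_overlong` is a hypothetical helper, not a TRL function.

```python
from typing import List, Sequence, Tuple


def filter_overlong(
    sequences: Sequence[Sequence[int]], max_model_len: int
) -> Tuple[List[int], List[int]]:
    """Split sequence indices into (keep, drop) by token count.

    Hypothetical rule: keep a sequence only if len(ids) < max_model_len,
    so the model can still generate at least one more token.
    """
    keep: List[int] = []
    drop: List[int] = []
    for idx, ids in enumerate(sequences):
        if len(ids) < max_model_len:
            keep.append(idx)
        else:
            drop.append(idx)
    return keep, drop


# Token ids are placeholders; only their lengths matter here.
prompts_with_tool_output = [[1] * 10, [1] * 16, [1] * 15]
keep, drop = filter_overlong(prompts_with_tool_output, max_model_len=16)
print(keep, drop)  # [0, 2] [1]
```

In colocate mode, `max_model_len` would come from the fixed `self.vllm_generation.llm...` chain rather than being passed in by hand.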