
[Feature] VLMs support for GRPO #4265

Merged
danielhanchen merged 2 commits into main from dh/recover-2752-vlm-grpo-credit on Mar 12, 2026

@danielhanchen
Contributor

Replacement for #2752 due to Studio rebasing

This replacement preserves the original authored change in repo history and immediately reverts it in the same PR so current main behavior stays unchanged.

This PR aims to add support for VLMs in GRPO, which is currently not supported by HF.

I've implemented a working version that does not yet include vLLM or video input support (mainly due to limited resources for testing video inputs haha).
I added a new variable, use_vision, to the GRPO config. Setting use_vision = True enables vision inputs, while use_vision = False keeps the default GRPO behavior. Default is False.
I also had to change a function in unsloth_zoo.peft_utils (requires_grad_post_hook) to make it work.
I've tested the implementation with Qwen 2.5 VL 7B for 250 steps, and training appears to proceed correctly (see TensorBoard screenshots for reference).
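
As a rough illustration of the `use_vision` toggle described above, here is a minimal sketch. It is not the code from this PR and not the Unsloth or TRL API: `SketchGRPOConfig` and `build_model_inputs` are hypothetical names invented for this example, and the only detail taken from the description is a boolean `use_vision` field that defaults to `False`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchGRPOConfig:
    """Hypothetical stand-in for a GRPO config (not the real Unsloth/TRL class)."""

    # Mirrors the flag described in the PR: off by default, so the
    # text-only GRPO path is used unless the caller opts in.
    use_vision: bool = False


def build_model_inputs(
    config: SketchGRPOConfig,
    prompt: str,
    image: Optional[Any] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: attach the image only when vision is enabled."""
    inputs: Dict[str, Any] = {"prompt": prompt}
    if config.use_vision and image is not None:
        inputs["image"] = image
    return inputs


# Default config: the image is ignored and only the prompt is passed on.
text_only = build_model_inputs(SketchGRPOConfig(), "Describe the chart.", image="chart.png")

# Opt-in config: the image travels alongside the prompt.
with_vision = build_model_inputs(
    SketchGRPOConfig(use_vision=True), "Describe the chart.", image="chart.png"
)
```

Keeping the flag `False` by default means existing text-only training scripts would behave the same without changes, which matches the default stated in the description.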


danielhanchen force-pushed the dh/recover-2752-vlm-grpo-credit branch from e8214b2 to f18186a on March 12, 2026 22:23
danielhanchen merged commit 1ca441a into main on Mar 12, 2026
5 checks passed
danielhanchen deleted the dh/recover-2752-vlm-grpo-credit branch on March 12, 2026 23:09
@danielhanchen
Contributor Author

@GAD-cell thanks so much again, and apologies for the delay on the PR. We changed the design of some components in Unsloth, so we had to rebase your PR. We accepted your contribution and appreciate it, although we had to change it quite a bit. Thanks again.

shibizhao pushed a commit to shibizhao/unsloth-npu that referenced this pull request Apr 7, 2026
* Updated rl and rl_replacements

* Revert "Updated rl and rl_replacements"

This reverts commit 077fd59.

---------

Co-authored-by: Sinoué GAD <85933501+GAD-cell@users.noreply.github.com>