[Feature] VLMs support for GRPO by danielhanchen · Pull Request #4265 · unslothai/unsloth

danielhanchen · 2026-03-12T22:03:32Z

Replacement for #2752 due to Studio rebasing

This replacement preserves the original authored change in repo history and immediately reverts it in the same PR so current main behavior stays unchanged.

This PR aims to add support for VLMs in GRPO, which is currently not supported by HF.

I've implemented a working version that does not yet include vLLM or video input support (mainly due to limited resources for testing video inputs haha).
I added a new variable, use_vision, to the GRPO config. Setting use_vision = True enables vision inputs, while use_vision = False keeps the default GRPO behavior. Default is False.
I also had to change a function in unsloth_zoo.peft_utils (requires_grad_post_hook) to make it work.
I've tested the implementation with Qwen 2.5 VL 7B for 250 steps, and training appears to proceed correctly (see TensorBoard screenshots for reference).
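
To make the use_vision switch described above concrete, here is a minimal, self-contained sketch of how such a flag could gate the input path. It is not the code from this PR, and the flag is not available on main because the change is reverted in the same PR. SketchGRPOConfig and build_model_inputs are hypothetical names invented for illustration, and the "one image per prompt" check is an assumption; only the use_vision name and its False default come from the description.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchGRPOConfig:
    """Hypothetical stand-in for the GRPO config; not the real class."""
    use_vision: bool = False  # default stated in the PR description


def build_model_inputs(
    config: SketchGRPOConfig,
    prompts: List[str],
    images: Optional[List[Any]] = None,
) -> Dict[str, List[Any]]:
    """Pick the text-only or vision input path based on use_vision."""
    if not config.use_vision:
        # Default GRPO behavior: any images passed in are ignored.
        return {"prompts": list(prompts)}
    # Assumption for this sketch: exactly one image per prompt.
    if images is None or len(images) != len(prompts):
        raise ValueError("use_vision=True needs one image per prompt")
    return {"prompts": list(prompts), "images": list(images)}
```

With the default config the function returns only the prompts; with use_vision=True it also forwards the images, so existing text-only runs would be unaffected by the new flag.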

gemini-code-assist · 2026-03-12T22:03:36Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


danielhanchen · 2026-03-13T03:41:13Z

@GAD-cell thanks so much again, and apologies for the delay on the PR. We changed the design of some components in Unsloth, so we had to rebase your PR. We accepted your contribution and appreciate it, although we had to change it quite a bit. Thanks again.

* Updated rl and rl_replacements
* Revert "Updated rl and rl_replacements"

This reverts commit 077fd59.

---------

Co-authored-by: Sinoué GAD <85933501+GAD-cell@users.noreply.github.com>

GAD-cell and others added 2 commits March 12, 2026 22:23

Updated rl and rl_replacements

7ebec2a

Revert "Updated rl and rl_replacements"

f18186a

This reverts commit 077fd59.

danielhanchen force-pushed the dh/recover-2752-vlm-grpo-credit branch from e8214b2 to f18186a March 12, 2026 22:23

danielhanchen merged commit 1ca441a into main Mar 12, 2026
5 checks passed

danielhanchen deleted the dh/recover-2752-vlm-grpo-credit branch March 12, 2026 23:09

danielhanchen merged 2 commits into main from dh/recover-2752-vlm-grpo-credit
