Multi Image GRPO by Datta0 · Pull Request #5197 · unslothai/unsloth

Datta0 · 2026-04-27T03:53:58Z

Fixes: #5183
companion: unslothai/unsloth-zoo#613

gemini-code-assist · 2026-04-27T03:54:01Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


danielhanchen · 2026-05-06T12:36:56Z

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

unslothai/unsloth#5183 — @jaaabir — Issue reports multi-image GRPO vision mismatches; PR adds sample-aware multi-image slicing and num_images handling in GRPO replacements.
unslothai/unsloth#3605 — @backpropagator — Reports GRPO vision training failing when each example has multiple images; PR adds sample-aware multi-image GRPO vision input handling.
unslothai/unsloth#3357 — @Wu-Yuanfei — Issue concerns Qwen2.5-VL GRPO vision inputs; PR fixes multi-image vision GRPO batching/log-prob slicing and forwards num_images.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

- `image_sizes` is now sliced on the image axis (`img_start:img_end`) when the processor emits one row per image and `num_images` is provided. Sample-axis slicing remains the fallback. This restores correct per-batch `image_sizes` alignment for multi-image VLM processors.
- `pixel_attention_mask` now uses a three-way layout check: image-axis when `shape[0]` matches the number of `image_grid_thw` rows; pixel-row when `shape[0]` matches the number of `pixel_values` rows and differs from `total_samples`; otherwise sample-axis. This prevents misalignment with image-axis grid slicing for per-image masks, and ambiguity when single-image-per-sample shapes coincide.
- `cum_imgs` slice indices are materialized via `.item()`, matching the existing `cum_rows` pattern in the same loop and avoiding 0-dim tensors flowing into a CUDA-tensor slice.
- `cum_rows` is materialized on CPU once after construction. The per-chunk loop calls `.item()` on it, so keeping it on device caused a GPU-to-CPU sync on every iteration.
- Add a one-time fail-loud guard in `compute_loss` for when `num_images` is provided but the resolved `grpo_accumulated_loss` source has no `num_images` handling; the error points users at the corresponding `unsloth_zoo` upgrade. The active GRPO path goes through `grpo_accumulated_loss` (the local `_get_per_token_logps` and `_get_per_token_logps_and_entropies` return `None` on the efficient path), so without this guard a stale `unsloth_zoo` silently mis-slices multi-image batches.
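A minimal pure-Python sketch of the layout checks described above, using plain lists in place of tensors (slicing along dim 0 behaves the same way). The helper names `pick_mask_axis`, `slice_leading`, and `slice_image_sizes` are hypothetical, and so is the rule that detects a "one row per image" `image_sizes` by comparing its length with `sum(num_images)`; this is not the code in the PR. `sample_span`, `img_span`, and `row_span` are assumed to be precomputed `(start, end)` bounds for the current chunk on the sample, image, and pixel-row axes.

```python
from typing import Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) bounds along dim 0


def pick_mask_axis(mask_len: int, n_image_rows: int, n_pixel_rows: int, total_samples: int) -> str:
    """Three-way layout check on pixel_attention_mask.shape[0]."""
    if mask_len == n_image_rows:
        return "image"  # one mask row per image: follow image_grid_thw slicing
    if mask_len == n_pixel_rows and mask_len != total_samples:
        return "pixel_row"  # one mask row per flattened pixel_values row
    return "sample"  # fallback: one mask row per sample


def slice_leading(x: Sequence, axis: str, sample_span: Span, img_span: Span, row_span: Span):
    """Slice dim 0 of x with the bounds that match the chosen axis."""
    start, end = {"image": img_span, "pixel_row": row_span, "sample": sample_span}[axis]
    return x[start:end]


def slice_image_sizes(
    image_sizes: Sequence,
    num_images: Optional[Sequence[int]],
    sample_span: Span,
    img_span: Span,
):
    # Hypothetical rule: one row per image <=> row count equals the total image count.
    if num_images is not None and len(image_sizes) == sum(num_images):
        return image_sizes[img_span[0]:img_span[1]]
    # Fallback: one row per sample.
    return image_sizes[sample_span[0]:sample_span[1]]
```

For example, with `num_images=[2, 1, 3]`, 40 flattened pixel rows and a chunk covering samples 1 and 2, a 6-row mask is sliced on the image axis, a 40-row mask on the pixel-row axis, and a 3-row mask on the sample axis.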

- Only raise the zoo upgrade error when at least one entry in `num_images` is not 1. Upstream TRL emits `num_images=[1, 1, ...]` for any vision batch (one image per sample), and old `unsloth_zoo` builds chunk those correctly because sample-axis and image-axis slicing coincide for all-ones counts. Restricting the check to batches with a real multi-image sample stops single-image VLM GRPO from being needlessly broken on pre-companion zoo installs.
- Prefer `inspect.signature(grpo_accumulated_loss).parameters` for the `num_images` contract, and fall back to `inspect.getsource` string matching only when the signature does not declare `num_images` (e.g. when the companion zoo wires it through `**kwargs`). The previous `try/except (TypeError, OSError)` around `getsource` turned the guard into a silent no-op when source files were absent; the new flow raises in that case, because the signature check will not have proven support either.
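A sketch of the guard logic described above, assuming the loss callable is passed in directly. The names `supports_num_images` and `check_num_images_support` and the error message are hypothetical, not the PR's code; `functools.lru_cache` stands in for the "one-time" behaviour by caching the verdict per function.

```python
import functools
import inspect
from typing import Callable, Optional, Sequence


@functools.lru_cache(maxsize=None)
def supports_num_images(fn: Callable) -> bool:
    """Cached per function, so the (possibly slow) source lookup runs once."""
    # 1) Preferred: an explicit `num_images` parameter in the signature.
    try:
        if "num_images" in inspect.signature(fn).parameters:
            return True
    except (TypeError, ValueError):
        pass  # some callables have no introspectable signature
    # 2) Fallback: `num_images` consumed through **kwargs shows up in the source.
    try:
        return "num_images" in inspect.getsource(fn)
    except (TypeError, OSError):
        # Source unavailable: support is unproven, so report False and fail loud.
        return False


def check_num_images_support(fn: Callable, num_images: Optional[Sequence[int]]) -> None:
    # All-ones batches slice identically on the sample and image axes, so skip them.
    if num_images is None or all(n == 1 for n in num_images):
        return
    if not supports_num_images(fn):
        raise RuntimeError(
            "Multi-image GRPO batch detected, but the loss function does not "
            "handle it. Please upgrade unsloth_zoo."
        )
```

In this sketch an old-style loss that only takes `**kwargs` passes for `num_images=[1, 1, 1]` but raises for `[2, 1]`, while a loss that declares `num_images` passes both.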


danielhanchen · 2026-05-06T13:31:35Z

Auto-review verdict: Approved

Adds num_images-aware cumulative offsets so GRPO chunks the correct image_grid_thw / pixel_values / image_sizes / pixel_attention_mask slices for multi-image-per-sample VLM batches and forwards the new vision kwargs into grpo_accumulated_loss, fixing silently wrong logprobs on samples with more than one image.

Reason: Multi-image GRPO chunking is correct after the fixes; the review-added image-axis slicing, three-way `pixel_attention_mask` check, CPU-resident `cum_rows`, and scoped fail-loud zoo guard address all P1 concerns; no real bugs remain.
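To make the chunking concrete, here is a small sketch of `num_images`-aware cumulative offsets, using plain lists rather than tensors. It assumes a Qwen2-VL-style layout where each `image_grid_thw` row `(t, h, w)` contributes `t * h * w` rows to the flattened `pixel_values`; the helper names `cumulative_offsets` and `chunk_slices` are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def cumulative_offsets(counts: Sequence[int]) -> List[int]:
    """Prefix sums with a leading zero: [0, c0, c0 + c1, ...]."""
    return [0, *accumulate(counts)]


def chunk_slices(
    num_images: Sequence[int],
    image_grid_thw: Sequence[Tuple[int, int, int]],
    sample_start: int,
    sample_end: int,
) -> Tuple[int, int, int, int]:
    """Image-axis and pixel-row bounds for samples [sample_start, sample_end)."""
    # Image-axis offsets: where each sample's images begin in image_grid_thw.
    cum_imgs = cumulative_offsets(num_images)
    if cum_imgs[-1] != len(image_grid_thw):
        raise ValueError("sum(num_images) must equal the number of image_grid_thw rows")
    # Pixel-row offsets: where each image's patches begin in pixel_values.
    cum_rows = cumulative_offsets([t * h * w for t, h, w in image_grid_thw])
    img_start, img_end = cum_imgs[sample_start], cum_imgs[sample_end]
    return img_start, img_end, cum_rows[img_start], cum_rows[img_end]


grids = [(1, 2, 2), (1, 2, 4), (1, 4, 4), (1, 2, 2), (1, 2, 2), (1, 2, 2)]
print(chunk_slices([2, 1, 3], grids, 1, 3))  # (2, 6, 12, 40)
```

With `num_images=[2, 1, 3]`, a chunk covering samples 1 and 2 maps to image indices 2 to 5 and pixel rows 12 to 39 (0-based, inclusive). Slicing `image_grid_thw[1:3]` on the sample axis would instead pick images 1 and 2, which is the kind of silent mis-slicing described above.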

jaaabir · 2026-05-06T13:36:11Z

@danielhanchen I haven't tested the repo yet

Conflict resolution for `.github/workflows/release-desktop.yml`. main moved forward with PR #5394 (Chore(deps): bump the actions group across 1 directory with 4 updates), which bumped action SHAs on the build job's `actions/checkout` line, colliding with the harden-runner audit step that this PR inserts above the checkout.

Resolution:
- Keep the `step-security/harden-runner@<sha> # v2.19.1` audit step at the head of the build job (this PR's contribution).
- Accept main's newer `actions/checkout@de0fac2e4500...` SHA (was `34e114876b0b...`).

No functional change beyond the action SHA bump: harden-runner still runs in audit mode (logs egress, never blocks), and actions/checkout v6.0.2 is the dependabot-shipped upgrade from v6.0.x.

Auto-merged cleanly:
- .github/workflows/security-audit.yml
- .github/workflows/studio-tauri-smoke.yml
- plus eight non-workflow files from main (studio backend / tests / unsloth GRPO changes from #5142, #5197, #5346, etc.). None touch this PR's surface area.

Verified: `pytest tests/security` -> 34 passed in 2.71s; every .github/workflows/*.yml parses cleanly under PyYAML (24 files).

Multi Image GRPO

5fc7e45

Datta0 mentioned this pull request Apr 27, 2026

Multi Image GRPO unslothai/unsloth-zoo#613

Merged

try matching trl semantics

730d3b9

Datta0 force-pushed the multi_image_grpo branch from 2846895 to 730d3b9 April 27, 2026 04:21

attn mask for multi image grpo

9c9b945

Datta0 force-pushed the multi_image_grpo branch 2 times, most recently from 76d826a to 9c9b945 April 27, 2026 16:50

[pre-commit.ci] auto fixes from pre-commit.com hooks

da9ad13

for more information, see https://pre-commit.ci

Datta0 marked this pull request as ready for review May 1, 2026 10:56

Datta0 requested review from danielhanchen and pluesclues as code owners May 1, 2026 10:56

Merge branch 'main' into multi_image_grpo

6c76719

danielhanchen mentioned this pull request May 6, 2026

Multi Image GRPO shimmyshimmer/unsloth-staging-4#37

Open

danielhanchen added and removed the auto-review-failed label (Auto-review rejected the PR) May 6, 2026

danielhanchen mentioned this pull request May 6, 2026

Multi Image GRPO danielhanchen/unsloth-staging-2#106

Open

danielhanchen added and removed the auto-review-failed label (Auto-review rejected the PR) May 6, 2026

danielhanchen mentioned this pull request May 6, 2026

Multi Image GRPO Datta0/unsloth-staging-3#38

Closed

danielhanchen added the auto-addresses-issue label (Pre-flight: appears to address an open issue) May 6, 2026

danielhanchen added the auto-reviewing label (Auto-review in progress) May 6, 2026

danielhanchen added 3 commits May 6, 2026 13:29

Consolidate multi-image GRPO chunking and zoo guard tests

822bff9

danielhanchen requested a review from rolandtannous as a code owner May 6, 2026 13:30

danielhanchen mentioned this pull request May 6, 2026

[tests] Review tests for PR #5197 Datta0/unsloth-staging-3#39

Closed

[pre-commit.ci] auto fixes from pre-commit.com hooks

013b407

for more information, see https://pre-commit.ci

danielhanchen removed the auto-reviewing label (Auto-review in progress) May 6, 2026

danielhanchen added the auto-approved label (Auto-review approved the PR) May 6, 2026

danielhanchen merged commit 98fde27 into unslothai:main May 13, 2026
9 checks passed


Multi Image GRPO #5197: danielhanchen merged 9 commits into unslothai:main from Datta0:multi_image_grpo

3 participants
