Multi Image GRPO #5197

Merged: danielhanchen merged 9 commits into unslothai:main from Datta0:multi_image_grpo on May 13, 2026

@Datta0 (Collaborator) commented Apr 27, 2026

Fixes: #5183
companion: unslothai/unsloth-zoo#613

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@Datta0 Datta0 force-pushed the multi_image_grpo branch from 2846895 to 730d3b9 Compare April 27, 2026 04:21
@Datta0 Datta0 force-pushed the multi_image_grpo branch 2 times, most recently from 76d826a to 9c9b945 Compare April 27, 2026 16:50
@Datta0 Datta0 marked this pull request as ready for review May 1, 2026 10:56
@danielhanchen danielhanchen added auto-review-failed Auto-review rejected the PR and removed auto-review-failed Auto-review rejected the PR labels May 6, 2026
@danielhanchen danielhanchen added the auto-addresses-issue Pre-flight: appears to address an open issue label May 6, 2026
@danielhanchen (Member)

This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:

  • unslothai/unsloth#5183 (@jaaabir): Issue reports multi-image GRPO vision mismatches; PR adds sample-aware multi-image slicing and num_images handling in GRPO replacements.
  • unslothai/unsloth#3605 (@backpropagator): Reports GRPO vision training failing when each example has multiple images; PR adds sample-aware multi-image GRPO vision input handling.
  • unslothai/unsloth#3357 (@Wu-Yuanfei): Issue concerns Qwen2.5-VL GRPO vision inputs; PR fixes multi-image vision GRPO batching/log-prob slicing and forwards num_images.

If this PR fixes any of them, consider adding closes #N / resolves #N to the description so the issue auto-closes on merge. If the match is wrong, ignore this comment.

@danielhanchen danielhanchen added the auto-reviewing Auto-review in progress label May 6, 2026

image_sizes is now sliced on the image axis (img_start:img_end) when
the processor emits one row per image and num_images is provided;
sample-axis slicing is kept as the fallback. This restores correct
per-batch image_sizes alignment for multi-image VLM processors.

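The image-axis versus sample-axis choice described above can be sketched roughly as follows. This is only an illustration, not the PR's code: `slice_image_sizes` is a hypothetical helper, plain Python lists stand in for tensors, and the test `len(image_sizes) == sum(num_images)` is an assumed way of detecting "one row per image".

```python
from itertools import accumulate


def slice_image_sizes(image_sizes, num_images, start, end):
    """Return the image_sizes rows belonging to samples [start, end).

    Hypothetical helper: `image_sizes` is a list with either one row per
    image or one row per sample; `num_images[i]` is the image count of
    sample i (or None when the count is unknown).
    """
    if num_images is not None and len(image_sizes) == sum(num_images):
        # One row per image: translate sample indices into image indices
        # through cumulative image counts.
        cum_imgs = [0, *accumulate(num_images)]
        img_start, img_end = cum_imgs[start], cum_imgs[end]
        return image_sizes[img_start:img_end]
    # Fallback: one row per sample, so sample indices apply directly.
    return image_sizes[start:end]


# Three samples holding 2, 1 and 3 images respectively.
sizes = [(i, i) for i in range(6)]
chunk = slice_image_sizes(sizes, [2, 1, 3], start=1, end=3)
```

Here `chunk` holds the four rows of samples 1 and 2 (image indices 2 to 5), whereas a plain `sizes[1:3]` would have returned rows that belong to samples 0 and 1.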
pixel_attention_mask now uses a three-way layout check: image-axis
when shape[0] matches image_grid_thw rows, pixel-row when shape[0]
matches pixel_values rows and is distinct from total_samples,
otherwise sample-axis. This prevents misalignment with image-axis
grid slicing for per-image masks, and avoids ambiguity when
single-image-per-sample shapes coincide.

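A minimal sketch of that three-way decision, written against bare row counts rather than tensors. The function name `mask_layout` and its string return values are hypothetical; only the order of the checks follows the description above.

```python
def mask_layout(mask_rows, grid_rows, pixel_rows, total_samples):
    """Decide which axis a pixel_attention_mask should be sliced along.

    Hypothetical helper. Arguments are leading-dimension sizes:
    mask_rows     -> pixel_attention_mask.shape[0]
    grid_rows     -> image_grid_thw.shape[0], or None if absent
    pixel_rows    -> pixel_values.shape[0]
    total_samples -> number of samples in the batch
    """
    if grid_rows is not None and mask_rows == grid_rows:
        return "image"      # one mask row per image
    if mask_rows == pixel_rows and mask_rows != total_samples:
        return "pixel_row"  # one mask row per pixel_values row
    return "sample"         # one mask row per sample


layout_per_image = mask_layout(5, 5, 1200, 3)
layout_per_row = mask_layout(1200, 5, 1200, 3)
layout_per_sample = mask_layout(3, None, 3, 3)
```

The `mask_rows != total_samples` condition is what keeps the last case on the sample axis when the mask and pixel_values happen to have the same leading size as the batch.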
cum_imgs slice indices are now materialized via .item() to match the
existing cum_rows pattern in the same loop, which avoids 0-dim
tensors flowing into a CUDA-tensor slice.

cum_rows is materialized on CPU once after construction; the
per-chunk loop calls .item() on it, so keeping it on device caused a
GPU->CPU sync per iteration.

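To illustrate why the offsets are worth computing once as host-side integers, the sketch below builds both cumulative tables up front and then only indexes them inside the chunk loop. It is an assumption-laden illustration: `chunk_row_bounds` is a hypothetical name, plain lists stand in for tensors, and each image is assumed to contribute t*h*w rows to pixel_values (as with Qwen2-VL style processors).

```python
from itertools import accumulate


def chunk_row_bounds(image_grid_thw, num_images, chunk_size):
    """Return (row_start, row_end) into pixel_values for each sample chunk.

    Hypothetical helper. Both cumulative tables are built once, as plain
    Python ints, so the loop below does no per-iteration conversion.
    """
    rows_per_image = [t * h * w for t, h, w in image_grid_thw]
    cum_rows = [0, *accumulate(rows_per_image)]  # image index -> row offset
    cum_imgs = [0, *accumulate(num_images)]      # sample index -> image offset

    bounds = []
    for start in range(0, len(num_images), chunk_size):
        end = min(start + chunk_size, len(num_images))
        img_start, img_end = cum_imgs[start], cum_imgs[end]
        bounds.append((cum_rows[img_start], cum_rows[img_end]))
    return bounds


# Two samples: the first has two images, the second has one.
grid = [(1, 2, 2), (1, 4, 4), (1, 2, 4)]
bounds = chunk_row_bounds(grid, num_images=[2, 1], chunk_size=1)
```

With real tensors, the analogous step would be moving the cumulative offsets to the CPU before the loop, which is what the commit message above describes.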
Add a one-time fail-loud guard in compute_loss when num_images is
provided but the resolved grpo_accumulated_loss source has no
num_images handling, pointing users at the corresponding unsloth_zoo
upgrade. The active GRPO path goes through grpo_accumulated_loss
(the local _get_per_token_logps and _get_per_token_logps_and_entropies
return None on the efficient path), so without this guard a stale
unsloth_zoo silently mis-slices multi-image batches.

Only raise the zoo upgrade error when at least one entry in
num_images is not 1. Upstream TRL emits num_images=[1,1,...] for
any vision batch (one image per sample), and old unsloth_zoo
builds chunk those correctly because sample-axis and image-axis
slicing coincide for all-ones counts. Restricting the check to
batches with a real multi-image sample stops single-image VLM
GRPO from being needlessly broken on pre-companion zoo installs.

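The scoping rule for the guard could look roughly like this. The function name, the boolean `zoo_supports_num_images`, and the error text are all hypothetical; the sketch only shows that all-ones batches pass through untouched while a genuinely multi-image batch fails loudly on an unsupported install.

```python
def check_multi_image_support(num_images, zoo_supports_num_images):
    """Raise only when a real multi-image batch meets an unsupported zoo.

    Hypothetical helper: `num_images` is a list of per-sample image counts
    (or None); `zoo_supports_num_images` is the result of whatever
    capability check the caller performed.
    """
    if num_images is None:
        return  # text-only batch, nothing to verify
    if all(n == 1 for n in num_images):
        return  # sample-axis and image-axis slicing coincide
    if not zoo_supports_num_images:
        raise RuntimeError(
            "Multi-image GRPO batch detected, but the installed unsloth_zoo "
            "does not handle num_images. Please upgrade unsloth_zoo."
        )


# Single-image batches are accepted even without num_images support.
check_multi_image_support([1, 1, 1], zoo_supports_num_images=False)
```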
Prefer inspect.signature(grpo_accumulated_loss).parameters for
the num_images contract. Fall back to inspect.getsource string
matching only when the signature does not declare num_images
(e.g. the companion zoo wires it through **kwargs). The previous
try/except (TypeError, OSError) over getsource turned the guard
into a silent no-op when source files were absent; the new flow
raises in that case because the signature check will not have
proven support either.

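A sketch of the signature-first detection described above, using only the standard `inspect` module. `supports_num_images`, `new_loss`, and `old_loss` are hypothetical stand-ins for illustration; the real check targets grpo_accumulated_loss.

```python
import inspect


def supports_num_images(fn):
    """Return True only if `fn` provably handles a `num_images` argument."""
    try:
        params = inspect.signature(fn).parameters
    except (TypeError, ValueError):
        params = {}
    if "num_images" in params:
        return True
    # Fallback for functions that route the value through **kwargs:
    # look for the name in the source text instead.
    try:
        source = inspect.getsource(fn)
    except (TypeError, OSError):
        # Source unavailable: support is not proven, so report False and
        # let the caller fail loudly instead of silently skipping the check.
        return False
    return "num_images" in source


def new_loss(trainer, input_ids, num_images=None):
    return 0.0


def old_loss(trainer, input_ids):
    return 0.0


has_support = supports_num_images(new_loss)
lacks_support = supports_num_images(old_loss)
```

Returning False when the source cannot be read is the important design choice: a missing source file then counts as "unsupported" rather than quietly disabling the guard.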
@danielhanchen danielhanchen removed the auto-reviewing Auto-review in progress label May 6, 2026
@danielhanchen danielhanchen added the auto-approved Auto-review approved the PR label May 6, 2026
@danielhanchen (Member)

Auto-review verdict: Approved

Adds num_images-aware cumulative offsets so GRPO chunks the correct image_grid_thw / pixel_values / image_sizes / pixel_attention_mask slices for multi-image-per-sample VLM batches and forwards the new vision kwargs into grpo_accumulated_loss, fixing silently wrong logprobs on samples with more than one image.

Reason: Multi-image GRPO chunking is correct after fixes; review-added image-axis slicing, three-way pixel_attention_mask check, CPU-resident cum_rows, and scoped fail-loud zoo guard land all P1 concerns; no remaining real bugs.

@jaaabir commented May 6, 2026

@danielhanchen i haven't tested the repo yet

@danielhanchen danielhanchen merged commit 98fde27 into unslothai:main May 13, 2026
9 checks passed
danielhanchen added a commit that referenced this pull request May 13, 2026
Conflict resolution for .github/workflows/release-desktop.yml.
main moved forward with PR #5394 (Chore(deps): bump the actions
group across 1 directory with 4 updates) which bumped action SHAs
on the build job's `actions/checkout` line, colliding with the
harden-runner audit step that this PR inserts above the checkout.

Resolution:

  - Keep the `step-security/harden-runner@<sha>  # v2.19.1` audit
    step at the head of the build job (this PR's contribution).
  - Accept main's newer `actions/checkout@de0fac2e4500...` SHA
    (was `34e114876b0b...`).

No functional change beyond the action SHA bump: harden-runner
still runs in audit mode (logs egress, never blocks), and
actions/checkout v6.0.2 is the dependabot-shipped upgrade from
v6.0.x.

Auto-merged cleanly:

  - .github/workflows/security-audit.yml
  - .github/workflows/studio-tauri-smoke.yml

plus eight non-workflow files from main (studio backend / tests /
unsloth GRPO changes from #5142, #5197, #5346, etc.). None touch
this PR's surface area.

Verified: pytest tests/security -> 34 passed in 2.71s; every
.github/workflows/*.yml parses cleanly under PyYAML (24 files).

Labels

auto-addresses-issue Pre-flight: appears to address an open issue auto-approved Auto-review approved the PR

Development

Successfully merging this pull request may close these issues.

importing unsloth before trl on multi image grpo training gives mismatch error

3 participants