Multi Image GRPO#5197
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
2846895 to
730d3b9
Compare
76d826a to
9c9b945
Compare
for more information, see https://pre-commit.ci
|
This PR appears to address open issue(s). The duplicate detector matched the following open issues with HIGH confidence:
If this PR fixes any of them, consider adding |
image_sizes is now sliced on the image axis (img_start:img_end) when the processor emits one row per image and num_images is provided; sample-axis slicing is kept as the fallback. This restores correct per-batch image_sizes alignment for multi-image VLM processors. pixel_attention_mask now uses a three-way layout check: image-axis when shape[0] matches image_grid_thw rows, pixel-row when shape[0] matches pixel_values rows and is distinct from total_samples, otherwise sample-axis. Prevents misalignment with image-axis grid slicing for per-image masks and ambiguity when single-image-per-sample shapes coincide. cum_imgs slice indices materialize via .item to match the existing cum_rows pattern in the same loop and avoid 0-dim tensors flowing into a CUDA-tensor slice. cum_rows is materialized on CPU once after construction; the per-chunk loop uses .item on it, so keeping it on device caused a GPU->CPU sync per iteration. Add a one-time fail-loud guard in compute_loss when num_images is provided but the resolved grpo_accumulated_loss source has no num_images handling, pointing users at the corresponding unsloth_zoo upgrade. The active GRPO path goes through grpo_accumulated_loss (the local _get_per_token_logps and _get_per_token_logps_and_entropies return None on the efficient path), so without this guard a stale unsloth_zoo silently mis-slices multi-image batches.
Only raise the zoo upgrade error when at least one entry in num_images is not 1. Upstream TRL emits num_images=[1,1,...] for any vision batch (one image per sample), and old unsloth_zoo builds chunk those correctly because sample-axis and image-axis slicing coincide for all-ones counts. Restricting the check to batches with a real multi-image sample stops single-image VLM GRPO from being needlessly broken on pre-companion zoo installs. Prefer inspect.signature(grpo_accumulated_loss).parameters for the num_images contract. Fall back to inspect.getsource string matching only when the signature does not declare num_images (e.g. the companion zoo wires it through **kwargs). The previous try/except (TypeError, OSError) over getsource turned the guard into a silent no-op when source files were absent; the new flow raises in that case because the signature check will not have proven support either.
for more information, see https://pre-commit.ci
|
Auto-review verdict: Approved Adds num_images-aware cumulative offsets so GRPO chunks the correct image_grid_thw / pixel_values / image_sizes / pixel_attention_mask slices for multi-image-per-sample VLM batches and forwards the new vision kwargs into grpo_accumulated_loss, fixing silently wrong logprobs on samples with more than one image. Reason: Multi-image GRPO chunking is correct after fixes; review-added image-axis slicing, three-way pixel_attention_mask check, CPU-resident cum_rows, and scoped fail-loud zoo guard land all P1 concerns; no remaining real bugs. |
|
@danielhanchen i haven't tested the repo yet |
Conflict resolution for .github/workflows/release-desktop.yml. main moved forward with PR #5394 (Chore(deps): bump the actions group across 1 directory with 4 updates) which bumped action SHAs on the build job's `actions/checkout` line, colliding with the harden-runner audit step that this PR inserts above the checkout. Resolution: - Keep the `step-security/harden-runner@<sha> # v2.19.1` audit step at the head of the build job (this PR's contribution). - Accept main's newer `actions/checkout@de0fac2e4500...` SHA (was `34e114876b0b...`). No functional change beyond the action SHA bump: harden-runner still runs in audit mode (logs egress, never blocks), and actions/checkout v6.0.2 is the dependabot-shipped upgrade from v6.0.x. Auto-merged cleanly: - .github/workflows/security-audit.yml - .github/workflows/studio-tauri-smoke.yml plus eight non-workflow files from main (studio backend / tests / unsloth GRPO changes from #5142, #5197, #5346, etc.). None touch this PR's surface area. Verified: pytest tests/security -> 34 passed in 2.71s; every .github/workflows/*.yml parses cleanly under PyYAML (24 files).
Fixes: #5183
companion: unslothai/unsloth-zoo#613