[Bugfix] Fix Bagel online mode: (1) hang after several requests; (2) non-deterministic image quality regression (#2458)
Conversation
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ddfdc2f15d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
Codex Review: Didn't find any major issues. Swish!
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5e5c106f3
elif self._last_img2img_info is not None:
    cur_info = self._last_img2img_info
else:
Preserve per-request img2img metadata ordering
Falling back to self._last_img2img_info when img2img_idx exceeds info_list applies the latest image metadata to unrelated request segments in the same batch (for example, mixed img2img+text batches or concurrent img2img requests with different image sizes). In _adjust_positions_for_img2img, that can incorrectly satisfy req_len >= num_img2img and rewrite positions/rope metadata for the wrong request, producing invalid KV-transfer metadata and distorted generation; the previous behavior correctly stopped applying img2img remapping once info_list was exhausted.
Force-pushed a5e5c10 to f9d789d
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9d789d518
elif self._last_img2img_info is not None:
    cur_info = self._last_img2img_info
else:
Preserve per-request img2img metadata isolation
Using a single _last_img2img_info fallback here makes position rewriting depend on the most recently processed image, not the current request. After info_list is consumed, later request boundaries in the same batch (or concurrent requests with different image sizes) will reuse stale num_vae/num_vit/image_shape, so non-matching requests get incorrect rope positions and VAE masks. This can mis-route tokens through the img2img MoT path and produce corrupted outputs; metadata needs to stay keyed/queued per request instead of falling back globally.
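The alternative the reviewer suggests — keeping metadata keyed per request instead of a global last-seen fallback — can be sketched as below. This is an illustrative structure only (the class name `Img2ImgInfoStore` and its methods are hypothetical, not the PR's actual implementation); the point is that a per-request lookup can never leak a stale `num_vae`/`num_vit`/`image_shape` across requests.

```python
class Img2ImgInfoStore:
    """Hypothetical per-request metadata store with no cross-request fallback."""

    def __init__(self):
        self._by_req = {}

    def put(self, req_id: str, info: dict) -> None:
        self._by_req[req_id] = info

    def get(self, req_id: str):
        # Returns None instead of falling back to another request's
        # metadata, so a stale image shape cannot be applied to an
        # unrelated request in the same batch.
        return self._by_req.get(req_id)


store = Img2ImgInfoStore()
store.put("req_a", {"image_shape": (512, 512)})
store.put("req_b", {"image_shape": (1024, 768)})
print(store.get("req_b"))  # {'image_shape': (1024, 768)}
print(store.get("req_c"))  # None — no global fallback
```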
@princepride PTAL
…tic image quality regression in online mode. Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
When gen + cfg_text + cfg_img companions are co-scheduled in the same batch, the encoder cache deduplicates their runs (same mm_hash). Only the gen request's _process_img2img_input executes, producing one entry in info_list, but _adjust_positions_for_img2img iterates over all three requests in the batch. cfg_text and cfg_img miss position adjustment and MoT routing, producing a corrupted KV cache.

Add a fallback in _adjust_positions_for_img2img: when info_list is exhausted, reuse info_list[-1] for remaining same-image companions. Only increment img2img_idx when consuming a real info_list entry.

Note: the existing _cfg_companion_queue in forward() only handles the case where companions arrive in separate forward() calls; it cannot cover the same-batch case because _pending_img2img_info is non-empty and forward() skips the companion queue branch entirely.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
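The fallback described in the commit message can be sketched in isolation. This is a self-contained toy, not the actual vLLM-omni code: the function name mirrors `_adjust_positions_for_img2img`, but the request and info objects are simplified stand-ins. The key behaviors are that `img2img_idx` only advances when a real `info_list` entry is consumed, and exhausted-list companions reuse the last entry rather than being skipped.

```python
def adjust_positions_for_img2img(requests, info_list):
    """Assign an img2img metadata entry to each request in the batch.

    Toy model of the fixed logic: after the encoder cache dedupes
    companion runs, info_list can be shorter than the number of
    img2img requests; trailing same-image companions fall back to
    the last consumed entry.
    """
    assigned = []
    img2img_idx = 0
    for req in requests:
        if img2img_idx < len(info_list):
            cur_info = info_list[img2img_idx]
            img2img_idx += 1          # consume a real entry
        elif info_list:
            cur_info = info_list[-1]  # fallback for same-image companions
        else:
            continue                  # no metadata at all: nothing to apply
        assigned.append((req, cur_info))
    return assigned


# gen + cfg_text + cfg_img co-scheduled; dedup left a single info entry.
batch = ["gen", "cfg_text", "cfg_img"]
infos = [{"num_vae": 4096, "num_vit": 1024}]
pairs = adjust_positions_for_img2img(batch, infos)
print([name for name, _ in pairs])  # ['gen', 'cfg_text', 'cfg_img']
```

With the pre-fix behavior (stopping once `info_list` was exhausted), only `gen` would have been adjusted, leaving the CFG companions with sequential positions and no MoT routing.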
…verification Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Force-pushed f9d789d to 8d78dca
…Non-deterministic image quality regression. (vllm-project#2458) Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Purpose
This PR fixes the following issues in Bagel online mode:
Hang after several requests in online mode
Symptom: The engine hangs on the Nth request (e.g., 5th) and never completes.
Root cause: When execute_model() returns early (via the num_scheduled_tokens == 0 path), self.kv_extracted_req_ids is not attached to the output because sample_tokens() is skipped. The scheduler never receives the extraction ack, so blocks in waiting_for_transfer_free are never freed, eventually exhausting block capacity.
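A minimal, self-contained sketch of the fix described above. The names `execute_model`, `sample_tokens`, and `kv_extracted_req_ids` follow the PR text, but the surrounding classes are simplified stand-ins, not the real vLLM worker API: the point is that the extraction ack is attached to the output even on the early-return path, so the scheduler can free blocks held in `waiting_for_transfer_free`.

```python
from dataclasses import dataclass, field


@dataclass
class ModelOutput:
    sampled: list = field(default_factory=list)
    kv_extracted_req_ids: list = field(default_factory=list)


class Worker:
    """Toy stand-in for the model runner described in the PR."""

    def __init__(self):
        # Requests whose KV blocks were extracted since the last step.
        self.kv_extracted_req_ids = set()

    def execute_model(self, num_scheduled_tokens: int) -> ModelOutput:
        out = ModelOutput()
        if num_scheduled_tokens == 0:
            # Fix: attach the extraction ack on the early-return path too.
            # Before, sample_tokens() was skipped here, the ack was lost,
            # and waiting_for_transfer_free blocks were never released.
            out.kv_extracted_req_ids = sorted(self.kv_extracted_req_ids)
            self.kv_extracted_req_ids.clear()
            return out
        out.sampled = ["<token>"]  # stand-in for sample_tokens()
        out.kv_extracted_req_ids = sorted(self.kv_extracted_req_ids)
        self.kv_extracted_req_ids.clear()
        return out


w = Worker()
w.kv_extracted_req_ids.add("req-5")
ack = w.execute_model(num_scheduled_tokens=0).kv_extracted_req_ids
print(ack)  # ['req-5'] — the scheduler can now free the waiting blocks
```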
Non-deterministic image quality regression in online mode
Symptom: The first non-warmup request produces a visibly degraded image; subsequent requests stabilize.
Root cause: stop_after_transfer immediately sets the request to FINISHED_STOPPED before KV extraction completes. This suppresses the kv_ready signal (since is_finished() is already true), forcing the orchestrator onto a different forwarding path with altered companion scheduling timing, which introduces floating-point divergence in the CFG KV caches.
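The ordering problem can be illustrated with a toy request lifecycle. This is an assumption-laden sketch (the `Request` class, `on_kv_extracted`, and `maybe_stop_after_transfer` are invented for illustration; only `FINISHED_STOPPED`, `kv_ready`, and `is_finished()` come from the PR text): the request must not transition to `FINISHED_STOPPED` until KV extraction completes, otherwise `is_finished()` suppresses the `kv_ready` signal.

```python
from enum import Enum, auto


class Status(Enum):
    RUNNING = auto()
    FINISHED_STOPPED = auto()


class Request:
    """Toy request lifecycle for the stop-after-transfer ordering fix."""

    def __init__(self):
        self.status = Status.RUNNING
        self.kv_extracted = False
        self.signals = []

    def is_finished(self) -> bool:
        return self.status == Status.FINISHED_STOPPED

    def on_kv_extracted(self):
        self.kv_extracted = True
        # kv_ready is dropped for already-finished requests, which is why
        # finishing too early forced the orchestrator onto a different path.
        if not self.is_finished():
            self.signals.append("kv_ready")

    def maybe_stop_after_transfer(self):
        # Fix: defer FINISHED_STOPPED until extraction has completed.
        if self.kv_extracted:
            self.status = Status.FINISHED_STOPPED


req = Request()
req.maybe_stop_after_transfer()  # too early: request stays RUNNING
req.on_kv_extracted()            # kv_ready is emitted, not suppressed
req.maybe_stop_after_transfer()  # now the request finishes cleanly
print(req.signals, req.status)
```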
Strange and distorted outputs after the first step
Symptom: The first request (warm_0) produces a correct image; all subsequent requests (warm_1, iter_0, iter_1, ...) produce visibly degraded/distorted images with wrong colors, broken details, and loss of scene fidelity. The degradation is consistent and reproducible across all post-warmup iterations.
Root cause: When parent and cfg_text companion are co-scheduled in the same batch — the common case after engine warm-up — the encoder_cache_manager deduplicates their encoder runs because they share the same content-based mm_hash (same source image). Only the parent's embed_multimodal → _process_img2img_input executes, appending one entry to _pending_img2img_info. However, the batch contains two img2img requests (parent + cfg_text). In _adjust_positions_for_img2img, the single info entry is consumed by the parent, leaving cfg_text with sequential position IDs (0, 1, 2, 3, ...) instead of the required img2img scheme (VAE → 0, ViT → 1, text → 2, 3, ...), and without the gen-mode VAE token mask for MoT routing. cfg_text's transformer forward pass therefore uses wrong position encodings and standard-mode weights for VAE latent patches, producing an entirely different KV cache. The DiT stage receives corrupted CFG conditioning, resulting in severe image quality degradation.

The first request (warm_0) is unaffected because engine startup latency causes parent and companions to arrive in separate scheduler steps, where the original _cfg_companion_queue fallback correctly handles the companion. Once the engine is warm, all three requests (parent, cfg_text, cfg_img) land in a single scheduler step, triggering the encoder cache deduplication that exposes the bug.

Test Plan
FLASHINFER_DISABLE_VERSION_CHECK=1 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_TEST_CLEAN_GPU_MEMORY=1 VLLM_IMAGE_FETCH_TIMEOUT=60 pytest tests/e2e/offline_inference/test_bagel_img2img.py tests/e2e/offline_inference/test_bagel_text2img.py tests/e2e/online_serving/test_bagel_online.py tests/e2e/online_serving/test_bagel_expansion.py -v -m "advanced_model" --run-level advanced_model

Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.