[Bugfix] Fix Bagel online mode: (1) hang after several requests; (2) non-deterministic image quality regression (#2458)
Conversation
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ddfdc2f15d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
Codex Review: Didn't find any major issues. Swish!
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5e5c106f3
elif self._last_img2img_info is not None:
    cur_info = self._last_img2img_info
else:
Preserve per-request img2img metadata ordering
Falling back to self._last_img2img_info when img2img_idx exceeds info_list applies the latest image metadata to unrelated request segments in the same batch (for example, mixed img2img+text batches or concurrent img2img requests with different image sizes). In _adjust_positions_for_img2img, that can incorrectly satisfy req_len >= num_img2img and rewrite positions/rope metadata for the wrong request, producing invalid KV-transfer metadata and distorted generation; the previous behavior correctly stopped applying img2img remapping once info_list was exhausted.
Force-pushed a5e5c10 to f9d789d
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9d789d518
elif self._last_img2img_info is not None:
    cur_info = self._last_img2img_info
else:
Preserve per-request img2img metadata isolation
Using a single _last_img2img_info fallback here makes position rewriting depend on the most recently processed image, not the current request. After info_list is consumed, later request boundaries in the same batch (or concurrent requests with different image sizes) will reuse stale num_vae/num_vit/image_shape, so non-matching requests get incorrect rope positions and VAE masks. This can mis-route tokens through the img2img MoT path and produce corrupted outputs; metadata needs to stay keyed/queued per request instead of falling back globally.
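The alternative the reviewer suggests — keeping metadata keyed per request instead of a global last-seen fallback — can be sketched as below. This is an illustrative structure only (the class name `Img2ImgInfoStore` and its methods are hypothetical, not the PR's actual implementation); the point is that a per-request lookup can never leak a stale `num_vae`/`num_vit`/`image_shape` across requests.

```python
class Img2ImgInfoStore:
    """Hypothetical per-request metadata store with no cross-request fallback."""

    def __init__(self):
        self._by_req = {}

    def put(self, req_id: str, info: dict) -> None:
        self._by_req[req_id] = info

    def get(self, req_id: str):
        # Returns None instead of falling back to another request's
        # metadata, so a stale image shape cannot be applied to an
        # unrelated request in the same batch.
        return self._by_req.get(req_id)


store = Img2ImgInfoStore()
store.put("req_a", {"image_shape": (512, 512)})
store.put("req_b", {"image_shape": (1024, 768)})
print(store.get("req_b"))  # {'image_shape': (1024, 768)}
print(store.get("req_c"))  # None — no global fallback
```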
@princepride PTAL
…tic image quality regression in online mode. Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
When gen + cfg_text + cfg_img companions are co-scheduled in the same batch, the encoder cache deduplicates their runs (same mm_hash). Only the gen request's _process_img2img_input executes, producing one entry in info_list, but _adjust_positions_for_img2img iterates over all three requests in the batch. cfg_text and cfg_img miss position adjustment and MoT routing, producing a corrupted KV cache.

Add a fallback in _adjust_positions_for_img2img: when info_list is exhausted, reuse info_list[-1] for remaining same-image companions. Only increment img2img_idx when consuming a real info_list entry.

Note: the existing _cfg_companion_queue in forward() only handles the case where companions arrive in separate forward() calls; it cannot cover the same-batch case because _pending_img2img_info is non-empty and forward() skips the companion queue branch entirely.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
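The fallback described in the commit message can be sketched in isolation. This is a self-contained toy, not the actual vLLM-omni code: the function name mirrors `_adjust_positions_for_img2img`, but the request and info objects are simplified stand-ins. The key behaviors are that `img2img_idx` only advances when a real `info_list` entry is consumed, and exhausted-list companions reuse the last entry rather than being skipped.

```python
def adjust_positions_for_img2img(requests, info_list):
    """Assign an img2img metadata entry to each request in the batch.

    Toy model of the fixed logic: after the encoder cache dedupes
    companion runs, info_list can be shorter than the number of
    img2img requests; trailing same-image companions fall back to
    the last consumed entry.
    """
    assigned = []
    img2img_idx = 0
    for req in requests:
        if img2img_idx < len(info_list):
            cur_info = info_list[img2img_idx]
            img2img_idx += 1          # consume a real entry
        elif info_list:
            cur_info = info_list[-1]  # fallback for same-image companions
        else:
            continue                  # no metadata at all: nothing to apply
        assigned.append((req, cur_info))
    return assigned


# gen + cfg_text + cfg_img co-scheduled; dedup left a single info entry.
batch = ["gen", "cfg_text", "cfg_img"]
infos = [{"num_vae": 4096, "num_vit": 1024}]
pairs = adjust_positions_for_img2img(batch, infos)
print([name for name, _ in pairs])  # ['gen', 'cfg_text', 'cfg_img']
```

With the pre-fix behavior (stopping once `info_list` was exhausted), only `gen` would have been adjusted, leaving the CFG companions with sequential positions and no MoT routing.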
…verification Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Force-pushed f9d789d to 8d78dca
…Non-deterministic image quality regression. (vllm-project#2458) Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Purpose
This PR fixes the following issues in Bagel online mode:
Hang after several requests in online mode
Symptom: The engine hangs on the Nth request (e.g., 5th) and never completes.
Root cause: When execute_model() returns early (via the num_scheduled_tokens == 0 path), self.kv_extracted_req_ids is not attached to the output because sample_tokens() is skipped. The scheduler never receives the extraction ack, so blocks in waiting_for_transfer_free are never freed, eventually exhausting block capacity.
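A minimal, self-contained sketch of the fix described above. The names `execute_model`, `sample_tokens`, and `kv_extracted_req_ids` follow the PR text, but the surrounding classes are simplified stand-ins, not the real vLLM worker API: the point is that the extraction ack is attached to the output even on the early-return path, so the scheduler can free blocks held in `waiting_for_transfer_free`.

```python
from dataclasses import dataclass, field


@dataclass
class ModelOutput:
    sampled: list = field(default_factory=list)
    kv_extracted_req_ids: list = field(default_factory=list)


class Worker:
    """Toy stand-in for the model runner described in the PR."""

    def __init__(self):
        # Requests whose KV blocks were extracted since the last step.
        self.kv_extracted_req_ids = set()

    def execute_model(self, num_scheduled_tokens: int) -> ModelOutput:
        out = ModelOutput()
        if num_scheduled_tokens == 0:
            # Fix: attach the extraction ack on the early-return path too.
            # Before, sample_tokens() was skipped here, the ack was lost,
            # and waiting_for_transfer_free blocks were never released.
            out.kv_extracted_req_ids = sorted(self.kv_extracted_req_ids)
            self.kv_extracted_req_ids.clear()
            return out
        out.sampled = ["<token>"]  # stand-in for sample_tokens()
        out.kv_extracted_req_ids = sorted(self.kv_extracted_req_ids)
        self.kv_extracted_req_ids.clear()
        return out


w = Worker()
w.kv_extracted_req_ids.add("req-5")
ack = w.execute_model(num_scheduled_tokens=0).kv_extracted_req_ids
print(ack)  # ['req-5'] — the scheduler can now free the waiting blocks
```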
Non-deterministic image quality regression in online mode
Symptom: The first non-warmup request produces a visibly degraded image; subsequent requests stabilize.
Root cause: stop_after_transfer immediately sets the request to FINISHED_STOPPED before KV extraction completes. This suppresses the kv_ready signal (since is_finished() is already true), forcing the orchestrator onto a different forwarding path with altered companion scheduling timing, which introduces floating-point divergence in the CFG KV caches.
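The ordering problem can be illustrated with a toy request lifecycle. This is an assumption-laden sketch (the `Request` class, `on_kv_extracted`, and `maybe_stop_after_transfer` are invented for illustration; only `FINISHED_STOPPED`, `kv_ready`, and `is_finished()` come from the PR text): the request must not transition to `FINISHED_STOPPED` until KV extraction completes, otherwise `is_finished()` suppresses the `kv_ready` signal.

```python
from enum import Enum, auto


class Status(Enum):
    RUNNING = auto()
    FINISHED_STOPPED = auto()


class Request:
    """Toy request lifecycle for the stop-after-transfer ordering fix."""

    def __init__(self):
        self.status = Status.RUNNING
        self.kv_extracted = False
        self.signals = []

    def is_finished(self) -> bool:
        return self.status == Status.FINISHED_STOPPED

    def on_kv_extracted(self):
        self.kv_extracted = True
        # kv_ready is dropped for already-finished requests, which is why
        # finishing too early forced the orchestrator onto a different path.
        if not self.is_finished():
            self.signals.append("kv_ready")

    def maybe_stop_after_transfer(self):
        # Fix: defer FINISHED_STOPPED until extraction has completed.
        if self.kv_extracted:
            self.status = Status.FINISHED_STOPPED


req = Request()
req.maybe_stop_after_transfer()  # too early: request stays RUNNING
req.on_kv_extracted()            # kv_ready is emitted, not suppressed
req.maybe_stop_after_transfer()  # now the request finishes cleanly
print(req.signals, req.status)
```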
Strange and distorted outputs after the first step
Symptom: The first request (warm_0) produces a correct image; all subsequent requests (warm_1, iter_0, iter_1, ...) produce visibly degraded/distorted images with wrong colors, broken details, and loss of scene fidelity. The degradation is consistent and reproducible across all post-warmup iterations.
Root cause: When parent and cfg_text companion are co-scheduled in the same batch — the common case after engine warm-up — the encoder_cache_manager deduplicates their encoder runs because they share the same content-based mm_hash (same source image). Only the parent's embed_multimodal → _process_img2img_input executes, appending one entry to _pending_img2img_info. However, the batch contains two img2img requests (parent + cfg_text). In _adjust_positions_for_img2img, the single info entry is consumed by the parent, leaving cfg_text with sequential position IDs (0, 1, 2, 3, ...) instead of the required img2img scheme (VAE → 0, ViT → 1, text → 2, 3, ...), and without the gen-mode VAE token mask for MoT routing. cfg_text's transformer forward pass therefore uses wrong position encodings and standard-mode weights for VAE latent patches, producing an entirely different KV cache. The DiT stage receives corrupted CFG conditioning, resulting in severe image quality degradation.

The first request (warm_0) is unaffected because engine startup latency causes parent and companions to arrive in separate scheduler steps, where the original _cfg_companion_queue fallback correctly handles the companion. Once the engine is warm, all three requests (parent, cfg_text, cfg_img) land in a single scheduler step, triggering the encoder cache deduplication that exposes the bug.

Test Plan
FLASHINFER_DISABLE_VERSION_CHECK=1 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_TEST_CLEAN_GPU_MEMORY=1 VLLM_IMAGE_FETCH_TIMEOUT=60 pytest tests/e2e/offline_inference/test_bagel_img2img.py tests/e2e/offline_inference/test_bagel_text2img.py tests/e2e/online_serving/test_bagel_online.py tests/e2e/online_serving/test_bagel_expansion.py -v -m "advanced_model" --run-level advanced_model

Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.