
[Bugfix] Fix Bagel online mode for 1. Hang after several requests 2. Non-deterministic image quality regression.#2458

Merged
princepride merged 6 commits into vllm-project:main from natureofnature:bugfix/bagel/serve_error on Apr 13, 2026
Conversation

Contributor

@natureofnature natureofnature commented Apr 2, 2026


Purpose

This PR solves the following three issues:

  1. Hang after several requests in online mode
    Symptom: The engine hangs on the Nth request (e.g., 5th) and never completes.
    Root cause: When execute_model() returns early (via the num_scheduled_tokens == 0 path), self.kv_extracted_req_ids is not attached to the output because sample_tokens() is skipped. The scheduler never receives the extraction ack, so blocks in waiting_for_transfer_free are never freed, eventually exhausting block capacity.
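A minimal sketch of this fix, assuming the names from the description (`execute_model`, `sample_tokens`, `kv_extracted_req_ids`); the surrounding structure is illustrative, not the actual vLLM code:

```python
# Hypothetical sketch: attach the extraction ack even on the early-return
# path so the scheduler can free blocks in waiting_for_transfer_free.
class ModelRunnerSketch:
    def __init__(self):
        # Request IDs whose KV cache has been extracted but not yet acked.
        self.kv_extracted_req_ids: set[str] = set()

    def execute_model(self, num_scheduled_tokens: int) -> dict:
        if num_scheduled_tokens == 0:
            # Before the fix: returning here skipped sample_tokens(), so the
            # ack never reached the scheduler and blocks were never freed.
            output = {"sampled": None}
            # Fix: attach and clear the ack on the early-return path too.
            output["kv_extracted_req_ids"] = self.kv_extracted_req_ids
            self.kv_extracted_req_ids = set()
            return output
        return self.sample_tokens()

    def sample_tokens(self) -> dict:
        output = {"sampled": [],
                  "kv_extracted_req_ids": self.kv_extracted_req_ids}
        self.kv_extracted_req_ids = set()
        return output
```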

  2. Non-deterministic image quality regression in online mode
    Symptom: The first non-warmup request produces a visibly degraded image; subsequent requests stabilize.
    Root cause: stop_after_transfer immediately sets the request to FINISHED_STOPPED before KV extraction completes. This suppresses the kv_ready signal (since is_finished() is already true), forcing the orchestrator onto a different forwarding path with altered companion scheduling timing, which introduces floating-point divergence in the CFG KV caches.

  3. Strange and distorted outputs after the first step

Symptom: The first request (warm_0) produces a correct image; all subsequent requests (warm_1, iter_0, iter_1, ...) produce visibly degraded/distorted images with wrong colors, broken details, and loss of scene fidelity. The degradation is consistent and reproducible across all post-warmup iterations.

Root cause: When parent and cfg_text companion are co-scheduled in the same batch — the common case after engine warm-up — the encoder_cache_manager deduplicates their encoder runs because they share the same content-based mm_hash (same source image). Only the parent's embed_multimodal_process_img2img_input executes, appending one entry to _pending_img2img_info. However, the batch contains two img2img requests (parent + cfg_text).

In _adjust_positions_for_img2img, the single info entry is consumed by the parent, leaving cfg_text with sequential position IDs (0, 1, 2, 3, ...) instead of the required img2img scheme (VAE→0, ViT→1, text→2, 3, ...), and without the gen-mode VAE token mask for MoT routing.

This causes cfg_text's transformer forward pass to use completely wrong position encodings and standard-mode weights for VAE latent patches, producing an entirely different KV cache. The DiT stage receives corrupted CFG conditioning, resulting in severe image quality degradation.

The first request (warm_0) is unaffected because engine startup latency causes parent and companions to arrive in separate scheduler steps, where the original _cfg_companion_queue fallback correctly handles the companion. Once the engine is warm, all three requests (parent, cfg_text, cfg_img) land in a single scheduler step, triggering the encoder cache deduplication that exposes the bug.
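The position-ID mismatch above can be illustrated with a small sketch; `img2img_positions` and `sequential_positions` are hypothetical helpers written for this example, not vllm-omni functions:

```python
# Illustration of the img2img position scheme described in the root cause:
# VAE latent patches share position 0, ViT patches share position 1, and
# text tokens continue from 2. A request whose info entry was deduplicated
# away instead falls back to plain sequential positions.
def img2img_positions(num_vae: int, num_vit: int, num_text: int) -> list[int]:
    # VAE -> 0, ViT -> 1, text -> 2, 3, 4, ...
    return [0] * num_vae + [1] * num_vit + list(range(2, 2 + num_text))

def sequential_positions(total: int) -> list[int]:
    # What cfg_text received when its info entry was missing.
    return list(range(total))
```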

| Query index | warm up | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| input image \ prompt | black (warm up) | black | white | blue | yellow | green | purple | pink | orange |
| image | image | image | image | image | image | image | image | image | image |

Test Plan

  1. L3 test

     FLASHINFER_DISABLE_VERSION_CHECK=1 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_TEST_CLEAN_GPU_MEMORY=1 VLLM_IMAGE_FETCH_TIMEOUT=60 pytest tests/e2e/offline_inference/test_bagel_img2img.py tests/e2e/offline_inference/test_bagel_text2img.py tests/e2e/online_serving/test_bagel_online.py tests/e2e/online_serving/test_bagel_expansion.py -v -m "advanced_model" --run-level advanced_model

  2. Online serve

Test Result

  1. Offline L3 CI test (screenshot from 2026-04-10 12-18-27)
  2. Online L3 CI test (screenshot from 2026-04-02 22-00-50)
  4. Consecutive requests (screenshot from 2026-04-04 00-15-06)

| Query index | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th |
| --- | --- | --- | --- | --- | --- | --- | --- |
| input image \ prompt | black | white | blue | yellow | green | purple | pink | orange |
| image | image | image | image | image | image | image | image | image |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@natureofnature natureofnature changed the title [WIP][Bugfix] Fix Bagel online mode for 1. Hang after several requests in online mode. 2. Non-deterministic image quality regression. [WIP][Bugfix] Fix Bagel online mode for 1. Hang after several requests 2. Non-deterministic image quality regression. Apr 2, 2026
@natureofnature
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ddfdc2f15d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread vllm_omni/core/sched/omni_ar_scheduler.py
@natureofnature
Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!


@natureofnature
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5e5c106f3


Comment on lines +716 to +718
elif self._last_img2img_info is not None:
cur_info = self._last_img2img_info
else:

P1 Badge Preserve per-request img2img metadata ordering

Falling back to self._last_img2img_info when img2img_idx exceeds info_list applies the latest image metadata to unrelated request segments in the same batch (for example, mixed img2img+text batches or concurrent img2img requests with different image sizes). In _adjust_positions_for_img2img, that can incorrectly satisfy req_len >= num_img2img and rewrite positions/rope metadata for the wrong request, producing invalid KV-transfer metadata and distorted generation; the previous behavior correctly stopped applying img2img remapping once info_list was exhausted.


@natureofnature natureofnature force-pushed the bugfix/bagel/serve_error branch from a5e5c10 to f9d789d Compare April 4, 2026 04:24
@natureofnature natureofnature changed the title [WIP][Bugfix] Fix Bagel online mode for 1. Hang after several requests 2. Non-deterministic image quality regression. [Bugfix] Fix Bagel online mode for 1. Hang after several requests 2. Non-deterministic image quality regression. Apr 4, 2026
@natureofnature
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9d789d518


Comment on lines +762 to +764
elif self._last_img2img_info is not None:
cur_info = self._last_img2img_info
else:

P1 Badge Preserve per-request img2img metadata isolation

Using a single _last_img2img_info fallback here makes position rewriting depend on the most recently processed image, not the current request. After info_list is consumed, later request boundaries in the same batch (or concurrent requests with different image sizes) will reuse stale num_vae/num_vit/image_shape, so non-matching requests get incorrect rope positions and VAE masks. This can mis-route tokens through the img2img MoT path and produce corrupted outputs; metadata needs to stay keyed/queued per request instead of falling back globally.
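The reviewer's "keyed/queued per request" suggestion could look like the following sketch; `Img2ImgInfoRegistry` and its methods are hypothetical names written for this example, not part of vllm-omni:

```python
# Sketch of per-request img2img metadata: looked up by request ID instead
# of a global "last info" fallback, so a missing entry surfaces as None
# rather than silently reusing stale metadata from an unrelated request.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Img2ImgInfo:
    num_vae: int
    num_vit: int
    image_shape: tuple[int, int]

class Img2ImgInfoRegistry:
    def __init__(self):
        self._by_req: dict[str, Img2ImgInfo] = {}

    def record(self, req_id: str, info: Img2ImgInfo) -> None:
        self._by_req[req_id] = info

    def lookup(self, req_id: str) -> Optional[Img2ImgInfo]:
        # None signals "no metadata for this request" explicitly.
        return self._by_req.get(req_id)
```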


@natureofnature
Contributor Author

natureofnature commented Apr 5, 2026

@princepride PTAL

…tic image quality regression in online mode.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
When gen + cfg_text + cfg_img companions are co-scheduled in the same
batch, the encoder cache deduplicates their runs (same mm_hash).
Only the gen request's _process_img2img_input executes, producing 1
entry in info_list, but _adjust_positions_for_img2img iterates over
all 3 requests in the batch. cfg_text and cfg_img miss position
adjustment and MoT routing, producing corrupted KV cache.

Add a fallback in _adjust_positions_for_img2img: when info_list is
exhausted, reuse info_list[-1] for remaining same-image companions.
Only increment img2img_idx when consuming a real info_list entry.

Note: the existing _cfg_companion_queue in forward() only handles the
case where companions arrive in separate forward() calls; it cannot
cover the same-batch case because _pending_img2img_info is non-empty
and forward() skips the companion queue branch entirely.

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
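The fallback described in this commit message can be sketched as follows; `select_img2img_infos` is a hypothetical helper that condenses the real loop in `_adjust_positions_for_img2img`:

```python
# Sketch of the info_list fallback: once info_list is exhausted, remaining
# same-image companions reuse the last real entry; img2img_idx only
# advances when a real entry is consumed.
def select_img2img_infos(num_requests: int, info_list: list[dict]) -> list[dict]:
    picked = []
    img2img_idx = 0
    last_info = None
    for _ in range(num_requests):
        if img2img_idx < len(info_list):
            cur_info = info_list[img2img_idx]
            img2img_idx += 1  # only advance on a real info_list entry
            last_info = cur_info
        elif last_info is not None:
            cur_info = last_info  # companion sharing the same source image
        else:
            break  # no metadata at all: stop applying img2img remapping
        picked.append(cur_info)
    return picked
```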
…verification

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
@natureofnature natureofnature force-pushed the bugfix/bagel/serve_error branch from f9d789d to 8d78dca Compare April 10, 2026 03:08
@princepride princepride enabled auto-merge (squash) April 13, 2026 02:40
@princepride princepride added the ready label to trigger buildkite CI label Apr 13, 2026
@princepride princepride merged commit e122501 into vllm-project:main Apr 13, 2026
8 checks passed
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
…Non-deterministic image quality regression. (vllm-project#2458)

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
…Non-deterministic image quality regression. (vllm-project#2458)

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
…Non-deterministic image quality regression. (vllm-project#2458)

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Labels

ready label to trigger buildkite CI
