
[Bugfix] Fix delayed decoding bug for Bagel AR/DIT workflow (L3 test_bagel_img2img error)#2422

Merged
princepride merged 2 commits into vllm-project:main from natureofnature:bugfix/bagel/kv_transfer_opt
Apr 1, 2026
Conversation

@natureofnature (Contributor) commented Apr 1, 2026


Purpose

When stop_after_transfer is enabled (default), the AR scheduler previously continued decoding for 1–2 extra steps after the KV transfer trigger fired. The request was only stopped in a subsequent update_from_output call, after KV extraction completed. During those extra steps, the still-running parent request consumed scheduling budget, causing companion requests (e.g., cfg_text) to receive different chunked-prefill boundaries. This led to floating-point divergence in the KV cache and visibly degraded image quality in the DiT stage.
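The timing issue can be sketched with a toy simulation (the function, step trace, and counts below are illustrative, not the actual vllm-omni scheduler code): in the old behavior the stop is only applied on a later `update_from_output` pass, so the parent request burns at least one extra decode step of scheduling budget after the trigger; the fix requests the stop in the same step the trigger fires.

```python
def simulate(stop_immediately: bool, total_steps: int = 8) -> int:
    """Count decode steps the parent request runs *after* the KV-transfer
    trigger fires. Extra steps consume scheduling budget and shift the
    chunked-prefill boundaries of companion requests (e.g. cfg_text)."""
    extra_decode_steps = 0
    triggered = False
    stop_requested = False
    step = 0
    while step < total_steps:
        # --- scheduling phase: skip the request once it is stopped ---
        if stop_requested:
            break
        if triggered:
            extra_decode_steps += 1  # budget spent after the trigger
        step += 1
        # --- output phase: the trigger fires at step 4 in this toy trace ---
        if step == 4:
            triggered = True
            if stop_immediately:
                stop_requested = True  # fix: stop in the same step
        # Old behavior: the stop is only applied on a later
        # update_from_output pass (modeled here as one extra round).
        if triggered and not stop_immediately and extra_decode_steps >= 1:
            stop_requested = True
    return extra_decode_steps
```

With the fix (`stop_immediately=True`) the request runs zero decode steps after the trigger; with the old behavior it runs at least one, which is what perturbed the companion requests.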

With this PR, the following modes are supported.

| | `prefill_finished` | `special_token` |
|---|---|---|
| `stop_after_transfer: true` (default) | Supported — stops decode immediately on trigger. `waiting_for_transfer_free` holds blocks until KV extraction completes. The orchestrator forwards to the next stage via the finished-output path. | Supported — computes `snapshot_len`, then immediately stops decode with the same `waiting_for_transfer_free` protection. Fully aligned with `prefill_finished` semantics. |
| `stop_after_transfer: false` | Supported — continues decoding after the trigger until natural termination (e.g. `max_tokens`). A `kv_ready` signal is emitted once KV extraction completes (request still running), allowing the orchestrator to forward early. | Supported — continues decoding after the special token is detected. KV extraction triggers a `kv_ready` signal for early forwarding; the request finishes naturally and resources are freed on completion. |

All four combinations are supported. The stop_after_transfer flag applies the same stop/continue semantics uniformly across both criteria types.
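As a summary of the table above, the trigger handling can be modeled as a small dispatch (the function and action names here are illustrative labels, not the scheduler's real API):

```python
def on_trigger(criterion: str, stop_after_transfer: bool) -> list[str]:
    """Map (criterion, stop_after_transfer) to the actions taken when the
    KV-transfer trigger fires. Action names are illustrative labels
    summarizing the behavior table, not real scheduler methods."""
    if criterion not in ("prefill_finished", "special_token"):
        raise ValueError(f"unknown criterion: {criterion}")
    actions = []
    if criterion == "special_token":
        # The special-token path first records how much of the
        # sequence to snapshot for KV extraction.
        actions.append("compute_snapshot_len")
    if stop_after_transfer:
        actions += [
            "stop_decode_immediately",
            "hold_blocks_via_waiting_for_transfer_free",
            "forward_via_finished_output_path",
        ]
    else:
        actions += [
            "continue_decoding_until_natural_finish",
            "emit_kv_ready_when_extraction_completes",
        ]
    return actions
```

The point of the fix is that the `stop_after_transfer` branch no longer depends on which criterion fired: both paths converge on the same immediate-stop sequence.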

Test Plan

  1. Use the prompt “Let the woman wear a blue dress”.
  2. Run the L3 test:

```shell
VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_TEST_CLEAN_GPU_MEMORY=1 VLLM_IMAGE_FETCH_TIMEOUT=60 pytest -s -v tests/e2e/offline_inference/test_bagel_img2img.py -m "advanced_model" --run-level "advanced_model"
```

Test Result

(Input/output image comparison attached; screenshot from 2026-04-01 21-41-20.)
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.



@princepride

immediate stop after special tokens are triggered if set stop_after

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 980f1650e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on vllm_omni/core/sched/omni_ar_scheduler.py
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
@natureofnature
Contributor Author

@codex review

@natureofnature
Contributor Author

@amy-why-3459

@princepride princepride enabled auto-merge (squash) April 1, 2026 14:39
@princepride princepride added the ready label to trigger buildkite CI label Apr 1, 2026
@princepride (Collaborator) left a comment


LGTM

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!


@princepride princepride merged commit bbae904 into vllm-project:main Apr 1, 2026
7 of 8 checks passed
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
…bagel_img2img error) (vllm-project#2422)

Signed-off-by: natureofnature <wzliu@connect.hku.hk>