[Bugfix] Fix delayed decoding bug for Bagel AR/DIT workflow (L3 test_bagel_img2img error)#2422
Merged
princepride merged 2 commits into vllm-project:main on Apr 1, 2026
immediate stop after special tokens are triggered if set stop_after Signed-off-by: natureofnature <wzliu@connect.hku.hk>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 980f1650e4
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Contributor (Author): @codex review
Codex Review: Didn't find any major issues. Nice work!
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request on Apr 9, 2026:
…bagel_img2img error) (vllm-project#2422) Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Purpose
When stop_after_transfer is enabled (default), the AR scheduler previously continued decoding for 1–2 extra steps after the KV transfer trigger fired. The request was only stopped in a subsequent update_from_output call, after KV extraction completed. During those extra steps, the still-running parent request consumed scheduling budget, causing companion requests (e.g., cfg_text) to receive different chunked-prefill boundaries. This led to floating-point divergence in the KV cache and visibly degraded image quality in the DiT stage.
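The budget effect described above can be illustrated with a toy model. This is a minimal sketch, not the vLLM-Omni scheduler: the function `chunk_boundaries`, the per-step token budget, and the assumption that a lingering parent request costs exactly one token per decode step are all hypothetical and chosen only to show how extra decode steps shift a companion request's chunked-prefill boundaries.

```python
def chunk_boundaries(prompt_len, budget, parent_decode_steps):
    """Return cumulative chunk end positions for a companion prefill.

    Toy model (assumption, not the real scheduler): every step has
    `budget` tokens, and a still-running parent request takes one token
    from that budget for its first `parent_decode_steps` steps.
    """
    boundaries = []
    done = 0
    step = 0
    while done < prompt_len:
        # Budget left for the companion after the parent's decode token.
        available = budget - (1 if step < parent_decode_steps else 0)
        chunk = min(available, prompt_len - done)
        done += chunk
        boundaries.append(done)
        step += 1
    return boundaries


# Parent stops immediately when the transfer trigger fires.
stopped = chunk_boundaries(prompt_len=20, budget=8, parent_decode_steps=0)
# Parent lingers for two extra decode steps (the old behavior).
lingering = chunk_boundaries(prompt_len=20, budget=8, parent_decode_steps=2)

print(stopped)    # [8, 16, 20]
print(lingering)  # [7, 14, 20]
```

In this toy setup the companion prefill covers the same 20 tokens either way, but the chunk boundaries differ, which is the condition the description identifies as the source of floating-point divergence in the KV cache.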
With this PR, the following modes should be supported.
| | prefill_finished | special_token |
|---|---|---|
| stop_after_transfer: true (default) | […] waiting_for_transfer_free holds blocks until KV extraction completes. Orchestrator forwards to the next stage via the finished output path. | […] snapshot_len, then immediately stops decode with the same waiting_for_transfer_free protection. Fully aligned with prefill_finished semantics. |
| stop_after_transfer: false | […] max_tokens). kv_ready signal is emitted once KV extraction completes (request still running), allowing the orchestrator to forward early. | […] kv_ready signal for early forwarding; the request finishes naturally and resources are freed on completion. |

All four combinations are supported. The stop_after_transfer flag applies the same stop/continue semantics uniformly across both criteria types.

Test Plan
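As an illustration only (this is not the test the PR ran), the four combinations described under Purpose can be modeled as a small decision function. The names `Outcome` and `on_transfer_trigger` are hypothetical; only the criteria names, the `stop_after_transfer` flag, and the `finished` / `kv_ready` forwarding paths come from the description.

```python
from dataclasses import dataclass

CRITERIA = ("prefill_finished", "special_token")


@dataclass(frozen=True)
class Outcome:
    stops_decode: bool   # does decoding stop as soon as the trigger fires?
    forward_via: str     # how the orchestrator learns it can forward


def on_transfer_trigger(criteria: str, stop_after_transfer: bool = True) -> Outcome:
    """Toy model of the stop/continue semantics (hypothetical helper)."""
    if criteria not in CRITERIA:
        raise ValueError(f"unknown criteria: {criteria!r}")
    if stop_after_transfer:
        # Stop immediately; forwarding uses the finished output path.
        return Outcome(stops_decode=True, forward_via="finished")
    # Keep decoding; a kv_ready signal allows early forwarding.
    return Outcome(stops_decode=False, forward_via="kv_ready")


for criteria in CRITERIA:
    for flag in (True, False):
        print(criteria, flag, on_transfer_trigger(criteria, flag))
```

The sketch deliberately returns the same outcome for both criteria types under a given flag value, mirroring the statement that the flag applies uniformly across them.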
Test Result
@princepride