
[BugFix] Continue decode if don't need transfer kv cache between two …#2502

Merged
hsliuustc0106 merged 3 commits into vllm-project:main from princepride:continue-decode-if-donnot-transfer-kv on Apr 7, 2026


Collaborator

@princepride princepride commented Apr 5, 2026

…stages


Purpose

The previous update resolved the synchronization error in KV cache transfers across stages, but it introduced a side effect: Stage 0 output is terminated unconditionally, even when no KV cache transfer takes place. As a result, img2text and text2text tasks cannot output any text. This PR lets Stage 0 continue decoding when no KV cache needs to be transferred between the two stages.
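As a rough illustration of the intended behavior (not the actual vllm-omni scheduler code), the sketch below stops a Stage 0 request early only when its KV cache has to be handed to the next stage, and otherwise keeps decoding until an end-of-sequence token. `StageRequest`, `needs_kv_transfer`, and `step` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class StageRequest:
    """Hypothetical stand-in for a scheduler request (not the vllm-omni class)."""

    request_id: str
    # Assumed flag: True when the next stage needs this request's KV cache.
    needs_kv_transfer: bool
    output_token_ids: list[int] = field(default_factory=list)
    finished: bool = False


def step(request: StageRequest, next_token_id: int, eos_token_id: int) -> None:
    """Advance one scheduler tick for a single request."""
    if request.finished:
        return
    if request.needs_kv_transfer:
        # Hand-off case: stop Stage 0 so the KV cache can be transferred.
        request.finished = True
        return
    # No hand-off: keep decoding until the end-of-sequence token appears.
    request.output_token_ids.append(next_token_id)
    if next_token_id == eos_token_id:
        request.finished = True
```

Under these assumptions, a text2text request (no transfer) accumulates tokens across ticks, while a request that feeds a later stage finishes on its first tick without emitting text.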

Test Plan

python3 examples/offline_inference/bagel/end2end.py --modality text2text --prompts "Where is the capital of France?"

Test Result

**before:**


**after:**

The capital of France is Paris.


…stages

Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride
Collaborator Author

@natureofnature @lishunyang12 PTAL


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6ec96f30a


Comment thread vllm_omni/core/sched/omni_ar_scheduler.py Outdated
Signed-off-by: princepride <wangzhipeng628@gmail.com>
…tion

deserialize_additional_information() reconstructs all entries (including
tensors from bytes) on every call. Since _request_omits_kv_transfer_to_next_stage
is invoked on each scheduler tick, this caused unnecessary CPU copies and
memory churn during decode. Cache the boolean per request and clean up
in _free_request.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Made-with: Cursor
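The caching idea in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `KvTransferFlagCache`, `compute_flag`, and the `"omit_kv_transfer"` key used in the usage note are hypothetical, and `compute_flag` stands in for whatever expensive deserialization the scheduler performs.

```python
from typing import Any, Callable


class KvTransferFlagCache:
    """Memoize a per-request boolean so it is computed once, not on every tick."""

    def __init__(self, compute_flag: Callable[[Any], bool]) -> None:
        # compute_flag is assumed to be expensive (e.g. it deserializes tensors).
        self._compute_flag = compute_flag
        self._flags: dict[str, bool] = {}

    def omits_kv_transfer(self, request_id: str, raw_info: Any) -> bool:
        flag = self._flags.get(request_id)
        if flag is None:
            # First tick for this request: pay the cost once and remember it.
            flag = self._compute_flag(raw_info)
            self._flags[request_id] = flag
        return flag

    def free_request(self, request_id: str) -> None:
        # Drop the entry when the request is freed so the cache cannot grow
        # without bound; unknown ids are ignored.
        self._flags.pop(request_id, None)
```

For example, with `cache = KvTransferFlagCache(lambda info: bool(info.get("omit_kv_transfer")))`, repeated calls to `cache.omits_kv_transfer("r1", info)` invoke the lambda only once until `cache.free_request("r1")` is called.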
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


Has regression tests (test_bagel_understanding.py). Logic looks correct.

@princepride princepride added the ready label to trigger buildkite CI label Apr 6, 2026
@princepride
Collaborator Author

@hsliuustc0106 Can you help approve it?

@hsliuustc0106 hsliuustc0106 merged commit 8dd66ce into vllm-project:main Apr 7, 2026
8 checks passed
skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request Apr 7, 2026
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
bob-021206 pushed a commit to jasonlee-1024/vllm-omni that referenced this pull request Apr 21, 2026
vllm-project#2502)

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: bob-021206 <binyan_github@163.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026