[BugFix]: Fix async scheduler transfer exceed KV cache by princepride · Pull Request #3318 · vllm-project/vllm-omni

princepride · 2026-05-03T06:56:23Z

Purpose

AsyncScheduler._update_after_schedule() optimistically advances request.num_computed_tokens, counting output placeholders, before GPU execution has confirmed those tokens. The KV transfer logic used this inflated num_computed_tokens as seq_len, so the DiT stage read uninitialized KV data at the placeholder positions. This produced wrong conditioning and, as a result, different generated pixels.
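To make the failure mode concrete, here is a toy model. ToyRequest, schedule_step and the list-based KV buffer are hypothetical simplifications (a None entry stands for a KV slot the GPU has not written yet); they are not vLLM or vLLM-Omni classes. The only idea taken from the description above is that the counters advance at schedule time, before execution confirms the tokens.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyRequest:
    # Hypothetical, simplified request state (not vLLM's Request class).
    num_computed_tokens: int = 0
    num_output_placeholders: int = 0
    # One entry per KV position; None marks a slot the GPU has not written yet.
    kv: list[Optional[float]] = field(default_factory=list)


def schedule_step(req: ToyRequest, num_new_tokens: int) -> None:
    """Async-style optimistic update: counters move before the GPU runs."""
    req.num_computed_tokens += num_new_tokens
    req.num_output_placeholders += num_new_tokens
    req.kv.extend([None] * num_new_tokens)  # reserved, contents still unknown


# Four prompt tokens whose KV is already written.
req = ToyRequest(num_computed_tokens=4, kv=[0.1, 0.2, 0.3, 0.4])
schedule_step(req, 1)  # the scheduler runs one token ahead of the GPU

naive_seq_len = req.num_computed_tokens  # inflated length, as in the buggy transfer
confirmed_seq_len = req.num_computed_tokens - req.num_output_placeholders

print(req.kv[:naive_seq_len])      # [0.1, 0.2, 0.3, 0.4, None]
print(req.kv[:confirmed_seq_len])  # [0.1, 0.2, 0.3, 0.4]
```

In this toy, slicing with the inflated length pulls in a slot whose contents were never written, which is the analogue of the DiT stage conditioning on uninitialized KV.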

Fix: Override _update_request_with_output in OmniARScheduler to explicitly manage the async placeholder protocol, and use num_computed_tokens - num_output_placeholders (confirmed computed tokens) for all KV transfer seq_len values.
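A minimal sketch of the placeholder bookkeeping the fix relies on, assuming a simplified request that carries only the two counters named above. confirmed_num_computed_tokens, update_request_with_output and kv_transfer_seq_len are hypothetical names; the actual override of _update_request_with_output in OmniARScheduler does more than this. The one piece taken directly from the PR is the formula num_computed_tokens - num_output_placeholders.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    # Hypothetical, simplified request state (not vLLM's Request class).
    num_computed_tokens: int
    num_output_placeholders: int = 0


def confirmed_num_computed_tokens(req: ToyRequest) -> int:
    # Positions whose tokens have been confirmed by a finished GPU step.
    return req.num_computed_tokens - req.num_output_placeholders


def update_request_with_output(req: ToyRequest, new_token_ids: list[int]) -> None:
    # Each sampled token that comes back from the GPU retires one placeholder.
    if len(new_token_ids) > req.num_output_placeholders:
        raise ValueError("received more outputs than outstanding placeholders")
    req.num_output_placeholders -= len(new_token_ids)


def kv_transfer_seq_len(req: ToyRequest) -> int:
    # Size KV transfers by the confirmed count, never the optimistic one.
    return confirmed_num_computed_tokens(req)


req = ToyRequest(num_computed_tokens=10, num_output_placeholders=2)
print(kv_transfer_seq_len(req))  # 8: two positions are still placeholders
update_request_with_output(req, [42, 43])
print(kv_transfer_seq_len(req))  # 10: everything is confirmed
```

In this toy model, a transfer sized this way may lag the scheduler by a step, but it never covers positions that are still placeholders.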

Test Plan

pytest -v tests/e2e/offline_inference/test_bagel_text2img.py -m "advanced_model" --run-level "advanced_model"

Result

=========================================================================== test session starts ===========================================================================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3.12
cachedir: .pytest_cache
rootdir: /proj-tango-pvc/users/zhipeng.wang/workspace/vllm-omni
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.4.0, hydra-core-1.3.2, asyncio-1.3.0, rerunfailures-16.1, mock-3.15.1, shard-0.1.2, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                                                         
Running 2 items in this shard: tests/e2e/offline_inference/test_bagel_text2img.py::test_bagel_text2img_shared_memory_connector, tests/e2e/offline_inference/test_bagel_text2img.py::test_bagel_text2img_mooncake_connector

tests/e2e/offline_inference/test_bagel_text2img.py::test_bagel_text2img_shared_memory_connector PASSED                                                              [ 50%]
tests/e2e/offline_inference/test_bagel_text2img.py::test_bagel_text2img_mooncake_connector PASSED                                                                   [100%]

============================================================================ warnings summary =============================================================================
vllm_omni/version.py:55
  /proj-tango-pvc/users/zhipeng.wang/workspace/vllm-omni/vllm_omni/version.py:55: RuntimeWarning: vLLM and vLLM-Omni appear to have mismatched major/minor versions:
   --> vLLM-Omni version 0.18.1.dev411+gd49356e4c.d20260503
   --> vLLM version 0.20.0
  This will likely cause compatibility issues.
    warn_if_misaligned_vllm_version()

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../../../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:365: 14 warnings
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--- Running Summary
=============================================================== 2 passed, 17 warnings in 180.57s (0:03:00) ================================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

Signed-off-by: princepride <wangzhipeng628@gmail.com>

chatgpt-codex-connector · 2026-05-03T06:56:27Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

princepride · 2026-05-03T06:59:38Z

Related: #3306 @alex-jw-brooks PTAL, I think this fix can resolve the bug

hsliuustc0106 · 2026-05-03T08:05:33Z

@@ -622,7 +668,8 @@ def _free_request(self, request: Request, delay_free_blocks: bool = False) -> di
                    )
            else:


Related: _replace_session_with_streaming_update in omni_scheduler_mixin.py resets num_computed_tokens = 0 but does not reset num_output_placeholders. If _get_confirmed_num_computed_tokens is ever called on a request that went through that path, it would return a negative value. Probably worth resetting num_output_placeholders = 0 there too for consistency.
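A toy illustration of this concern, assuming the helper computes num_computed_tokens - num_output_placeholders as the PR description states. reset_for_streaming_update is a hypothetical stand-in for the reset done in _replace_session_with_streaming_update, not its actual body.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    # Hypothetical, simplified request state (not vLLM's Request class).
    num_computed_tokens: int
    num_output_placeholders: int = 0


def confirmed_num_computed_tokens(req: ToyRequest) -> int:
    return req.num_computed_tokens - req.num_output_placeholders


def reset_for_streaming_update(req: ToyRequest, reset_placeholders: bool = True) -> None:
    # Hypothetical stand-in for the counter reset in a session replacement path.
    req.num_computed_tokens = 0
    if reset_placeholders:
        req.num_output_placeholders = 0


# Reset only num_computed_tokens while one placeholder is still outstanding.
stale = ToyRequest(num_computed_tokens=7, num_output_placeholders=1)
reset_for_streaming_update(stale, reset_placeholders=False)
print(confirmed_num_computed_tokens(stale))  # -1

# Reset both counters, as suggested above.
fixed = ToyRequest(num_computed_tokens=7, num_output_placeholders=1)
reset_for_streaming_update(fixed)
print(confirmed_num_computed_tokens(fixed))  # 0
```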

hsliuustc0106

BLOCKER scan:

Category	Result
Correctness	PASS
Reliability/Safety	PASS
Breaking Changes	PASS
Test Coverage	PASS
Documentation	PASS
Security	PASS

Non-blocking suggestions:

Missing regression test: This is a bug fix for AsyncScheduler's KV transfer issue. Consider adding a regression test that specifically validates the fix - e.g., a test that would fail with the buggy behavior (reading uninitialized KV) but passes with the fix. The existing Bagel text2img tests validate the output is correct, but don't explicitly test the KV transfer correctness edge case.
Behavior change in _update_request_with_output: You've switched from AsyncScheduler's implementation to SyncScheduler's implementation for token appending. This is intentional (to avoid the eager cache_blocks call), but it's worth documenting this deviation from the parent class behavior for future maintainers.
Consider adding a helper method: The pattern confirmed_computed = self._get_confirmed_num_computed_tokens(request) is repeated 3 times. Could extract to a local helper for cleaner code, though this is minor.
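For the first suggestion, a regression test could target the seq_len computation directly instead of relying on the generated image. The sketch below is self-contained and uses hypothetical names (ToyRequest, schedule_ahead, transfer_kv, buggy_seq_len, fixed_seq_len); a real test would need to drive OmniARScheduler and its KV transfer path, whose APIs are not shown here. The second assertion fails if seq_len is taken from num_computed_tokens alone and passes with the confirmed count.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyRequest:
    # Hypothetical, simplified request state (not vLLM's Request class).
    num_computed_tokens: int = 0
    num_output_placeholders: int = 0
    kv: list[Optional[float]] = field(default_factory=list)  # None = unwritten slot


def schedule_ahead(req: ToyRequest, n: int) -> None:
    # Async-style optimistic advance (simplified).
    req.num_computed_tokens += n
    req.num_output_placeholders += n
    req.kv.extend([None] * n)


def transfer_kv(req: ToyRequest, seq_len: int) -> list[Optional[float]]:
    # Hypothetical stand-in for the KV handoff to the DiT stage.
    return req.kv[:seq_len]


def buggy_seq_len(req: ToyRequest) -> int:
    return req.num_computed_tokens


def fixed_seq_len(req: ToyRequest) -> int:
    return req.num_computed_tokens - req.num_output_placeholders


def test_kv_transfer_excludes_placeholders() -> None:
    req = ToyRequest(num_computed_tokens=3, kv=[1.0, 2.0, 3.0])
    schedule_ahead(req, 2)
    # The buggy length reaches into unwritten slots; the fixed one does not.
    assert None in transfer_kv(req, buggy_seq_len(req))
    assert None not in transfer_kv(req, fixed_seq_len(req))
    assert len(transfer_kv(req, fixed_seq_len(req))) == 3


test_kv_transfer_excludes_placeholders()
print("regression test passed")
```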

The fix looks correct and well-documented.

princepride · 2026-05-03T10:43:03Z

@natureofnature the img2img task now produces an image full of noise:

root@job-afbdf05bd2b0de31-7qvjv:/proj-tango-pvc/users/zhipeng.wang/workspace/vllm-omni# python3 examples/offline_inference/bagel/end2end.py \
        --prompts "Change the grass color to red" \
        --modality img2img --step 15 \
        --image-path 2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg

princepride · 2026-05-03T10:44:04Z

Resolved!

hsliuustc0106

lgtm, but I think we need a more concise design doc for multi-stage model serving

hsliuustc0106 · 2026-05-03T12:02:17Z

cc @tzhouam @alex-jw-brooks @amy-why-3459

alex-jw-brooks · 2026-05-03T18:10:58Z

Thanks @princepride! LGTM, this is pretty much similar to what I had been thinking as well. I'll rebase the other PR on main later today.

@hsliuustc0106 I agree, I think we need that and guidance on how to tune your own config for your device if you aren't using the same setup as the deploy configs (ideally with some utils to help do that automatically). Can help on some of these after 0.20.0 release if others don't pick them up first

Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

Fix async transfer KV when compute not finished

b305444

Signed-off-by: princepride <wangzhipeng628@gmail.com>

princepride requested a review from hsliuustc0106 as a code owner May 3, 2026 06:56

princepride added the merge-test label to trigger buildkite merge test CI label May 3, 2026

hsliuustc0106 reviewed May 3, 2026

fix img2img bug

2d778d7

Signed-off-by: princepride <wangzhipeng628@gmail.com>

hsliuustc0106 added the ready label to trigger buildkite CI label May 3, 2026

Merge branch 'main' into fix-scheduler-bug

3ada257

hsliuustc0106 approved these changes May 3, 2026

hsliuustc0106 merged commit e50a066 into vllm-project:main May 3, 2026
8 checks passed

alex-jw-brooks mentioned this pull request May 3, 2026

[Core] Support Async & Sync AutoRegressive Scheduling #3306

Merged

hsliuustc0106 merged 3 commits into vllm-project:main from princepride:fix-scheduler-bug

3 participants
