
[Perf] Bagel KV-ready early forwarding and time step consistency for /v1/chat/completions#2398

Merged
princepride merged 7 commits into vllm-project:main from natureofnature:bagel/kv_transfer_opt
Apr 1, 2026


@natureofnature
Contributor

@natureofnature natureofnature commented Apr 1, 2026


Purpose

Fix timestep mismatch in Bagel AR/DiT mode:

The /v1/chat/completions endpoint for disaggregated pipeline image generation only forwarded height and width from the request's extra_body to the diffusion stage sampling params, but ignored num_inference_steps. This caused the DiT stage to always fall back to the hardcoded default of 50 timesteps regardless of the client-specified value.
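
As a rough illustration of the fix described above, the sketch below copies `num_inference_steps` alongside `height` and `width` from `extra_body` into the diffusion sampling params. `DiffusionSamplingParams`, `FORWARDED_KEYS`, and `apply_extra_body` are hypothetical names made up for this example, not the actual vllm-omni classes or functions, and the 1024 height/width defaults are placeholders. Only the default of 50 steps comes from the description.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class DiffusionSamplingParams:
    """Hypothetical stand-in for the diffusion-stage sampling params."""
    height: int = 1024              # placeholder default (assumption)
    width: int = 1024               # placeholder default (assumption)
    num_inference_steps: int = 50   # default the DiT stage falls back to


# Before the fix only height and width were forwarded; the fix adds
# num_inference_steps to the forwarded keys.
FORWARDED_KEYS = ("height", "width", "num_inference_steps")


def apply_extra_body(
    params: DiffusionSamplingParams,
    extra_body: Optional[Mapping[str, Any]],
) -> DiffusionSamplingParams:
    """Copy client-specified values from extra_body onto params."""
    if not extra_body:
        return params
    for key in FORWARDED_KEYS:
        value = extra_body.get(key)
        if value is not None:
            setattr(params, key, int(value))
    return params


if __name__ == "__main__":
    request_extra_body = {"height": 1024, "width": 1024, "num_inference_steps": 10}
    print(apply_extra_body(DiffusionSamplingParams(), request_extra_body))
```

Keys absent from `extra_body` keep their defaults, so a request that omits `num_inference_steps` still runs with 50 steps.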

Forward to next stage on KV-ready instead of decode-finished

In the disaggregated pipeline, the orchestrator previously waited for AR Stage-0 to fully finish decoding (up to max_tokens tokens) before forwarding the request to the DiT stage. However, the DiT stage only needs the prefill KV cache for conditioning and does not depend on decode outputs. This change makes the AR scheduler emit a kv_ready signal as soon as KV cache extraction completes. The orchestrator forwards the request to the DiT stage as soon as it receives this signal, eliminating the unnecessary wait for AR decode to finish. For Bagel with max_tokens=2048, this reduces disaggregated t2i end-to-end latency from ~22s to ~19.7s (matching the single-stage baseline) and disaggregated i2i latency from ~35.8s to ~27.3s at 50 timesteps.

_mark_request_for_kv_transfer(req_id, snapshot_len)
       ↓
model_runner: extract KV cache
       ↓
model_runner_output.kv_extracted_req_ids includes req_id
       ↓
scheduler: emit kv_ready signal
       ↓
orchestrator._handle_kv_ready_raw_outputs: receive kv_ready signal
       ↓
orchestrator._forward_to_next_stage: forward to DiT
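
The flow above can be mimicked with a toy model. This is only a sketch: `ToyScheduler`, `ToyOrchestrator`, and the dictionary-shaped signal are hypothetical and do not reflect the real vllm-omni classes. The `kv_extracted_req_ids` field name and the `(req_id, snapshot_len)` arguments are taken from the description. Skipping a second forward when decode finishes later is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRunnerOutput:
    # Field name taken from the PR description.
    kv_extracted_req_ids: set = field(default_factory=set)


class ToyScheduler:
    """Hypothetical scheduler that emits kv_ready once extraction is reported."""

    def __init__(self):
        self.pending_kv = {}  # req_id -> snapshot_len

    def mark_request_for_kv_transfer(self, req_id, snapshot_len):
        self.pending_kv[req_id] = snapshot_len

    def update_from_output(self, output):
        signals = []
        for req_id in sorted(output.kv_extracted_req_ids):
            if req_id in self.pending_kv:
                snapshot_len = self.pending_kv.pop(req_id)
                signals.append(
                    {"req_id": req_id, "kv_ready": True, "snapshot_len": snapshot_len}
                )
        return signals


class ToyOrchestrator:
    """Hypothetical orchestrator that forwards on kv_ready, not on decode-finished."""

    def __init__(self):
        self.forwarded = []
        self._already_forwarded = set()

    def handle_kv_ready(self, signals):
        for signal in signals:
            if signal.get("kv_ready"):
                self._forward_once(signal["req_id"])

    def handle_decode_finished(self, req_id):
        # Assumption: a request forwarded on kv_ready is not forwarded again.
        self._forward_once(req_id)

    def _forward_once(self, req_id):
        if req_id not in self._already_forwarded:
            self._already_forwarded.add(req_id)
            self.forwarded.append(req_id)  # stands in for forwarding to DiT


if __name__ == "__main__":
    scheduler, orchestrator = ToyScheduler(), ToyOrchestrator()
    scheduler.mark_request_for_kv_transfer("req-1", snapshot_len=128)
    step_output = ModelRunnerOutput(kv_extracted_req_ids={"req-1"})
    orchestrator.handle_kv_ready(scheduler.update_from_output(step_output))
    orchestrator.handle_decode_finished("req-1")  # arrives later, no second forward
    print(orchestrator.forwarded)
```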

Test Plan

  1. Bagel image-to-image and text-to-image
  2. Non-disaggregated mode and AR/DiT disaggregated mode

Test Result

Using the default max_tokens settings on an H800 GPU:

text to image

t2i Prompt:
"A cute cat wearing sunglasses"
size: 1024x1024

image to image

input image: (image not preserved)

i2i Prompt:
"Transform this photo into a soft watercolor illustration while preserving the original composition, natural lighting, fur details, and face. Keep balanced exposure and realistic contrast."
size: 1024x1024

50 Time Steps

Sample outputs (images not preserved): baseline vs. current, for t2i and i2i.

Before

| Mode | Task | Total (mean) | AR/Residual | DiT | Note |
|---|---|---|---|---|---|
| single-stage baseline | t2i | 19.7710s | 0.0000s (0.0%) | 19.7710s (100.0%) | single-stage baseline; no stage transfer |
| single-stage baseline | i2i | 22.4768s | 0.0000s (0.0%) | 22.4768s (100.0%) | single-stage baseline; no stage transfer |
| split + shared memory | t2i | 22.0154s | 2.7262s (12.4%) | 19.2892s (87.6%) | shared_memory |
| split + shared memory | i2i | 35.8336s | 12.8266s (35.8%) | 23.0070s (64.2%) | shared_memory; cfg companion path |

After

| Mode | Task | Total (mean) | AR/Residual | DiT | Note |
|---|---|---|---|---|---|
| single-stage baseline | t2i | 19.7809s | 0.0000s (0.0%) | 19.7809s (100.0%) | single-stage baseline; no stage transfer |
| single-stage baseline | i2i | 22.4857s | 0.0000s (0.0%) | 22.4857s (100.0%) | single-stage baseline; no stage transfer |
| split + shared memory | t2i | 19.7001s | 0.3751s (1.9%) | 19.3251s (98.1%) | shared_memory |
| split + shared memory | i2i | 27.3342s | 4.3009s (15.7%) | 23.0333s (84.3%) | shared_memory; cfg companion path |

10 Time Steps

Sample outputs (images not preserved): t2i and i2i.

Before

| Mode | Task | Total (mean) | AR/Residual | DiT | Note |
|---|---|---|---|---|---|
| single-stage baseline | t2i | 4.1809s | 0.0000s (0.0%) | 4.1809s (100.0%) | single-stage baseline; no stage transfer |
| single-stage baseline | i2i | 5.1568s | 0.0000s (0.0%) | 5.1568s (100.0%) | single-stage baseline; no stage transfer |
| split + shared memory | t2i | 19.7721s | 0.4466s (2.3%) | 19.3255s (97.7%) | shared_memory |
| split + shared memory | i2i | 37.8469s | 12.6156s (33.3%) | 22.2961s (58.9%) | shared_memory; cfg companion path |

After

| Mode | Task | Total (mean) | AR/Residual | DiT | Note |
|---|---|---|---|---|---|
| single-stage baseline | t2i | 4.1235s | 0.0000s (0.0%) | 4.1235s (100.0%) | single-stage baseline; no stage transfer |
| single-stage baseline | i2i | 5.1478s | 0.0000s (0.0%) | 5.1478s (100.0%) | single-stage baseline; no stage transfer |
| split + shared memory | t2i | 4.1242s | 0.3748s (9.1%) | 3.7494s (90.9%) | shared_memory |
| split + shared memory | i2i | 9.9261s | 4.1899s (42.2%) | 5.7362s (57.8%) | shared_memory; cfg companion path |
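
To make the before/after comparison easier to read, the short script below computes the relative latency reduction for the "split + shared memory" rows. The totals are copied from the tables above; `RESULTS` and `reduction_pct` are names made up for this sketch.

```python
# Mean end-to-end totals in seconds for "split + shared memory",
# copied from the Before/After tables: (before, after).
RESULTS = {
    (50, "t2i"): (22.0154, 19.7001),
    (50, "i2i"): (35.8336, 27.3342),
    (10, "t2i"): (19.7721, 4.1242),
    (10, "i2i"): (37.8469, 9.9261),
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction in latency, as a percentage of the 'before' value."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for (steps, task), (before, after) in RESULTS.items():
        pct = reduction_pct(before, after)
        print(f"{steps} steps, {task}: {before:.4f}s -> {after:.4f}s ({pct:.1f}% lower)")
```

At 10 steps, the "Before" DiT time for split t2i (19.3255s) is close to the 50-step DiT time. That is consistent with num_inference_steps being ignored before this change.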


@princepride @hsliuustc0106


…tage on KV-ready instead of decode-finished

Signed-off-by: natureofnature <wzliu@connect.hku.hk>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e831bcd5f9


Comment thread vllm_omni/engine/orchestrator.py
Comment thread vllm_omni/engine/orchestrator.py Outdated
@natureofnature natureofnature changed the title [Perf] Bagel KV-ready early forwarding and time step consistency [Perf] Bagel KV-ready early forwarding and time step consistency for /v1/chat/completions Apr 1, 2026
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
@princepride
Collaborator

Can you also update the max_tokens-related code in vllm-omni/tests/e2e/offline_inference/test_bagel_text2img.py and vllm-omni/tests/e2e/offline_inference/test_bagel_img2img.py?

@princepride
Collaborator

This file also needs an update: vllm-omni/examples/offline_inference/bagel/end2end.py

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
@natureofnature
Contributor Author

This file also needs an update: vllm-omni/examples/offline_inference/bagel/end2end.py

They have been removed. @princepride

Collaborator

@princepride princepride left a comment


LGTM

Signed-off-by: natureofnature <wzliu@connect.hku.hk>
@princepride princepride added the ready label to trigger buildkite CI label Apr 1, 2026
@princepride princepride enabled auto-merge (squash) April 1, 2026 08:48
@princepride princepride merged commit d40840b into vllm-project:main Apr 1, 2026
7 of 8 checks passed
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
…/v1/chat/completions (vllm-project#2398)

Signed-off-by: natureofnature <wzliu@connect.hku.hk>