
[Entrypoint][Refactor] Stage CLI Refactor #2020

Merged
hsliuustc0106 merged 9 commits into vllm-project:main from wuhang2014:cli
Apr 10, 2026

Conversation

@wuhang2014
Contributor

@wuhang2014 wuhang2014 commented Mar 19, 2026


Purpose

Following the entrypoint refactor (#1908), the stage-based CLI (#939) also needs to be refactored, since it is implemented mainly in the entrypoint module.

Test Plan

# stage 0 CLI
CUDA_VISIBLE_DEVICES=6 \
vllm serve /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ \
  --omni \
  --stage-id 0 \
  --omni-master-address 127.0.0.1 \
  --omni-master-port 33567 \
  --port 9898

# stage 1 CLI
CUDA_VISIBLE_DEVICES=5 \
vllm serve /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ \
  --omni \
  --stage-id 1 \
  --headless \
  --omni-master-address 127.0.0.1 \
  --omni-master-port 33567

# stage 2 CLI
CUDA_VISIBLE_DEVICES=5 \
vllm serve /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ \
  --omni \
  --stage-id 2 \
  --headless \
  --omni-master-address 127.0.0.1 \
  --omni-master-port 33567

Test Result

(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni# cd examples/online_serving/
(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni/examples/online_serving# python openai_chat_completion_client_for_multimodal_generation.py --model /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ --query-type use_image --image-path /nvme5n1p1/wuhang/dog-4988985_960_720.jpg --port 9898
Chat completion output from text: This is a beautiful Corgi dog lying on a grassy field.
Audio saved to audio_chatcmpl-bcf8c5158f0ecf92_0.wav
(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni/examples/online_serving# ls -l
total 244
-rw-r--r-- 1 root root 167894 Mar 24 09:13 audio_chatcmpl-bcf8c5158f0ecf92_0.wav
drwxr-xr-x 2 root root   4096 Mar 11 08:34 bagel
drwxr-xr-x 4 root root   4096 Mar 11 08:34 chart-helm
drwxr-xr-x 2 root root   4096 Mar 17 13:12 fish_speech
drwxr-xr-x 2 root root   4096 Mar 11 08:34 glm_image
drwxr-xr-x 2 root root   4096 Mar 24 07:12 helios
drwxr-xr-x 2 root root   4096 Mar 11 08:34 image_to_image
drwxr-xr-x 2 root root   4096 Mar 24 07:12 image_to_video
drwxr-xr-x 2 root root   4096 Mar 11 08:34 mimo_audio
-rw-r--r-- 1 root root  21122 Mar 24 07:12 openai_chat_completion_client_for_multimodal_generation.py
drwxr-xr-x 2 root root   4096 Mar 24 07:12 qwen2_5_omni
drwxr-xr-x 2 root root   4096 Mar 24 07:35 qwen3_omni
drwxr-xr-x 2 root root   4096 Mar 24 07:12 qwen3_tts
drwxr-xr-x 2 root root   4096 Mar 24 07:12 text_to_image
drwxr-xr-x 2 root root   4096 Mar 24 07:12 text_to_video
drwxr-xr-x 2 root root   4096 Mar 24 07:12 voxtral_tts
(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni/examples/online_serving# 


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@wuhang2014 wuhang2014 force-pushed the cli branch 11 times, most recently from 2da5ae7 to 916b1b0 on March 20, 2026 14:16
Collaborator

@lishunyang12 lishunyang12 left a comment


A few concerns on this refactor:

  1. stage_id: int → int | None (arg_utils.py) — downstream code doing arithmetic on stage_id without a None check will break. _effective_stage_id only covers create_model_config; other callers are unguarded.

  2. init_timeout default silently dropped from 600 → 300 — existing deployments relying on 600s will start timing out.

  3. Unsafe manual __enter__/__exit__ in _create_remote_llm_stage — if anything fails between them, the socket leaks. Use a with statement or contextlib.ExitStack.
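The first concern can be illustrated with a small guard helper. This is a hypothetical sketch — the helper name and default here are illustrative, not the PR's actual _effective_stage_id:

```python
from __future__ import annotations


def resolve_stage_id(stage_id: int | None, default: int = 0) -> int:
    """Resolve an optional stage id before doing arithmetic on it.

    Hypothetical helper: once stage_id becomes `int | None`, every
    caller that derives ports or ranks from it needs a guard like this,
    not just create_model_config.
    """
    return default if stage_id is None else stage_id


# Deriving a per-stage port without risking `None + int`:
base_port = 33567
port = base_port + resolve_stage_id(None)  # falls back to stage 0
```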

@wuhang2014 wuhang2014 force-pushed the cli branch 17 times, most recently from c1767ee to 4e39584 on March 24, 2026 10:49
@wuhang2014 wuhang2014 marked this pull request as ready for review March 24, 2026 10:52
@hsliuustc0106
Collaborator

@princepride please help validate bagel for other input/output pairs

@hsliuustc0106 hsliuustc0106 added the "nightly-test label to trigger buildkite nightly test CI" label Apr 8, 2026
@@ -0,0 +1,589 @@
"""Helpers for launching and handshaking omni engine cores."""
Collaborator


omni_core_engine.py name is misleading

Contributor Author


Yes, renamed it to stage_engine_startup.py

Comment thread vllm_omni/engine/async_omni_engine.py Outdated
close_started_llm_stage(started_stage)
raise

def _launch_registered_diffusion_stage(
Collaborator


why not call it _launch_diffusion_stage

Collaborator


why do the input args look so different compared with:

def _launch_llm_stage(
        self,
        stage_cfg: Any,
        metadata: Any,
        stage_connector_spec: dict[str, Any],
        stage_init_timeout: int,
        llm_stage_launch_lock: threading.Lock,
        omni_kv_connector: tuple[dict[str, Any] | None, str | None, str | None] = (None, None, None),
    ) -> StartedLlmStage:

Contributor Author


why not call it _launch_diffusion_stage

Yeah, just rename it.

Contributor Author


why do the input args look so different compared with:

def _launch_llm_stage(
        self,
        stage_cfg: Any,
        metadata: Any,
        stage_connector_spec: dict[str, Any],
        stage_init_timeout: int,
        llm_stage_launch_lock: threading.Lock,
        omni_kv_connector: tuple[dict[str, Any] | None, str | None, str | None] = (None, None, None),
    ) -> StartedLlmStage:

Because they were initially written in different ways. I think this could be redesigned to unify the code logic.
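One way to unify the two signatures, as suggested above, would be to bundle the shared parameters in a dataclass. This is a purely illustrative sketch, not code from the PR — the class name and field set are assumptions drawn from the _launch_llm_stage signature quoted in the thread:

```python
import threading
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageLaunchParams:
    """Hypothetical bundle of the arguments shared by the LLM and
    diffusion stage launchers, so both could take one params object."""

    stage_cfg: Any
    metadata: Any
    stage_connector_spec: dict[str, Any]
    stage_init_timeout: int
    launch_lock: threading.Lock = field(default_factory=threading.Lock)


# Both launchers could then accept a single StageLaunchParams:
params = StageLaunchParams(
    stage_cfg={"stage_id": 1},
    metadata=None,
    stage_connector_spec={},
    stage_init_timeout=300,
)
```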

@wuhang2014 wuhang2014 force-pushed the cli branch 4 times, most recently from 8e24dea to 169e121 on April 8, 2026 14:13
# Register the stage once and reuse the returned per-stage handshake
# address for all local engine-core processes.
handshake_address = register_stage_with_omni_master(
omni_master_address=omni_master_server._address,
Collaborator


omni_master_server._address / ._port are private. This pattern repeats in _launch_diffusion_stage and launch_omni_core_engines — can you expose address and port as public properties on OmniMasterServer?

Contributor Author

@wuhang2014 wuhang2014 Apr 8, 2026


Good catch — I exposed address and port as public properties on OmniMasterServer and updated _launch_diffusion_stage and launch_omni_core_engines to use them instead of the private fields.
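A minimal sketch of that change — only the property plumbing is shown, and the real OmniMasterServer in vllm-omni carries far more state than this:

```python
class OmniMasterServer:
    """Sketch: expose the handshake endpoint via read-only properties
    instead of the private `_address` / `_port` fields."""

    def __init__(self, address: str, port: int) -> None:
        self._address = address
        self._port = port

    @property
    def address(self) -> str:
        return self._address

    @property
    def port(self) -> int:
        return self._port


server = OmniMasterServer("127.0.0.1", 33567)
# Call sites such as _launch_diffusion_stage can now use the public API:
endpoint = f"{server.address}:{server.port}"
```

Read-only properties keep the internal fields private while giving callers a stable public surface.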

Comment thread vllm_omni/engine/async_omni_engine.py Outdated
      return started_stage
-  except Exception:
+  except BaseException as exc:
+      launch_stack.__exit__(type(exc), exc, exc.__traceback__)
Collaborator


Manual __exit__ call is fragile — if the stack has already been .close()'d on the happy path and then something throws between close and return, this double-calls exit. Wrapping launch_stack in a with statement (or at least guarding with a flag) would be safer.

Contributor Author

@wuhang2014 wuhang2014 Apr 8, 2026


Good catch — I updated _launch_llm_stage to manage launch_stack with with ExitStack() so cleanup is handled safely without a manual __exit__ call.
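The pattern discussed here can be sketched as follows (resource and function names are stand-ins, not the PR's code): contextlib.ExitStack closes everything it has entered if a later step raises, and pop_all() transfers ownership to the caller on success so cleanup never runs twice.

```python
import contextlib


class Handle:
    """Stand-in for a socket/process handle (illustrative only)."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "Handle":
        return self

    def __exit__(self, *exc) -> bool:
        self.closed = True
        return False


def launch_stage(resources, fail=False):
    """Enter resources on a stack; on failure the stack closes them
    all, on success ownership moves to the returned stack and the
    caller is responsible for closing it later."""
    with contextlib.ExitStack() as stack:
        handles = [stack.enter_context(r) for r in resources]
        if fail:
            raise RuntimeError("handshake failed")  # cleanup runs here
        return handles, stack.pop_all()
```

No manual `__exit__` call is needed, and the double-cleanup hazard from the earlier review comment disappears.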

Comment thread vllm_omni/engine/async_omni_engine.py Outdated
      assert started_stage is not None
      return started_stage
-  except Exception:
+  except BaseException as exc:
Collaborator


Why BaseException instead of Exception? KeyboardInterrupt / SystemExit propagation through ExitStack.__exit__ can mask the original signal.

Contributor Author

@wuhang2014 wuhang2014 Apr 8, 2026


Good catch — now that the manual ExitStack.__exit__ call is gone, I’ve narrowed this to Exception so KeyboardInterrupt and SystemExit still propagate cleanly.
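The distinction can be demonstrated with a tiny sketch (function names are hypothetical): catching Exception handles ordinary failures, while KeyboardInterrupt and SystemExit — which derive from BaseException but not Exception — still propagate to the interpreter.

```python
def run_guarded(fn):
    """Run fn, handling ordinary errors but letting
    KeyboardInterrupt / SystemExit propagate to the caller."""
    try:
        return fn()
    except Exception as exc:  # deliberately NOT BaseException
        return f"handled: {exc}"


def boom():
    raise ValueError("boom")
```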

wuhang2014 and others added 8 commits April 9, 2026 19:00
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
@hsliuustc0106 hsliuustc0106 merged commit cb91cbe into vllm-project:main Apr 10, 2026
6 of 8 checks passed
@gcanlin
Collaborator

gcanlin commented Apr 10, 2026

Do we have a usage doc for the new CLI way?

@wuhang2014
Contributor Author

Do we have a usage doc for the new CLI way?

I'm working on this in #1462.

Sy0307 pushed a commit to Sy0307/vllm-omni that referenced this pull request Apr 10, 2026
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

Labels

  • nightly-test label to trigger buildkite nightly test CI
  • ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: "AttributeError: 'FastAPI' object has no attribute 'state'" after instance is shut down

9 participants