
[Entrypoint][Refactor] Stage CLI Refactor #2020

Merged
hsliuustc0106 merged 9 commits into vllm-project:main from wuhang2014:cli
Apr 10, 2026

Conversation

@wuhang2014
Contributor

@wuhang2014 wuhang2014 commented Mar 19, 2026


Purpose

Following the entrypoint refactor (#1908), the stage-based CLI (#939) also needs to be refactored, since it is implemented mainly in the entrypoint module.

Test Plan

# stage 0 CLI
CUDA_VISIBLE_DEVICES=6 \
vllm serve /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ \
  --omni \
  --stage-id 0 \
  --omni-master-address 127.0.0.1 \
  --omni-master-port 33567 \
  --port 9898

# stage 1 CLI
CUDA_VISIBLE_DEVICES=5 \
vllm serve /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ \
  --omni \
  --stage-id 1 \
  --headless \
  --omni-master-address 127.0.0.1 \
  --omni-master-port 33567

# stage 2 CLI
CUDA_VISIBLE_DEVICES=5 \
vllm serve /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ \
  --omni \
  --stage-id 2 \
  --headless \
  --omni-master-address 127.0.0.1 \
  --omni-master-port 33567

Test Result

(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni# cd examples/online_serving/
(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni/examples/online_serving# python openai_chat_completion_client_for_multimodal_generation.py --model /nvme5n1p1/wuhang/Qwen3-Omni-30B-A3B-Instruct/ --query-type use_image --image-path /nvme5n1p1/wuhang/dog-4988985_960_720.jpg --port 9898
Chat completion output from text: This is a beautiful Corgi dog lying on a grassy field.
Audio saved to audio_chatcmpl-bcf8c5158f0ecf92_0.wav
(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni/examples/online_serving# ls -l
total 244
-rw-r--r-- 1 root root 167894 Mar 24 09:13 audio_chatcmpl-bcf8c5158f0ecf92_0.wav
drwxr-xr-x 2 root root   4096 Mar 11 08:34 bagel
drwxr-xr-x 4 root root   4096 Mar 11 08:34 chart-helm
drwxr-xr-x 2 root root   4096 Mar 17 13:12 fish_speech
drwxr-xr-x 2 root root   4096 Mar 11 08:34 glm_image
drwxr-xr-x 2 root root   4096 Mar 24 07:12 helios
drwxr-xr-x 2 root root   4096 Mar 11 08:34 image_to_image
drwxr-xr-x 2 root root   4096 Mar 24 07:12 image_to_video
drwxr-xr-x 2 root root   4096 Mar 11 08:34 mimo_audio
-rw-r--r-- 1 root root  21122 Mar 24 07:12 openai_chat_completion_client_for_multimodal_generation.py
drwxr-xr-x 2 root root   4096 Mar 24 07:12 qwen2_5_omni
drwxr-xr-x 2 root root   4096 Mar 24 07:35 qwen3_omni
drwxr-xr-x 2 root root   4096 Mar 24 07:12 qwen3_tts
drwxr-xr-x 2 root root   4096 Mar 24 07:12 text_to_image
drwxr-xr-x 2 root root   4096 Mar 24 07:12 text_to_video
drwxr-xr-x 2 root root   4096 Mar 24 07:12 voxtral_tts
(wuhang) root@10-90-67-82:/nvme5n1p1/wuhang/vllm-omni/examples/online_serving# 


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@wuhang2014 wuhang2014 force-pushed the cli branch 11 times, most recently from 2da5ae7 to 916b1b0 on March 20, 2026 14:16
Collaborator

@lishunyang12 lishunyang12 left a comment


A few concerns on this refactor:

  1. stage_id: int → int | None (arg_utils.py) — downstream code doing arithmetic on stage_id without a None check will break. _effective_stage_id only covers create_model_config; other callers are unguarded.

  2. init_timeout default silently dropped from 600 → 300 — existing deployments relying on 600s will start timing out.

  3. Unsafe manual __enter__/__exit__ in _create_remote_llm_stage — if anything fails between them, the socket leaks. Use a with statement or contextlib.ExitStack.
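The first concern can be illustrated with a small guard helper. This is a hypothetical sketch — the helper name and default here are illustrative, not the PR's actual _effective_stage_id:

```python
from __future__ import annotations


def resolve_stage_id(stage_id: int | None, default: int = 0) -> int:
    """Resolve an optional stage id before doing arithmetic on it.

    Hypothetical helper: once stage_id becomes `int | None`, every
    caller that derives ports or ranks from it needs a guard like this,
    not just create_model_config.
    """
    return default if stage_id is None else stage_id


# Deriving a per-stage port without risking `None + int`:
base_port = 33567
port = base_port + resolve_stage_id(None)  # falls back to stage 0
```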

@wuhang2014 wuhang2014 force-pushed the cli branch 17 times, most recently from c1767ee to 4e39584 on March 24, 2026 10:49
@wuhang2014 wuhang2014 marked this pull request as ready for review March 24, 2026 10:52
@hsliuustc0106
Collaborator

@princepride please help validate bagel for other input/output pairs

@hsliuustc0106 hsliuustc0106 added the "nightly-test label to trigger buildkite nightly test CI" label Apr 8, 2026
@@ -0,0 +1,589 @@
"""Helpers for launching and handshaking omni engine cores."""
Collaborator


omni_core_engine.py name is misleading

Contributor Author


Yes, renamed it to stage_engine_startup.py

Comment thread vllm_omni/engine/async_omni_engine.py Outdated
close_started_llm_stage(started_stage)
raise

def _launch_registered_diffusion_stage(
Collaborator


why not call it _launch_diffusion_stage

Collaborator


why do the input args look so different compared with:

def _launch_llm_stage(
        self,
        stage_cfg: Any,
        metadata: Any,
        stage_connector_spec: dict[str, Any],
        stage_init_timeout: int,
        llm_stage_launch_lock: threading.Lock,
        omni_kv_connector: tuple[dict[str, Any] | None, str | None, str | None] = (None, None, None),
    ) -> StartedLlmStage:

Contributor Author


why not call it _launch_diffusion_stage

Yeah, just rename it.

Contributor Author


why do the input args look so different compared with:

def _launch_llm_stage(
        self,
        stage_cfg: Any,
        metadata: Any,
        stage_connector_spec: dict[str, Any],
        stage_init_timeout: int,
        llm_stage_launch_lock: threading.Lock,
        omni_kv_connector: tuple[dict[str, Any] | None, str | None, str | None] = (None, None, None),
    ) -> StartedLlmStage:

Because they were initially written in different ways. I think this could be redesigned to unify the code logic.
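One way to unify the two signatures, as suggested above, would be to bundle the shared parameters in a dataclass. This is a purely illustrative sketch, not code from the PR — the class name and field set are assumptions drawn from the _launch_llm_stage signature quoted in the thread:

```python
import threading
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageLaunchParams:
    """Hypothetical bundle of the arguments shared by the LLM and
    diffusion stage launchers, so both could take one params object."""

    stage_cfg: Any
    metadata: Any
    stage_connector_spec: dict[str, Any]
    stage_init_timeout: int
    launch_lock: threading.Lock = field(default_factory=threading.Lock)


# Both launchers could then accept a single StageLaunchParams:
params = StageLaunchParams(
    stage_cfg={"stage_id": 1},
    metadata=None,
    stage_connector_spec={},
    stage_init_timeout=300,
)
```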

@wuhang2014 wuhang2014 force-pushed the cli branch 4 times, most recently from 8e24dea to 169e121 on April 8, 2026 14:13
# Register the stage once and reuse the returned per-stage handshake
# address for all local engine-core processes.
handshake_address = register_stage_with_omni_master(
omni_master_address=omni_master_server._address,
Collaborator


omni_master_server._address / ._port are private. This pattern repeats in _launch_diffusion_stage and launch_omni_core_engines — can you expose address and port as public properties on OmniMasterServer?

Contributor Author

@wuhang2014 wuhang2014 Apr 8, 2026


Good catch — I exposed address and port as public properties on OmniMasterServer and updated _launch_diffusion_stage and launch_omni_core_engines to use them instead of the private fields.
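A minimal sketch of that change — only the property plumbing is shown, and the real OmniMasterServer in vllm-omni carries far more state than this:

```python
class OmniMasterServer:
    """Sketch: expose the handshake endpoint via read-only properties
    instead of the private `_address` / `_port` fields."""

    def __init__(self, address: str, port: int) -> None:
        self._address = address
        self._port = port

    @property
    def address(self) -> str:
        return self._address

    @property
    def port(self) -> int:
        return self._port


server = OmniMasterServer("127.0.0.1", 33567)
# Call sites such as _launch_diffusion_stage can now use the public API:
endpoint = f"{server.address}:{server.port}"
```

Read-only properties keep the internal fields private while giving callers a stable public surface.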

Comment thread vllm_omni/engine/async_omni_engine.py Outdated
      return started_stage
-  except Exception:
+  except BaseException as exc:
+      launch_stack.__exit__(type(exc), exc, exc.__traceback__)
Collaborator


Manual __exit__ call is fragile — if the stack has already been .close()'d on the happy path and then something throws between close and return, this double-calls exit. Wrapping launch_stack in a with statement (or at least guarding with a flag) would be safer.

Contributor Author

@wuhang2014 wuhang2014 Apr 8, 2026


Good catch — I updated _launch_llm_stage to manage launch_stack with with ExitStack() so cleanup is handled safely without a manual __exit__ call.
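The pattern discussed here can be sketched as follows (resource and function names are stand-ins, not the PR's code): contextlib.ExitStack closes everything it has entered if a later step raises, and pop_all() transfers ownership to the caller on success so cleanup never runs twice.

```python
import contextlib


class Handle:
    """Stand-in for a socket/process handle (illustrative only)."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "Handle":
        return self

    def __exit__(self, *exc) -> bool:
        self.closed = True
        return False


def launch_stage(resources, fail=False):
    """Enter resources on a stack; on failure the stack closes them
    all, on success ownership moves to the returned stack and the
    caller is responsible for closing it later."""
    with contextlib.ExitStack() as stack:
        handles = [stack.enter_context(r) for r in resources]
        if fail:
            raise RuntimeError("handshake failed")  # cleanup runs here
        return handles, stack.pop_all()
```

No manual `__exit__` call is needed, and the double-cleanup hazard from the earlier review comment disappears.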

Comment thread vllm_omni/engine/async_omni_engine.py Outdated
      assert started_stage is not None
      return started_stage
-  except Exception:
+  except BaseException as exc:
Collaborator


Why BaseException instead of Exception? KeyboardInterrupt / SystemExit propagation through ExitStack.__exit__ can mask the original signal.

Contributor Author

@wuhang2014 wuhang2014 Apr 8, 2026


Good catch — now that the manual ExitStack.__exit__ call is gone, I’ve narrowed this to Exception so KeyboardInterrupt and SystemExit still propagate cleanly.
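The distinction can be demonstrated with a tiny sketch (function names are hypothetical): catching Exception handles ordinary failures, while KeyboardInterrupt and SystemExit — which derive from BaseException but not Exception — still propagate to the interpreter.

```python
def run_guarded(fn):
    """Run fn, handling ordinary errors but letting
    KeyboardInterrupt / SystemExit propagate to the caller."""
    try:
        return fn()
    except Exception as exc:  # deliberately NOT BaseException
        return f"handled: {exc}"


def boom():
    raise ValueError("boom")
```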

wuhang2014 and others added 8 commits April 9, 2026 19:00
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
@hsliuustc0106 hsliuustc0106 merged commit cb91cbe into vllm-project:main Apr 10, 2026
6 of 8 checks passed
@gcanlin
Collaborator

gcanlin commented Apr 10, 2026

Do we have a usage doc for the new CLI way?

@wuhang2014
Contributor Author

Do we have a usage doc for the new CLI way?

I'm working on this in #1462.

Sy0307 pushed a commit to Sy0307/vllm-omni that referenced this pull request Apr 10, 2026
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

Labels

  • nightly-test label to trigger buildkite nightly test CI
  • ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: "AttributeError: 'FastAPI' object has no attribute 'state'" after instance is shut down

9 participants