
fix --gpu-memory-utilization CLI override#2516

Closed
tarikcurto wants to merge 1 commit into vllm-project:main from tarikcurto:fix/gpu-memory-utilization-arg-override

Conversation


@tarikcurto tarikcurto commented Apr 6, 2026

Purpose

Fix --gpu-memory-utilization (and other standard EngineArgs flags) being silently ignored when serving multi-stage Omni models with --omni.

Related issue: --gpu-memory-utilization CLI flag ignored in --omni mode (Voxtral-4B-TTS-2603 uses ~80% GPU despite --gpu-memory-utilization .2)

Root cause: load_stage_configs_from_yaml merges CLI kwargs with YAML per-stage engine_args via OmegaConf.merge(cli_args, yaml_args). Since OmegaConf.merge is left-to-right (later wins), YAML values always override CLI values.
voxtral_tts.yaml hardcodes gpu_memory_utilization: 0.8 (stage 0) and 0.1 (stage 1), discarding any user-specified value.
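The merge-order bug can be illustrated with a minimal stdlib sketch. Plain dicts stand in for OmegaConf configs here (the real code uses OmegaConf.merge, which has the same later-argument-wins semantics); the values mirror the Voxtral stage-0 example above:

```python
# Minimal sketch of the merge-order bug, using plain dicts to mimic
# OmegaConf.merge's "later argument wins" behavior on key conflicts.

def merge(base: dict, override: dict) -> dict:
    """Later dict wins on conflicts, like OmegaConf.merge(base, override)."""
    return {**base, **override}

# User explicitly passed --gpu-memory-utilization .2 on the CLI.
cli_args = {"gpu_memory_utilization": 0.2, "max_model_len": 1000}

# voxtral_tts.yaml hardcodes a per-stage value for stage 0.
yaml_stage0 = {"gpu_memory_utilization": 0.8, "enforce_eager": True}

# Buggy order: the YAML dict is the later (winning) argument.
merged = merge(cli_args, yaml_stage0)
print(merged["gpu_memory_utilization"])  # 0.8 -- the CLI value is lost
```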

Fix: Thread a user_overrides dict through the config loading chain. It is computed in AsyncOmniEngine._resolve_stage_configs by comparing kwargs against OmniEngineArgs defaults — only values that differ from the dataclass
default are treated as user-specified. After the YAML merge, user_overrides are re-applied as the highest-priority layer, ensuring:

  • Explicit CLI flags (e.g. --gpu-memory-utilization, --max-model-len) are always respected.
  • YAML model-specific settings that the user did not override (e.g. enforce_eager, scheduler_cls, worker_type) are unaffected.
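A simplified sketch of the approach (the dataclass fields, defaults, and the flat stage-config shape are hypothetical stand-ins for the actual OmniEngineArgs and per-stage engine_args):

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class OmniEngineArgs:
    # Hypothetical, simplified defaults standing in for the real dataclass.
    gpu_memory_utilization: float = 0.9
    max_model_len: Optional[int] = None
    enforce_eager: bool = False

def compute_user_overrides(kwargs: dict) -> dict:
    """Treat any kwarg that differs from the dataclass default as user-set."""
    defaults = OmniEngineArgs()
    return {
        f.name: kwargs[f.name]
        for f in fields(OmniEngineArgs)
        if f.name in kwargs and kwargs[f.name] != getattr(defaults, f.name)
    }

def resolve_stage(cli_kwargs: dict, yaml_stage: dict) -> dict:
    user_overrides = compute_user_overrides(cli_kwargs)
    merged = {**cli_kwargs, **yaml_stage}  # YAML still wins in the merge...
    merged.update(user_overrides)          # ...but explicit CLI wins last
    return merged

stage = resolve_stage(
    {"gpu_memory_utilization": 0.2, "max_model_len": 1000},
    {"gpu_memory_utilization": 0.8, "enforce_eager": True},
)
print(stage["gpu_memory_utilization"])  # 0.2 -- CLI flag respected
print(stage["enforce_eager"])          # True -- YAML setting untouched
```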

Changed files:

  • vllm_omni/engine/async_omni_engine.py — compute user_overrides from kwargs vs. OmniEngineArgs defaults
  • vllm_omni/entrypoints/utils.py — propagate user_overrides through load_and_resolve_stage_configs → load_stage_configs_from_model → load_stage_configs_from_yaml and re-apply post-merge

Test Plan

Manually verified with mistralai/Voxtral-4B-TTS-2603 on an NVIDIA RTX 6000 (96 GiB):

vllm serve mistralai/Voxtral-4B-TTS-2603 --omni \
  --gpu-memory-utilization .2 \
  --max-model-len 1000 \
  --port 8001

Checked nvidia-smi and the Available KV cache memory log line for each stage to confirm the flag is respected.

No new test scripts are added as this is a config-loading fix in the legacy OmegaConf path (marked deprecated, slated for removal in PR series [2/N]). Existing unit tests in tests/ continue to pass.

Test Result

Before fix:
(EngineCore pid=3097) INFO [base.py:129] Available KV cache memory: 67.2 GiB (process-scoped)
nvidia-smi: 79831MiB / 97887MiB (~82% utilization)

After fix:
(EngineCore pid=XXXX) INFO [base.py:129] Available KV cache memory: ~11.7 GiB (process-scoped)
nvidia-smi: 21471MiB / 97887MiB (~20% utilization per stage)

Stage 0 and stage 1 each receive gpu_memory_utilization=0.2 as specified, overriding the YAML defaults of 0.8 and 0.1.

Signed-off-by: Tarik Curto <centro.tarik@live.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6db697568f


Comment on lines +954 to +955
if key in kwargs and kwargs[key] != getattr(_default, key, None):
    user_overrides[key] = kwargs[key]

P2: Detect explicit CLI overrides independently of value equality

This override detection only includes keys whose value differs from OmniEngineArgs defaults, so explicitly provided flags that happen to equal the default are treated as "not user-set" and never reapplied after YAML merge. In multi-stage YAMLs that set a different per-stage value, an explicit CLI flag like --gpu-memory-utilization set to the default still gets silently overridden by YAML, which contradicts the intended "explicit CLI wins" behavior.
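The flaw in the value-equality check quoted above can be reproduced with a small hypothetical sketch (the Defaults class and 0.9 default are illustrative, not the actual values):

```python
from dataclasses import dataclass

@dataclass
class Defaults:
    # Hypothetical: suppose the dataclass default happens to be 0.9.
    gpu_memory_utilization: float = 0.9

def detect_overrides(kwargs: dict) -> dict:
    """Value-equality detection: anything equal to the default is dropped."""
    d = Defaults()
    return {k: v for k, v in kwargs.items() if v != getattr(d, k, None)}

# User *explicitly* typed --gpu-memory-utilization 0.9, equal to the default.
print(detect_overrides({"gpu_memory_utilization": 0.9}))  # {} -- invisible,
# so a differing per-stage YAML value would still silently win.

# A non-default value is detected as expected.
print(detect_overrides({"gpu_memory_utilization": 0.2}))
```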


@lishunyang12
Collaborator

Hi, @tarikcurto. Thanks for your interest in vllm-omni. This is a long-standing issue; we are working on a large-scale config refactoring to tackle it. Could you take a look at our RFC #2072 and the preliminary PR #2383? Any feedback is appreciated.

@tarikcurto
Author

Hello @lishunyang12, thank you for the clarification. In that case I will proceed to close this PR.

@tarikcurto tarikcurto closed this Apr 6, 2026
@tarikcurto
Author

I have seen that @lishunyang12 closed (paused) their PR #2383.
In the meantime, I think the main branch should have a fix so that the --gpu-memory-utilization argument works.
For that reason, I have decided to reopen this small fix PR.

@tarikcurto tarikcurto reopened this Apr 8, 2026
@tarikcurto
Author

Fixed in PR #2663 with a more minimal solution.

I will proceed to close this PR.

@tarikcurto tarikcurto closed this Apr 10, 2026