feat: add --prefill-step-size CLI flag #105
Conversation
Looked through this PR. The core plumbing for `serve` works correctly; the flag flows through `SchedulerConfig` to both schedulers. A few things I noticed:

1. The flag is also registered in `bench_parser`, but `bench` doesn't exercise the MLLM scheduler path, so it should probably be `serve`-only.
2. `--prefill-step-size` is a broad name for what it controls; `--mllm-prefill-step-size` would match the internal field name.
3. There's no validation on the value, so an explicit `0` would propagate through as-is.

None of these are blockers; the fix for the original MLLM prefill guard issue works.
Would you mind addressing point 1 (removing the flag from `bench`)?
btw @kol22 great work!
Addressed all three points:

1. Removed the flag from `bench`; it's `serve`-only now.
2. Renamed to `--mllm-prefill-step-size`.
3. Added `__post_init__` validation (`mllm_prefill_step_size` must be `> 0` when provided).

Appreciate the feedback and quick review!
@waybarrios — all three review items were addressed back on Feb 25 (removed from `bench`, renamed to `--mllm-prefill-step-size`, `__post_init__` validation added). It does have merge conflicts from recent main changes — @kol22 would you mind rebasing? After that it should be ready to merge.
Expose prefill_step_size as a CLI argument for both serve and bench commands. Default of 0 means "use engine default" (2048 for LLM, 1024 for MLLM), preserving existing behavior. Vision models routinely exceed 1024 tokens per prompt (images alone contribute 1400+), hitting the MLLM batch generator's safe limit. This flag lets users raise the limit without patching source code.
kol22 force-pushed from ee69b94 to 186371d
@janhilgard - resolved & rebased. Should be good to go.
janhilgard left a comment
LGTM — all three review items from @waybarrios have been addressed:
- Removed from `bench_parser` — the flag is now `serve`-only, matching the actual MLLM scheduler path.
- Renamed to `--mllm-prefill-step-size` — clear scope, matches the internal field name.
- `__post_init__` validation — `mllm_prefill_step_size` must be `> 0` when provided, preventing the `0` propagation edge case (a minimal sketch follows below).
Rebase on current main is clean. Code is minimal and consistent with existing patterns. Ready to merge.
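For reference, a minimal sketch of that validation, assuming a dataclass-style config; the class shape and the `Optional` handling are assumptions, only the `MLLMSchedulerConfig` and `mllm_prefill_step_size` names come from this thread:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MLLMSchedulerConfig:
    # None means "not provided": the engine default (1024 for MLLM) applies.
    mllm_prefill_step_size: Optional[int] = None

    def __post_init__(self):
        # Must be > 0 when provided, so an explicit 0 never propagates into
        # the prefill guard as an effective budget of zero tokens.
        if self.mllm_prefill_step_size is not None and self.mllm_prefill_step_size <= 0:
            raise ValueError(
                f"mllm_prefill_step_size must be > 0, got {self.mllm_prefill_step_size}"
            )
```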
Summary
`MLLMSchedulerConfig.prefill_step_size` defaults to 1024 but isn't exposed as a CLI argument. The MLLM batch generator enforces `total_prompt_tokens <= prefill_step_size * batch_count`, so any single vision request exceeding 1024 tokens fails with a `ValueError`. Since vision tokens alone typically exceed 1024 (images contribute ~1400+ tokens), this effectively blocks all MLLM inference under the default config.
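To make the failure concrete, here is the guard paraphrased in a few lines, using the defaults and the token counts quoted above (illustrative only, not the repo's actual code):

```python
prefill_step_size = 1024    # MLLM engine default
batch_count = 1             # a single vision request
total_prompt_tokens = 1400  # image tokens alone, per the summary above

# The batch generator's safety check, paraphrased:
if total_prompt_tokens > prefill_step_size * batch_count:
    raise ValueError(
        f"prompt of {total_prompt_tokens} tokens exceeds the prefill "
        f"budget of {prefill_step_size * batch_count}"
    )  # 1400 > 1024, so the default config rejects the request
```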
Fix
Add `--prefill-step-size` to both `serve` and `bench` commands. Default of `0` means "use engine default" — 2048 for LLM, 1024 for MLLM — preserving existing behavior. When set, the value flows through `SchedulerConfig` to both the LLM scheduler and the MLLM scheduler.
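A rough sketch of that wiring, assuming an argparse-based CLI; the parser setup and the 0-to-`None` translation are assumptions, only the flag name and the `0` sentinel come from this PR:

```python
import argparse

parser = argparse.ArgumentParser(prog="serve")
parser.add_argument(
    "--prefill-step-size",
    type=int,
    default=0,  # 0 is the sentinel for "use engine default" (2048 LLM / 1024 MLLM)
    help="Prompt tokens processed per prefill step; 0 keeps the engine default.",
)
args = parser.parse_args()

# Translate the sentinel to "unset" before building SchedulerConfig, so the
# engine defaults stay in effect when the flag is omitted.
prefill_step_size = args.prefill_step_size if args.prefill_step_size > 0 else None
```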
Example usage:
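Something like the following, where the entrypoint and the `--model` spelling are placeholders; `serve`, the model name, and the flag value are taken from the test notes below:

```bash
$ my-inference-cli serve --model Qwen3-VL-32B-Instruct-8bit --prefill-step-size 16384
```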
Test
Tested with Qwen3-VL-32B-Instruct-8bit. Before this fix, vision requests fail with the `ValueError` above. After passing `--prefill-step-size 16384`, they complete successfully.