[Config] Clean up SchedulerConfig initialization #28665
DarkLight1337 merged 9 commits into vllm-project:main
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request provides a solid cleanup of the SchedulerConfig initialization. Moving the default value logic from SchedulerConfig.__post_init__ to EngineArgs and refactoring it into new helper methods significantly improves modularity and readability. The use of class variables for default values in SchedulerConfig is also a good practice. I've identified one minor logging issue that should be addressed.
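A minimal sketch of the class-variable pattern the review refers to (illustrative only; the field set and the default values shown here are assumptions, not the actual `SchedulerConfig` definition):

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class SchedulerConfig:
    """Sketch of keeping default values as class variables on the config."""

    # Class-level defaults: visible in one place and readable by EngineArgs
    # without instantiating the config first. Values are illustrative.
    DEFAULT_MAX_NUM_BATCHED_TOKENS: ClassVar[int] = 2048
    DEFAULT_MAX_NUM_SEQS: ClassVar[int] = 128
    DEFAULT_MAX_MODEL_LEN: ClassVar[int] = 8192

    max_num_batched_tokens: int | None = None
    max_num_seqs: int | None = None
    max_model_len: int | None = None

    def __post_init__(self) -> None:
        # Fall back to the class-level defaults only if nothing
        # (CLI, UsageContext, platform override) set a value.
        if self.max_model_len is None:
            self.max_model_len = self.DEFAULT_MAX_MODEL_LEN
        if self.max_num_seqs is None:
            self.max_num_seqs = self.DEFAULT_MAX_NUM_SEQS
        if self.max_num_batched_tokens is None:
            self.max_num_batched_tokens = self.DEFAULT_MAX_NUM_BATCHED_TOKENS
```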
/gemini review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Code Review
This pull request is a good refactoring that cleans up SchedulerConfig initialization by centralizing default values and improving the modularity of EngineArgs. The use of a property for chunked_prefill_enabled is a clean way to handle aliasing and maintain backward compatibility.
However, I've identified a critical issue: the refactoring appears to have removed the specific default max_num_batched_tokens logic for pooling and multimodal models. This could lead to a performance regression for these model types. I've added a detailed comment regarding this. Once this is addressed, the PR will be a strong improvement.
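A rough sketch of the property-based alias described above (simplified for illustration; not the exact code merged in this PR):

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    """Sketch: chunked_prefill_enabled aliases enable_chunked_prefill."""

    enable_chunked_prefill: bool = True

    @property
    def chunked_prefill_enabled(self) -> bool:
        # Reads always reflect the canonical field.
        return self.enable_chunked_prefill

    @chunked_prefill_enabled.setter
    def chunked_prefill_enabled(self, value: bool) -> None:
        # Writes through the alias update the canonical field, so platform
        # overrides no longer need to set both attributes.
        self.enable_chunked_prefill = value


config = SchedulerConfig()
config.chunked_prefill_enabled = False  # one assignment keeps both views in sync
assert config.enable_chunked_prefill is False
```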
```diff
 def __post_init__(self, is_encoder_decoder: bool) -> None:
     if self.max_model_len is None:
         self.max_model_len = 8192

     if self.max_num_seqs is None:
         self.max_num_seqs = 128

     if is_encoder_decoder:
         # Chunked prefill should be disabled for encoder-decoder models.
         self.disable_chunked_mm_input = True
         self.chunked_prefill_enabled = False
         self.enable_chunked_prefill = False
         self.long_prefill_token_threshold = 0
         logger.info(
             "Encoder-decoder models do not support chunked prefill nor"
             " prefix caching; disabling both."
         )

-    if self.max_num_batched_tokens is None:
-        if self.enable_chunked_prefill:
-            self.max_num_batched_tokens = DEFAULT_MAX_NUM_BATCHED_TOKENS
-        else:
-            # If max_model_len is too short, use
-            # DEFAULT_MAX_NUM_BATCHED_TOKENS as the default value
-            # for higher throughput.
-            self.max_num_batched_tokens = max(
-                self.max_model_len, DEFAULT_MAX_NUM_BATCHED_TOKENS
-            )
-
-        if self.runner_type == "pooling":
-            # Choose specific value for higher throughput
-            self.max_num_batched_tokens = max(
-                self.max_num_batched_tokens,
-                POOLING_MODEL_MAX_NUM_BATCHED_TOKENS,
-            )
-        if self.is_multimodal_model:
-            # The value needs to be at least the number of multimodal tokens
-            self.max_num_batched_tokens = max(
-                self.max_num_batched_tokens,
-                MULTIMODAL_MODEL_MAX_NUM_BATCHED_TOKENS,
-            )
-
-        # When using default settings,
-        # Ensure max_num_batched_tokens does not exceed model limit.
-        # Some models (e.g., Whisper) have embeddings tied to max length.
-        self.max_num_batched_tokens = min(
-            self.max_num_seqs * self.max_model_len, self.max_num_batched_tokens
-        )
+    if not self.enable_chunked_prefill:
+        # If max_model_len is too short, use the default for higher throughput.
+        self.max_num_batched_tokens = max(
+            self.max_model_len,
+            self.max_num_batched_tokens,
+        )
+
+    # Ensure max_num_batched_tokens does not exceed model limit.
+    # Some models (e.g., Whisper) have embeddings tied to max length.
+    self.max_num_batched_tokens = min(
+        self.max_num_seqs * self.max_model_len,
+        self.max_num_batched_tokens,
+    )

     self.max_num_encoder_input_tokens = self.max_num_batched_tokens
     self.encoder_cache_size = self.max_num_batched_tokens
```
This refactoring simplifies the initialization, but it seems to have removed the special default logic for max_num_batched_tokens for pooling and multimodal models.
Previously, if max_num_batched_tokens was not set by the user or a UsageContext-specific default, there was fallback logic to increase it for pooling models (to 32768) and multimodal models (to 5120) for better throughput. This logic was triggered if max_num_batched_tokens was None when __post_init__ was called.
This logic has now been removed. The justification in the PR description suggests this was dead code, but it appears it would have been triggered if no UsageContext default was found. The new implementation in EngineArgs ensures max_num_batched_tokens is always set, but the specific, higher defaults for pooling/multimodal models are no longer applied anywhere.
Removing this could lead to a significant performance regression for these model types. Could you please confirm if this change is intended? If it's a mistake, this logic should be restored, perhaps within EngineArgs.get_batch_defaults.
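If the reviewer's suggestion were taken, the fallback could look roughly like this (a hypothetical sketch: `get_batch_defaults` and the 32768/5120 constants come from this thread, while the function signature and surrounding structure are assumptions, not the actual EngineArgs code):

```python
# Hypothetical sketch of restoring the pooling/multimodal bumps inside
# EngineArgs.get_batch_defaults; names and structure are illustrative only.
POOLING_MODEL_MAX_NUM_BATCHED_TOKENS = 32768
MULTIMODAL_MODEL_MAX_NUM_BATCHED_TOKENS = 5120


def get_batch_defaults(
    max_num_batched_tokens: int,
    runner_type: str,
    is_multimodal_model: bool,
) -> int:
    """Return max_num_batched_tokens, raised for pooling/multimodal models."""
    if runner_type == "pooling":
        # Pooling models can handle much larger batches, so raise the floor.
        max_num_batched_tokens = max(
            max_num_batched_tokens, POOLING_MODEL_MAX_NUM_BATCHED_TOKENS
        )
    if is_multimodal_model:
        # Must cover at least the number of multimodal tokens in one request.
        max_num_batched_tokens = max(
            max_num_batched_tokens, MULTIMODAL_MODEL_MAX_NUM_BATCHED_TOKENS
        )
    return max_num_batched_tokens
```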
The case of no UsageContext is not normal usage of vLLM
@njhill @WoosukKwon @robertgshaw2-redhat correct me if I'm wrong about this
@DarkLight1337 I think this PR changes the default values for

IIUC, the two if statements here are not executed because of Lines 1988 to 1998 in 3380ed5

Yeah,

@hmellor @DarkLight1337 On B200, the correct default values (1024 seqs, 8K tokens) are not used.

Can you run

@DarkLight1337 It passes the test, but it's probably because the usage context is considered in the test (while it doesn't in

Hmm, shouldn't

@DarkLight1337 Here,

OK, I figured out the issue: the CLI defaults are still using the ones from
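For reference, the UsageContext-based defaulting discussed in this thread works roughly like this (a hypothetical sketch; the enum members, function name, and numeric values below are placeholders, not vLLM's real per-device table):

```python
from enum import Enum


class UsageContext(Enum):
    LLM_CLASS = "LLM_CLASS"
    OPENAI_API_SERVER = "OPENAI_API_SERVER"


# usage context -> (max_num_seqs, max_num_batched_tokens); placeholder values
_BATCH_DEFAULTS: dict[UsageContext, tuple[int, int]] = {
    UsageContext.LLM_CLASS: (256, 8192),
    UsageContext.OPENAI_API_SERVER: (1024, 2048),
}


def apply_batch_defaults(
    max_num_seqs: int | None,
    max_num_batched_tokens: int | None,
    usage_context: UsageContext,
) -> tuple[int, int]:
    """Fill in whichever of the two values the user did not set explicitly."""
    default_seqs, default_tokens = _BATCH_DEFAULTS[usage_context]
    if max_num_seqs is None:
        max_num_seqs = default_seqs
    if max_num_batched_tokens is None:
        max_num_batched_tokens = default_tokens
    return max_num_seqs, max_num_batched_tokens
```

The point of the last comment above is that if the CLI parser already fills these arguments with the `SchedulerConfig` defaults, they are never `None` by the time the usage-context defaults would apply.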
1. fix vllm-project/vllm#28542: the model structure modifications involved are Qwen2.5-VL (some patches still remain), Qwen2-VL, Qwen2, the DeepSeek series, and the Qwen-MoE series
2. fix vllm-project/vllm#29121: the output token type changed from numpy arrays to `list[list[int]]`
3. fix vllm-project/vllm#29262: the `xformers` backend for multimodal has been deprecated
4. fix vllm-project/vllm#29342
5. fix vllm-project/vllm#28579
6. fix vllm-project/vllm#28718
7. fix vllm-project/vllm#28665
8. fix vllm-project/vllm#26847: vLLM introduced `optimization-level`, some default config values have changed, and the `--enforce-eager` param has been deprecated
9. fix https://github.com/vllm-project/vllm/pull/29223: it returns a tuple for the sampler
10. fix vllm-project/vllm#29471: we'll remove the related patch to avoid this kind of error

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Purpose
- Move the default values related to `SchedulerConfig` from utils to `SchedulerConfig` itself. `max_num_batched_tokens` and `max_num_seqs` are actually set by `UsageContext` when constructing `EngineArgs`, so there is no point in making the default values of `SchedulerConfig` dynamic (the defaults for pooling and multimodal models are actually being overwritten). This simplifies the initialization code a lot.
- Refactor `EngineArgs._set_default_args` to be more modular.
- Make `SchedulerConfig.chunked_prefill_enabled` a property-based alias of `SchedulerConfig.enable_chunked_prefill` to avoid having to set both when overriding the config in each platform.

Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.