[Bug Fix] Fix naive_block_assignment always defaulting to False due to arg misalignment #33848

Merged
DarkLight1337 merged 2 commits into vllm-project:main from RunkaiTao:unpermute/fix
Feb 12, 2026

@RunkaiTao RunkaiTao commented Feb 5, 2026

Purpose

Fix a bug where naive_block_assignment always defaulted to False because of argument misalignment in the function call.

Test Result

gpt-oss 120b max_loras=8, concurrency=1

before

============ Serving Benchmark Result ============
Successful requests:                     40        
Failed requests:                         0         
Maximum request concurrency:             1         
Benchmark duration (s):                  180.34    
Total input tokens:                      63226     
Total generated tokens:                  24000     
Request throughput (req/s):              0.22      
Output token throughput (tok/s):         133.08    
Peak output token throughput (tok/s):    138.00    
Peak concurrent requests:                2.00      
Total token throughput (tok/s):          483.66    
---------------Time to First Token----------------
Mean TTFT (ms):                          155.22    
Median TTFT (ms):                        139.08    
P99 TTFT (ms):                           521.81    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          7.27      
Median TPOT (ms):                        7.27      
P99 TPOT (ms):                           7.28      
---------------Inter-token Latency----------------
Mean ITL (ms):                           7.27      
Median ITL (ms):                         7.27      
P99 ITL (ms):                            7.65      
==================================================

after

============ Serving Benchmark Result ============
Successful requests:                     40        
Failed requests:                         0         
Maximum request concurrency:             1         
Benchmark duration (s):                  139.22    
Total input tokens:                      63226     
Total generated tokens:                  24000     
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         172.39    
Peak output token throughput (tok/s):    180.00    
Peak concurrent requests:                2.00      
Total token throughput (tok/s):          626.53    
---------------Time to First Token----------------
Mean TTFT (ms):                          132.71    
Median TTFT (ms):                        132.05    
P99 TTFT (ms):                           161.04    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          5.59      
Median TPOT (ms):                        5.59      
P99 TPOT (ms):                           5.59      
---------------Inter-token Latency----------------
Mean ITL (ms):                           5.59      
Median ITL (ms):                         5.60      
P99 ITL (ms):                            6.06      
==================================================
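The two runs above can be compared directly. The following is a minimal sketch that copies four figures from the before and after tables and computes the relative change for each; the dictionary keys and the `pct_change` helper are names chosen here for illustration, not part of the benchmark tool.

```python
# Figures copied from the "before" and "after" benchmark tables above.
# The key names are illustrative labels, not benchmark tool output fields.
before = {
    "benchmark_duration_s": 180.34,
    "output_tok_per_s": 133.08,
    "mean_tpot_ms": 7.27,
    "p99_ttft_ms": 521.81,
}
after = {
    "benchmark_duration_s": 139.22,
    "output_tok_per_s": 172.39,
    "mean_tpot_ms": 5.59,
    "p99_ttft_ms": 161.04,
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a drop)."""
    return (new - old) / old * 100.0


# Percent change per metric, rounded to one decimal place.
changes = {name: round(pct_change(before[name], after[name]), 1) for name in before}

for name, delta in changes.items():
    print(f"{name}: {delta:+.1f}%")
```

On these numbers, output token throughput rises by roughly 29.5%, while mean TPOT and benchmark duration each fall by roughly 23%. This is a single run per configuration at concurrency 1, so the percentages describe this measurement only.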


Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
@RunkaiTao RunkaiTao requested a review from jeejeelee as a code owner February 5, 2026 00:39
@mergify mergify bot added the bug (Something isn't working) label Feb 5, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug where naive_block_assignment was always defaulting to False due to a positional argument being misaligned. The fix involves changing the function call to use a keyword argument, which is the right approach. I've added a suggestion to make the function signature more robust to prevent similar issues in the future.
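To make the failure mode concrete, here is a minimal sketch of how a positional argument can land in the wrong slot. The function name `align_block_size_sketch`, the `topk_ids` parameter, and the body are assumptions for illustration; only the parameters from `adapter_enabled` onward mirror the signature quoted in this review, and the exact shape of the original call is not shown on this page.

```python
# Hypothetical stand-in for the real function. The name, the topk_ids
# parameter, and the body are assumptions; the remaining parameters and
# their defaults follow the signature quoted in the review.
def align_block_size_sketch(
    topk_ids,
    adapter_enabled,
    expert_map=None,
    pad_sorted_ids=False,
    naive_block_assignment=False,
):
    # Echo the flags back so it is visible which parameter each argument bound to.
    return {
        "expert_map": expert_map,
        "pad_sorted_ids": pad_sorted_ids,
        "naive_block_assignment": naive_block_assignment,
    }


use_naive = True

# Misaligned call: the caller intends naive_block_assignment, but the fourth
# positional slot is pad_sorted_ids, so naive_block_assignment keeps its default.
buggy = align_block_size_sketch([0, 1], [1], None, use_naive)

# Keyword call: the value binds to the intended parameter regardless of position.
fixed = align_block_size_sketch([0, 1], [1], None, naive_block_assignment=use_naive)

print(buggy["naive_block_assignment"], fixed["naive_block_assignment"])
```

In the misaligned call nothing fails at runtime: the value is silently accepted by a different boolean parameter, which is why this kind of bug tends to show up only as a performance difference.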

adapter_enabled: torch.Tensor,
expert_map: torch.Tensor | None = None,
pad_sorted_ids: bool = False,
naive_block_assignment: bool = False,

high

This bug was caused by passing a positional argument that was misinterpreted. To prevent this class of bugs, consider making optional boolean flags keyword-only by adding a * in the function signature before them (e.g., before pad_sorted_ids). This would enforce keyword arguments for all subsequent parameters and would have raised a TypeError for the original buggy call.
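The suggestion can be sketched as follows. The function name `align_block_size_kwonly` and the `topk_ids` parameter are hypothetical; the point is only the bare `*` placed before `pad_sorted_ids`, which is standard Python syntax for keyword-only parameters.

```python
# Hypothetical signature. The bare * makes every parameter after it
# keyword-only, so a stray positional boolean is rejected at call time.
def align_block_size_kwonly(
    topk_ids,
    adapter_enabled,
    expert_map=None,
    *,
    pad_sorted_ids=False,
    naive_block_assignment=False,
):
    return {
        "pad_sorted_ids": pad_sorted_ids,
        "naive_block_assignment": naive_block_assignment,
    }


raised = False
try:
    # Four positional arguments, but only three positional slots exist.
    align_block_size_kwonly([0, 1], [1], None, True)
except TypeError:
    raised = True

# The keyword form is still accepted.
ok = align_block_size_kwonly([0, 1], [1], None, naive_block_assignment=True)

print(raised, ok["naive_block_assignment"])
```

The trade-off is that any existing caller that passes these flags positionally would need to be updated, so a change like this is usually made together with a sweep of the call sites.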


@varun-sundar-rabindranath varun-sundar-rabindranath left a comment

Thanks for the fix @RunkaiTao

cc @jeejeelee

@varun-sundar-rabindranath

cc @mgoin @robertgshaw2-redhat, can you please take a look? Thanks 🙌

@ProExpertProg ProExpertProg added the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 9, 2026
@DarkLight1337 DarkLight1337 merged commit e1d97c3 into vllm-project:main Feb 12, 2026
47 of 48 checks passed
warichet pushed a commit to warichet/vllm that referenced this pull request Feb 12, 2026
… to arg misalignment (vllm-project#33848)

Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
… to arg misalignment (vllm-project#33848)

Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
… to arg misalignment (vllm-project#33848)

Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
… to arg misalignment (vllm-project#33848)

Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
Labels

bug: Something isn't working
ready: ONLY add when PR is ready to merge/full CI is needed

5 participants