[Bug Fix] Fix `naive_block_assignment` always defaulting to False due to arg misalignment by RunkaiTao · Pull Request #33848 · vllm-project/vllm

RunkaiTao · 2026-02-05T00:39:01Z

Purpose

Fix a bug where `naive_block_assignment` always defaulted to False because a positional argument was misaligned at the call site.

Test Result

gpt-oss 120b max_loras=8, concurrency=1

before

============ Serving Benchmark Result ============
Successful requests:                     40        
Failed requests:                         0         
Maximum request concurrency:             1         
Benchmark duration (s):                  180.34    
Total input tokens:                      63226     
Total generated tokens:                  24000     
Request throughput (req/s):              0.22      
Output token throughput (tok/s):         133.08    
Peak output token throughput (tok/s):    138.00    
Peak concurrent requests:                2.00      
Total token throughput (tok/s):          483.66    
---------------Time to First Token----------------
Mean TTFT (ms):                          155.22    
Median TTFT (ms):                        139.08    
P99 TTFT (ms):                           521.81    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          7.27      
Median TPOT (ms):                        7.27      
P99 TPOT (ms):                           7.28      
---------------Inter-token Latency----------------
Mean ITL (ms):                           7.27      
Median ITL (ms):                         7.27      
P99 ITL (ms):                            7.65      
==================================================

after

============ Serving Benchmark Result ============
Successful requests:                     40        
Failed requests:                         0         
Maximum request concurrency:             1         
Benchmark duration (s):                  139.22    
Total input tokens:                      63226     
Total generated tokens:                  24000     
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         172.39    
Peak output token throughput (tok/s):    180.00    
Peak concurrent requests:                2.00      
Total token throughput (tok/s):          626.53    
---------------Time to First Token----------------
Mean TTFT (ms):                          132.71    
Median TTFT (ms):                        132.05    
P99 TTFT (ms):                           161.04    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          5.59      
Median TPOT (ms):                        5.59      
P99 TPOT (ms):                           5.59      
---------------Inter-token Latency----------------
Mean ITL (ms):                           5.59      
Median ITL (ms):                         5.60      
P99 ITL (ms):                            6.06      
==================================================
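To make the before/after comparison easier to read, the relative change for a few headline metrics can be computed directly from the two tables above. This is a small sketch: the numbers are copied from the tables, and the helper name `percent_change` is only illustrative. The figures come from a single run at concurrency=1, so treat the percentages as indicative rather than definitive.

```python
# Values copied from the "before" and "after" benchmark tables above.
before = {
    "Benchmark duration (s)": 180.34,
    "Output token throughput (tok/s)": 133.08,
    "Mean TTFT (ms)": 155.22,
    "P99 TTFT (ms)": 521.81,
    "Mean TPOT (ms)": 7.27,
}
after = {
    "Benchmark duration (s)": 139.22,
    "Output token throughput (tok/s)": 172.39,
    "Mean TTFT (ms)": 132.71,
    "P99 TTFT (ms)": 161.04,
    "Mean TPOT (ms)": 5.59,
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100.0


changes = {name: percent_change(before[name], after[name]) for name in before}

for name, pct in changes.items():
    print(f"{name:<34} {pct:+.1f}%")
```

On these numbers, output token throughput rises by a little under 30% and mean TPOT drops by roughly 23%.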

Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

gemini-code-assist

Code Review

This pull request correctly fixes a bug where naive_block_assignment was always defaulting to False due to a positional argument being misaligned. The fix involves changing the function call to use a keyword argument, which is the right approach. I've added a suggestion to make the function signature more robust to prevent similar issues in the future.

gemini-code-assist · 2026-02-05T00:41:08Z

vllm/lora/punica_wrapper/punica_base.py

        adapter_enabled: torch.Tensor,
        expert_map: torch.Tensor | None = None,
        pad_sorted_ids: bool = False,
+        naive_block_assignment: bool = False,


This bug was caused by passing a positional argument that was misinterpreted. To prevent this class of bugs, consider making optional boolean flags keyword-only by adding a * in the function signature before them (e.g., before pad_sorted_ids). This would enforce keyword arguments for all subsequent parameters and would have raised a TypeError for the original buggy call.
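The suggestion above can be illustrated with a minimal sketch. The functions `align_loose` and `align_strict` are hypothetical stand-ins whose parameter names mirror the diff. They are not vLLM's actual implementation, and the exact shape of the original buggy call is an assumption, since this page does not show it. Plain lists replace tensors so the example runs without torch.

```python
# Hypothetical stand-ins: parameter names mirror the diff, bodies are illustrative only.

def align_loose(adapter_enabled, expert_map=None, pad_sorted_ids=False,
                naive_block_assignment=False):
    """Optional flags can be passed positionally, so a misplaced value goes unnoticed."""
    return {
        "pad_sorted_ids": pad_sorted_ids,
        "naive_block_assignment": naive_block_assignment,
    }


def align_strict(adapter_enabled, expert_map=None, *, pad_sorted_ids=False,
                 naive_block_assignment=False):
    """The bare * makes every parameter after it keyword-only."""
    return {
        "pad_sorted_ids": pad_sorted_ids,
        "naive_block_assignment": naive_block_assignment,
    }


enabled = [1, 0, 1]
use_naive = True

# Assumed buggy call shape: the caller means naive_block_assignment=True,
# but the third positional slot is pad_sorted_ids, so the flag silently stays False.
loose = align_loose(enabled, None, use_naive)

# With keyword-only flags, the same call fails loudly instead of misbinding.
strict_error = None
try:
    align_strict(enabled, None, use_naive)
except TypeError as exc:
    strict_error = str(exc)

# Fixed call: pass the flag by name.
fixed = align_strict(enabled, None, naive_block_assignment=use_naive)

print(loose)
print(strict_error)
print(fixed)
```

Passing the flag by keyword fixes the call site. The keyword-only signature is a separate, optional hardening step that turns any future misalignment into an immediate `TypeError`.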

varun-sundar-rabindranath

Thanks for the fix @RunkaiTao

cc @jeejeelee

varun-sundar-rabindranath · 2026-02-09T13:48:19Z

cc @mgoin @robertgshaw2-redhat can you please take a look. Thanks 🙌

… to arg misalignment (vllm-project#33848) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

… to arg misalignment (vllm-project#33848) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu> Signed-off-by: Eldar Kurtic <research@neuralmagic.com>

unpermute bug fixing

ccfb4b7

Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

RunkaiTao requested a review from jeejeelee as a code owner February 5, 2026 00:39

mergify bot added the `bug` label (Something isn't working) Feb 5, 2026

gemini-code-assist bot reviewed Feb 5, 2026

varun-sundar-rabindranath approved these changes Feb 5, 2026

jeejeelee approved these changes Feb 9, 2026

ProExpertProg added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Feb 9, 2026

Merge remote-tracking branch 'upstream/main' into unpermute/fix

99d77c7

RunkaiTao mentioned this pull request Feb 11, 2026

[Fix Bug]num_active_loras always equals to zero #34119

Merged

5 tasks

DarkLight1337 merged commit e1d97c3 into vllm-project:main Feb 12, 2026
47 of 48 checks passed

warichet pushed a commit to warichet/vllm that referenced this pull request Feb 12, 2026

[Bug Fix] Fix naive_block_assignment always defaulting to False due…

486243e

… to arg misalignment (vllm-project#33848) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026

[Bug Fix] Fix naive_block_assignment always defaulting to False due…

d458775

… to arg misalignment (vllm-project#33848) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026

[Bug Fix] Fix naive_block_assignment always defaulting to False due…

4382330

… to arg misalignment (vllm-project#33848) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

DarkLight1337 merged 2 commits into vllm-project:main from RunkaiTao:unpermute/fix
