Skip to content

[Bugfix] Align block table for TRTLLM MLA edge-case#39324

Merged
benchislett merged 4 commits into
vllm-project:mainfrom
CentML:bugfix-block-table-size
May 6, 2026
Merged

[Bugfix] Align block table for TRTLLM MLA edge-case#39324
benchislett merged 4 commits into
vllm-project:mainfrom
CentML:bugfix-block-table-size

Conversation

@benchislett

Copy link
Copy Markdown
Member

Purpose

When running with unusual max-model-len, such as 9416, TRTLLM MLA crashes:

(Worker_TP0 pid=646061) ERROR 04-07 23:47:19 [multiproc_executor.py:932] ValueError: Expected block_num % (128 / block_size) == 0, got block_num=295 and block_size=32`

It seems like a straightforward fix to pad the block table's max_num_blocks by a block or two to avoid this case.

Testing

Ran Kimi K2.5 with the max-model-len and it works.

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett requested review from LucasWilkinson, MatthewBonanni and mgoin and removed request for WoosukKwon and njhill April 8, 2026 18:05
@mergify mergify Bot added v1 bug Something isn't working labels Apr 8, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces alignment logic for max_num_blocks in vllm/v1/worker/block_table.py and vllm/v1/worker/gpu/block_table.py to meet the requirements of specific attention backends. Feedback was provided regarding the use of magic numbers and code duplication across the two files. Additionally, the reviewer suggested simplifying redundant conditional checks and noted potential issues with the alignment formula if non-power-of-2 block sizes are used.

Comment thread vllm/v1/worker/block_table.py
Comment thread vllm/v1/worker/gpu/block_table.py Outdated
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett requested a review from WoosukKwon April 8, 2026 18:27
@mgoin

mgoin commented Apr 8, 2026

Copy link
Copy Markdown
Member

Do you think this is something we should add as an attribute to the attention backend? Not sure if this is clearly good to enforce across the board
Cc @LucasWilkinson @MatthewBonanni

@benchislett

Copy link
Copy Markdown
Member Author

Yeah I'm not sure how to handle this. Seems like a pretty uncommon edge case in the first place so it might not be worth making a big change to support it. But I'm not sure what the end-result impact is to padding the block table by a couple blocks.

@MatthewBonanni

MatthewBonanni commented Apr 13, 2026

Copy link
Copy Markdown
Member

CUTLASS_MLA has a similar requirement:

assert block_num % (128 / PAGE_SIZE) == 0

Unless there's significant demand for supporting arbitrary max_num_blocks (when the backend allows) then I'm supportive of always padding for the sake of simplicity

Comment thread vllm/v1/worker/gpu/block_table.py

@NickLucche NickLucche left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benchislett @LucasWilkinson I guess we're happy with padding the block_table here?

@MatthewBonanni MatthewBonanni left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, happy to land once #39324 (comment) is addressed

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett added the ready ONLY add when PR is ready to merge/full CI is needed label May 4, 2026
@benchislett benchislett enabled auto-merge (squash) May 4, 2026 16:22
@benchislett benchislett merged commit 38e1667 into vllm-project:main May 6, 2026
59 of 62 checks passed
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants