Skip to content

Conversation

@NickLucche
Copy link
Collaborator

Small quality of life change to keep the way we interact with KV cache layout consistent as I explain a bit here #22735.
cc @zhenwei-intel to keep me true on the XPU-related change. PS that limit will prevent XPU from being compatible with heteroTP as of now.

cc @LucasWilkinson as this is another kv layout constraint which is good to keep in mind.

Signed-off-by: NickLucche <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the handling of KV cache layout settings to be more consistent, especially for XPU which requires the 'NHD' layout. It introduces a centralized set_kv_cache_layout function, which is an improvement over direct environment variable manipulation. My review includes a suggestion to make the validation of user-provided layouts more robust. A significant part of this PR is the addition of the vllm/entrypoints/cli/coordinate.py file, which appears to be a refactoring of the server startup logic, though it's not mentioned in the PR description.

Signed-off-by: NickLucche <[email protected]>
Comment on lines 33 to 34
KVCacheLayoutType = Literal["NHD", "HND"]
_KV_CACHE_LAYOUT_OVERRIDE: KVCacheLayoutType | None = None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LucasWilkinson any other meaningful layout we want to allow here?


# Supported xPUs and types of kv transfer buffer.
# {xPU: tuple of supported kv buffer types}
_NIXL_SUPPORTED_XPUS = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think xPU was a great name for a generic PU, but since Intel already claimed it this is just confusing now >.<

@zhenwei-intel
Copy link
Contributor

LGTM

Signed-off-by: NickLucche <[email protected]>
Copy link
Collaborator

@jikunshang jikunshang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for fixing. LGTM!

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 15, 2025
@jikunshang jikunshang merged commit 2e41f5a into vllm-project:main Sep 15, 2025
58 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants