Skip to content

Remove virtual engine handling#30350

Closed
WoosukKwon wants to merge 1 commit intomainfrom
codex/remove-virtual-engine-from-codebase
Closed

Remove virtual engine handling#30350
WoosukKwon wants to merge 1 commit intomainfrom
codex/remove-virtual-engine-from-codebase

Conversation

@WoosukKwon
Copy link
Copy Markdown
Collaborator

Summary

  • remove the virtual engine field from the forward context helpers
  • simplify KV cache access across attention layers and KV connectors to use a single cache instance
  • update KV connector tests to align with the new forward context interface

Testing

  • python -m pytest tests/v1/kv_connector/unit/test_lmcache_integration.py::test_forward_context_interface (fails: ModuleNotFoundError: tblib)
  • python -m compileall tests/v1/kv_connector/unit/test_nixl_connector.py
  • python -m compileall tests/v1/kv_connector/unit/test_offloading_connector.py tests/v1/kv_connector/unit/test_decode_bench_connector.py tests/v1/kv_connector/unit/test_lmcache_integration.py

Codex Task

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify mergify bot added qwen Related to Qwen models v1 tpu Related to Google TPUs labels Dec 9, 2025
@mergify
Copy link
Copy Markdown

mergify bot commented Dec 9, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @WoosukKwon.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the virtual_engine field and its handling throughout the codebase. The changes simplify KV cache access by assuming a single cache instance instead of a list indexed by virtual_engine. The modifications are consistently applied across various components, including attention layers, KV connectors, model implementations, and tests. The refactoring is clean and aligns with the goal of simplifying the KV cache management. I have reviewed the changes and found no issues.

@XingLiu1
Copy link
Copy Markdown
Contributor

Hi maintainers, quick question about scope:

Does this PR also cover the TODO in
https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu/kv_connector.py#L72
("sort out KV Connectors' use of forward_context", introduced in #32742)?

Right now pre_forward() falls back to set_forward_context(None, ...) before
start_load_kv(get_forward_context()) when no forward context is active.
Should that behavior be considered resolved by this PR, or should it be tracked as a separate follow-up issue?

@WoosukKwon WoosukKwon closed this Feb 26, 2026
@WoosukKwon WoosukKwon deleted the codex/remove-virtual-engine-from-codebase branch February 26, 2026 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kv-connector needs-rebase qwen Related to Qwen models tpu Related to Google TPUs v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants