feat: disable kv events in vLLM when lora is enabled#4128
Conversation
WalkthroughTwo files in the vLLM integration add guard conditions to KV events publishing: one prevents publishing when LORA is enabled during prefix caching, and the other adds an early exit when no KV events config is provided. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes
Poem
Pre-merge checks✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/src/dynamo/vllm/args.py(1 hunks)components/src/dynamo/vllm/main.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
components/src/dynamo/vllm/main.py (1)
153-154: Early exit is correct but insufficient on its own.The None check correctly prevents accessing
.endpointon a None value at line 170. However, this check alone doesn't protect against the case where a user provides--kv-events-configwhile LoRA is enabled, since that config would not be None.As noted in the
args.pyreview, consider adding an explicit LoRA check here as well:if config.is_decode_worker: logger.info("Skipping KV event publisher setup for decode worker") return None + # Skip KV events when LoRA is enabled due to upstream bug + if config.engine_args.enable_lora: + logger.info("Skipping KV event publisher setup due to LoRA being enabled") + return None + if config.engine_args.kv_events_config is None: return NoneThis provides defense-in-depth and makes the intent clearer at the usage site.
components/src/dynamo/vllm/args.py (1)
387-391: The review comment conflates unrelated code paths. The LoRA workaround only affects kv_events_config created bycreate_kv_events_config, not external configs.The
consolidator_config.pyand multimodalworker.pyfiles use kv_events_config from independent sources, not fromcreate_kv_events_config. The only code consuming the result ofcreate_kv_events_configiscomponents/src/dynamo/vllm/main.py(line 170), which properly guards access with an explicit None check (line 153-154).The LoRA workaround (lines 387-391) correctly returns None only for the config it creates, preventing unsafe access in the calling code. Pre-existing unguarded accesses in other modules are not introduced by or related to these changes.
Likely an incorrect or invalid review comment.
eb709a1 to
f68bc7e
Compare
d6e8e70 to
f8f8605
Compare
Signed-off-by: Daiyaan <darfeen@nvidia.com>
Overview:
Disable kv events in vLLM when lora is enabled.
There is a bug in the KV cache block storage system where the code was incorrectly trying to access request.lora_request.id instead of the correct request.lora_request.adapter_id property.
Bug is fixed in vllm-project/vllm#27728 but not released yet.
DEP-588
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit