feat: disable kv events in vLLM when lora is enabled#4128

Merged
biswapanda merged 6 commits into main from bis/lora-vllm-1
Nov 10, 2025
Merged

feat: disable kv events in vLLM when lora is enabled#4128
biswapanda merged 6 commits intomainfrom
bis/lora-vllm-1

Conversation


@biswapanda biswapanda commented Nov 5, 2025

Overview:

Disable kv events in vLLM when lora is enabled.

There is a bug in the KV cache block storage system: the code incorrectly accesses request.lora_request.id instead of the correct request.lora_request.adapter_id property.

Bug is fixed in vllm-project/vllm#27728 but not released yet.
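The failure mode can be illustrated with a minimal stand-in class (this is a sketch, not vLLM's actual LoRARequest; only the two attribute names from the description above are mirrored):

```python
# Toy model of the upstream bug: the request object exposes adapter_id,
# but the KV cache block storage code tried to read a plain `id` attribute.

class LoRARequest:
    """Stand-in: has adapter_id, has no `id` attribute."""
    def __init__(self, adapter_id: int):
        self.adapter_id = adapter_id

req = LoRARequest(adapter_id=7)

# Correct access, as in the upstream fix (vllm-project/vllm#27728):
print(req.adapter_id)  # -> 7

# Buggy access path: raises AttributeError, crashing KV event publishing
try:
    req.id
except AttributeError as exc:
    print(f"crash: {exc}")
```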

DEP-588

Details:

  • Fixed KV events with LoRA: Added upstream bug workaround that disables KV events when LoRA is enabled, preventing crashes until vLLM PR #27728 is released
  • Improved KV publisher setup: Added null check to prevent setup when kv_events_config is None
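The two guards above can be sketched as follows. The function and attribute names (create_kv_events_config, setup_kv_event_publisher, enable_lora, kv_events_config) come from this PR; the config shapes and return values are simplified stand-ins, not the real implementation in components/src/dynamo/vllm/args.py and main.py:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EngineArgs:
    # Simplified stand-in for the engine-args object used in this PR.
    enable_lora: bool = False
    kv_events_config: Optional[dict] = None

def create_kv_events_config(engine_args: EngineArgs) -> Optional[dict]:
    # Workaround for the upstream attribute bug: KV events crash when
    # LoRA is enabled, so produce no config until vLLM #27728 is released.
    if engine_args.enable_lora:
        return None
    # Illustrative config contents only.
    return {"enable_kv_cache_events": True}

def setup_kv_event_publisher(engine_args: EngineArgs) -> Optional[str]:
    # Guard: nothing to set up when no KV events config was provided,
    # so bypass publisher setup (and its logging) entirely.
    if engine_args.kv_events_config is None:
        return None
    return "publisher-started"
```

Together the guards mean a LoRA-enabled worker never produces a KV events config, and the publisher setup tolerates that absence instead of crashing.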

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes
    • Improved KV events publishing configuration handling to prevent incompatible feature combinations.
  • KV events publishing is now properly disabled when both prefix caching and LoRA are enabled simultaneously.
    • Added proper handling for cases where KV events configuration is not explicitly provided.

@biswapanda biswapanda self-assigned this Nov 5, 2025
@biswapanda biswapanda requested review from a team as code owners November 5, 2025 19:47
@biswapanda biswapanda changed the title from "[feat] disable kv events in vLLM when lora is enabled" to "feat: disable kv events in vLLM when lora is enabled" Nov 5, 2025
@github-actions github-actions bot added the feat label Nov 5, 2025
coderabbitai bot commented Nov 5, 2025

Walkthrough

Two files in the vLLM integration add guard conditions to KV events publishing: one prevents publishing when LoRA is enabled during prefix caching, and the other adds an early exit when no KV events config is provided.

Changes

Cohort / File(s) Summary
KV Events Configuration Guards
components/src/dynamo/vllm/args.py, components/src/dynamo/vllm/main.py
Added guard conditions to disable KV events publishing: create_kv_events_config now returns None when LoRA is enabled during prefix caching, with explanatory comments; setup_kv_event_publisher adds an early exit when kv_events_config is None, bypassing publisher setup and the associated logging.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Both changes are straightforward guard/early-exit patterns with localized scope
  • Check that the LORA+prefix-caching condition is correct and the None propagation doesn't mask legitimate configuration paths
  • Verify the early exit in setup_kv_event_publisher doesn't skip necessary initialization steps

Poem

🐰 A hop, skip, and guard we place,
When LORA meets cache's face—
We say "not now!" with early return,
Let KV publishers wait their turn,
With None as sentinel, we skip with grace! 🎯

Pre-merge checks

✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly matches the main change: adding logic to disable KV events when LoRA is enabled in vLLM, as evidenced by the new guard in create_kv_events_config that returns None when LoRA is enabled.
Description check ✅ Passed The PR description covers the overview, details, and related issues sections but lacks specific guidance on where reviewers should start examining the code.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4765d88 and 5c8a8e0.

📒 Files selected for processing (2)
  • components/src/dynamo/vllm/args.py (1 hunks)
  • components/src/dynamo/vllm/main.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
components/src/dynamo/vllm/main.py (1)

153-154: Early exit is correct but insufficient on its own.

The None check correctly prevents accessing .endpoint on a None value at line 170. However, this check alone doesn't protect against the case where a user provides --kv-events-config while LoRA is enabled, since that config would not be None.

As noted in the args.py review, consider adding an explicit LoRA check here as well:

     if config.is_decode_worker:
         logger.info("Skipping KV event publisher setup for decode worker")
         return None
 
+    # Skip KV events when LoRA is enabled due to upstream bug
+    if config.engine_args.enable_lora:
+        logger.info("Skipping KV event publisher setup due to LoRA being enabled")
+        return None
+
     if config.engine_args.kv_events_config is None:
         return None

This provides defense-in-depth and makes the intent clearer at the usage site.

components/src/dynamo/vllm/args.py (1)

387-391: The review comment conflates unrelated code paths. The LoRA workaround only affects kv_events_config created by create_kv_events_config, not external configs.

The consolidator_config.py and multimodal worker.py files use kv_events_config from independent sources, not from create_kv_events_config. The only code consuming the result of create_kv_events_config is components/src/dynamo/vllm/main.py (line 170), which properly guards access with an explicit None check (line 153-154).

The LoRA workaround (lines 387-391) correctly returns None only for the config it creates, preventing unsafe access in the calling code. Pre-existing unguarded accesses in other modules are not introduced by or related to these changes.

Likely an incorrect or invalid review comment.

@biswapanda biswapanda merged commit 7802f96 into main Nov 10, 2025
29 of 39 checks passed
@biswapanda biswapanda deleted the bis/lora-vllm-1 branch November 10, 2025 20:01
daiyaanarfeen pushed a commit that referenced this pull request Nov 14, 2025
Signed-off-by: Daiyaan <darfeen@nvidia.com>
