Conversation

@richardhuo-nv
Contributor

@richardhuo-nv richardhuo-nv commented Sep 29, 2025

Overview:

There is a bug in TRTLLM KVBM offloading where G1 matched blocks can be offloaded a second time. This PR fixes that bug.

Details:

In the prefill stage, the scheduler output for a new request always reports 0 computed tokens; it only tells the connector about the device blocks and the prefill tokens.

Therefore, if the request is new and the computed position is greater than 0, but the slot's evaluated-blocks count is 0, then matched tokens are already present on the GPU and must not be offloaded again.
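The detection condition above can be sketched as a small predicate. This is an illustrative sketch only; the function and field names (`has_gpu_matched_blocks`, `computed_position`, `evaluated_blocks`) are stand-ins and may not match the actual identifiers in `slot.rs`:

```rust
// Hypothetical sketch of the G1-match detection described above.
// A new request always reports 0 computed tokens from the scheduler, so a
// non-zero computed position with no evaluated blocks implies the tokens
// were matched in the G1 (GPU) cache and must not be offloaded again.
fn has_gpu_matched_blocks(
    is_new_request: bool,
    computed_position: usize,
    evaluated_blocks: usize,
) -> bool {
    is_new_request && computed_position > 0 && evaluated_blocks == 0
}

fn main() {
    // New request with 128 tokens already matched on GPU, nothing evaluated yet.
    assert!(has_gpu_matched_blocks(true, 128, 0));
    // Cached requests are never treated as G1-matched onboarding.
    assert!(!has_gpu_matched_blocks(false, 128, 0));
    // A truly fresh request (position 0) has nothing matched.
    assert!(!has_gpu_matched_blocks(true, 0, 0));
}
```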

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes

    • More reliable handling of new vs. cached requests during scheduling.
    • Prevented negative candidate calculations, improving stability.
    • Refined offloading decisions for more accurate memory management.
  • Performance Improvements

    • Reduced unnecessary block evaluations for new requests, potentially lowering latency.
    • Improved disk-backed pool initialization to avoid side effects and ensure smoother reuse of configuration.
  • Refactor

    • Clarified onboarding logic for requests, improving readability and maintainability without changing user-facing behavior.

@richardhuo-nv richardhuo-nv requested a review from a team as a code owner September 29, 2025 23:55
@copy-pr-bot

copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Sep 29, 2025

Walkthrough

Updated the Slot trait and VllmConnectorSlot method signatures to include an is_new_request flag, adjusted the internal logic around evaluated_blocks and offloading decisions, set an explicit true/false at the two call sites to distinguish new vs. cached requests, and changed disk pool layout construction to clone the config before use.

Changes

Cohort / File(s) Summary of Changes
Slot interface and implementation
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs
Added is_new_request: bool to apply_scheduler_output_with_computed_position in trait and impl; conditional advancement of evaluated_blocks when onboarding; switched to saturating_sub for num_candidate_blocks; offload condition uses > 0 instead of != 0.
Leader call sites (onboarding vs cached)
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs
Explicitly passed true for new requests and false for cached requests to apply_scheduler_output_with_computed_position, separating onboarding and cached paths.
Offload config handling
lib/llm/src/block_manager/offload.rs
Passed config.clone() to disk-backed build_layout, avoiding moving the original config.
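The updated candidate-block and offload logic from `slot.rs` can be illustrated as follows. This is a minimal sketch under assumed names (`num_candidate_blocks`, `computed_blocks`), not the exact code in the PR:

```rust
// Illustrative sketch of the updated offload decision: saturating_sub avoids
// underflow if evaluated_blocks exceeds the computed block count, and the
// offload path only runs when candidate blocks remain.
fn num_candidate_blocks(computed_blocks: usize, evaluated_blocks: usize) -> usize {
    computed_blocks.saturating_sub(evaluated_blocks)
}

fn main() {
    // Three blocks left to offload after one was already evaluated.
    assert_eq!(num_candidate_blocks(4, 1), 3);
    // Plain `-` would panic (debug) or wrap (release) here; saturating_sub yields 0.
    assert_eq!(num_candidate_blocks(1, 4), 0);

    let candidates = num_candidate_blocks(4, 4);
    if candidates > 0 {
        // Offload path: not taken here because candidates == 0.
        unreachable!();
    }
}
```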

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Leader as KvConnectorLeader
  participant Slot as VllmConnectorSlot
  participant Scheduler

  Client->>Leader: build_connector_metadata(request)
  Leader->>Scheduler: get scheduler output
  alt New request
    note over Leader,Slot: is_new_request = true
    Leader->>Slot: apply_scheduler_output_with_computed_position(..., true)
    Slot->>Slot: Conditionally advance evaluated_blocks
    Slot->>Slot: Compute num_candidate_blocks via saturating_sub
    Slot->>Slot: Offload if num_candidate_blocks > 0
  else Cached request
    note over Leader,Slot: is_new_request = false
    Leader->>Slot: apply_scheduler_output_with_computed_position(..., false)
    Slot->>Slot: Skip onboarding advancement
    Slot->>Slot: Compute/decide offload as above
  end
  Leader-->>Client: connector metadata

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my ears at flags so new,
True for fresh hops, false for déjà vu.
Blocks advance where seedlings sprout,
Offloads nibble when carrots run out.
A cloned config in the burrow’s light—
Neat little trails, all routed right. 🥕🐇

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning While the overview and details sections adequately describe the bug and its resolution, the “Where should the reviewer start?” section is left as the template placeholder and lacks any file references, and the related issue placeholder still uses “#xxx” instead of the actual issue number, so the description is incomplete. Please fill in the “Where should the reviewer start?” section with specific file paths or components to review, and replace “#xxx” in the Related Issues section with the actual issue number the PR closes.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly identifies the primary fix—skipping offloading of G1 matched blocks—and includes the relevant ticket reference, making it clear and focused on the core change without extraneous detail.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3aa3077 and cfb5f0a.

📒 Files selected for processing (3)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (3 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs (2 hunks)
  • lib/llm/src/block_manager/offload.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/llm/src/block_manager/offload.rs (1)
lib/llm/src/block_manager/layout.rs (3)
  • config (196-196)
  • config (509-511)
  • config (740-742)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: trtllm
  • GitHub Check: sglang
  • GitHub Check: vllm
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (.)
🔇 Additional comments (8)
lib/llm/src/block_manager/offload.rs (1)

742-742: LGTM! Cloning config enables subsequent pool constructions.

The change from config to config.clone() ensures the original config remains available after the disk pool layout is built, allowing it to be reused for any subsequent pool constructions. This prevents a move error and is the correct approach.
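The move-vs-clone issue this change avoids can be shown with a minimal stand-in type. `LayoutConfig` and `build_layout` here are hypothetical placeholders for the real layout config and builder:

```rust
// Minimal illustration of why `config.clone()` is needed: passing `config`
// by value moves it, so any later use of the original fails to compile.
#[derive(Clone)]
struct LayoutConfig {
    num_blocks: usize,
}

fn build_layout(config: LayoutConfig) -> usize {
    config.num_blocks
}

fn main() {
    let config = LayoutConfig { num_blocks: 16 };
    let disk = build_layout(config.clone()); // clone: the original stays usable
    let host = build_layout(config); // would be a use-after-move error without the clone above
    assert_eq!(disk, host);
}
```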

lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs (2)

337-337: LGTM! Correctly marks new requests for onboarding.

Passing is_new_request = true when applying scheduler output to new onboarding requests is correct and aligns with the PR's goal of distinguishing G1 matched blocks from fresh allocations during the prefill stage.


368-368: LGTM! Correctly marks cached requests.

Passing is_new_request = false for cached requests ensures that the slot logic does not treat these as onboarding requests, preventing double offloading of already-cached blocks.

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (5)

118-119: LGTM! Trait signature updated to support new/cached request distinction.

Adding the is_new_request: bool parameter to the trait method enables implementations to distinguish between onboarding new requests and processing cached requests, which is essential for the offloading bugfix.


596-596: LGTM! Implementation signature matches updated trait.

The implementation correctly adopts the new is_new_request parameter from the trait definition.


644-644: LGTM! Using saturating_sub prevents underflow.

The change from direct subtraction to saturating_sub is safer and prevents potential underflow if evaluated_blocks somehow exceeds the computed block count. This is a defensive improvement.


646-646: LGTM! Positive check is clearer for offloading decision.

Changing the condition from != 0 to > 0 makes the intent clearer: we only offload when there are positive candidate blocks. Functionally equivalent for usize, but semantically more explicit.


635-641: Block-skipping formula is correct. computed_position is a zero‐based token index, and (computed_position + 1) / block_size correctly computes the number of fully computed blocks without off‐by‐one errors.
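The formula from the comment above can be checked with a few worked cases, assuming a zero-based token index and integer division (`fully_computed_blocks` is an illustrative name, not the identifier in `slot.rs`):

```rust
// With computed_position as a zero-based token index, (computed_position + 1)
// is the number of computed tokens, and integer division by block_size gives
// the number of fully computed blocks.
fn fully_computed_blocks(computed_position: usize, block_size: usize) -> usize {
    (computed_position + 1) / block_size
}

fn main() {
    let block_size = 16;
    // Tokens 0..=15 exactly fill one block.
    assert_eq!(fully_computed_blocks(15, block_size), 1);
    // Token 16 starts a second block, which is not yet full.
    assert_eq!(fully_computed_blocks(16, block_size), 1);
    // Tokens 0..=31 fill two blocks.
    assert_eq!(fully_computed_blocks(31, block_size), 2);
}
```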



@richardhuo-nv
Contributor Author

/ok to test 9aa8230

fix

Signed-off-by: richardhuo-nv <[email protected]>
@richardhuo-nv richardhuo-nv merged commit 713e9e4 into main Oct 1, 2025
31 of 36 checks passed
@richardhuo-nv richardhuo-nv deleted the rihuo/fix_offload branch October 1, 2025 17:28
ziqifan617 pushed a commit that referenced this pull request Oct 1, 2025
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025

4 participants