-
Notifications
You must be signed in to change notification settings - Fork 692
feat: add KVBM host to disk metrics | clean up dashboard #3534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/ok to test 292ebb4 |
WalkthroughRefactors KVBM metrics: removes request counters, adds block-level offload metric (H2D), wires optional KvbmMetrics through builders/configs into OffloadManager, and increments during offload. Adjusts transfer strategy (Disk→Device uses Write). Updates Prometheus names and Grafana dashboard accordingly. Documentation switches example models to Qwen/Qwen3-0.6B. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Leader
participant Builder as BlockManagerBuilder
participant BMConfig as KvBlockManagerConfig
participant State as KvBlockManagerState
participant OffCfg as OffloadManagerConfig
participant OffMgr as OffloadManager
Leader->>Builder: new()
Leader->>Builder: kvbm_metrics(kvbm_metrics.clone())
Builder->>BMConfig: build() with kvbm_metrics: Some(...)
BMConfig-->>Leader: config
Leader->>State: init(config)
State->>OffCfg: new(..., kvbm_metrics: config.kvbm_metrics.clone())
State->>OffMgr: OffloadManager::new(OffCfg)
OffMgr-->>State: instance ready
sequenceDiagram
autonumber
participant Caller as BlockManager
participant OffMgr as OffloadManager
participant HostTx as host_offload_tx
note over Caller,OffMgr: Offload path with metrics
Caller->>OffMgr: offload(block: Host)
alt kvbm_metrics is Some
OffMgr->>OffMgr: offload_blocks_h2d.inc()
else no metrics
OffMgr-->>OffMgr: skip
end
OffMgr->>HostTx: send(request)
HostTx-->>OffMgr: ack
note over Caller,OffMgr: Prior request counters removed in slot.rs
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
richardhuo-nv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
WalkthroughUpdates Grafana dashboard layout/titles/refresh; replaces model references in docs to Qwen/Qwen3-0.6B; adds kvbm_metrics plumbing through builders/config/state into OffloadManager; increments new offload_blocks_h2d metric on host-to-disk offload; removes request counters; adjusts Prometheus constant names; changes one transfer strategy mapping. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant Leader as Connector Leader
participant Bldr as BlockManagerBuilder
participant Cfg as KvBlockManagerConfig
participant State as KvBlockManagerState
participant OffMan as OffloadManager
note over Client,Leader: Initialization and request handling (metrics wiring)
Client->>Leader: initialize(...)
Leader->>Bldr: kvbm_metrics(metrics.clone())
Bldr->>Cfg: build() with kvbm_metrics
Cfg-->>State: config(kvbm_metrics)
State->>OffMan: new(OffloadManagerConfig{ kvbm_metrics })
note over OffMan: Holds optional KvbmMetrics
Client->>Leader: offload host block
Leader->>OffMan: offload(host->disk)
alt host->disk
OffMan->>OffMan: kvbm_metrics.offload_blocks_h2d.inc()
end
sequenceDiagram
autonumber
participant Disk as DiskStorage
participant Dev as DeviceStorage
participant Strat as WriteToStrategy
Disk->>Strat: write_to(DeviceStorage)
note over Strat: Mapping changed
Strat-->>Disk: TransferStrategy::Nixl(Write)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/runtime/src/metrics/prometheus_names.rs (1)
326-331: Add OFFLOAD_BLOCKS_H2D to Python bindings
The new metric must be mirrored inlib/bindings/python/rust/prometheus_names.rs. Add:/// The number of offload blocks from host to disk pub const OFFLOAD_BLOCKS_H2D: &str = "offload_blocks_h2d";
🧹 Nitpick comments (2)
lib/llm/src/block_manager/offload.rs (2)
495-499: Consider adding test coverage for the new metric.The test suite in this file doesn't appear to verify that
offload_blocks_h2dis incremented correctly during host-to-disk offloads. Adding a test case would help ensure the metric works as expected and prevent regressions.
495-499: Differentiate initiated vs completed offload metrics
The existingoffload_blocks_h2dcounter (lib/llm/src/block_manager/offload.rs:495–499) fires on enqueue. To capture actual transfer outcomes, add aoffload_blocks_completed_total{status="success"|"failure"}counter in the offload worker after transfer, and consider a gauge (e.g.offload_blocks_in_progress) for in-flight operations.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json(12 hunks)docs/guides/run_kvbm_in_trtllm.md(5 hunks)docs/guides/run_kvbm_in_vllm.md(4 hunks)lib/bindings/python/rust/llm/block_manager.rs(3 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs(0 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs(1 hunks)lib/llm/src/block_manager/block/transfer/strategy.rs(1 hunks)lib/llm/src/block_manager/config.rs(1 hunks)lib/llm/src/block_manager/metrics_kvbm.rs(5 hunks)lib/llm/src/block_manager/offload.rs(4 hunks)lib/llm/src/block_manager/state.rs(2 hunks)lib/runtime/src/metrics/prometheus_names.rs(1 hunks)
💤 Files with no reviewable changes (1)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs
🧰 Additional context used
🧬 Code graph analysis (3)
lib/llm/src/block_manager/state.rs (1)
lib/llm/src/block_manager/block/data/logical.rs (1)
resources(69-71)
lib/llm/src/block_manager/metrics_kvbm.rs (1)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (1)
IntCounter(126-142)
lib/llm/src/block_manager/offload.rs (1)
lib/bindings/python/rust/llm/block_manager.rs (1)
kvbm_metrics(246-252)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: clippy (lib/runtime/examples)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: clippy (.)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
lib/llm/src/block_manager/state.rs (1)
149-156: LGTM! Clean kvbm_metrics propagation.The kvbm_metrics field is correctly propagated from
resources.configintoOffloadManagerConfigin both the logical and local locality initialization paths. The clone operation is appropriate for the Optional type.Also applies to: 265-272
lib/bindings/python/rust/llm/block_manager.rs (1)
219-219: LGTM! Proper builder pattern implementation.The kvbm_metrics integration follows the standard builder pattern correctly:
- Optional field in the builder struct
- Chainable setter method
- Conditional propagation during build
Also applies to: 246-252, 312-316
lib/llm/src/block_manager/offload.rs (1)
77-78: LGTM! Proper kvbm_metrics plumbing.The kvbm_metrics field is correctly added to both config and manager structs, with proper initialization and optional handling throughout.
Also applies to: 101-102, 129-129
|
/ok to test 134588e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json(12 hunks)docs/guides/run_kvbm_in_trtllm.md(5 hunks)docs/guides/run_kvbm_in_vllm.md(4 hunks)lib/bindings/python/rust/llm/block_manager.rs(3 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs(0 hunks)lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs(1 hunks)lib/llm/src/block_manager/block/transfer/strategy.rs(1 hunks)lib/llm/src/block_manager/config.rs(1 hunks)lib/llm/src/block_manager/metrics_kvbm.rs(5 hunks)lib/llm/src/block_manager/offload.rs(4 hunks)lib/llm/src/block_manager/state.rs(2 hunks)lib/runtime/src/metrics/prometheus_names.rs(1 hunks)
💤 Files with no reviewable changes (1)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs
🧰 Additional context used
🧬 Code graph analysis (2)
lib/llm/src/block_manager/offload.rs (1)
lib/bindings/python/rust/llm/block_manager.rs (1)
kvbm_metrics(246-252)
lib/llm/src/block_manager/metrics_kvbm.rs (1)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (1)
IntCounter(126-142)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: clippy (lib/runtime/examples)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: clippy (launch/dynamo-run)
🔇 Additional comments (18)
lib/llm/src/block_manager/metrics_kvbm.rs (4)
23-24: LGTM! Metric naming and description are clear.The new
offload_blocks_h2dfield accurately represents the host-to-disk offload metric with a clear description.
50-56: LGTM! Metric registration is correct.The metric registration for
offload_blocks_h2dis properly implemented with consistent naming and description.
76-84: LGTM! Early return path is complete.The early return when
create_endpointis false correctly includes all metric fields, including the newoffload_blocks_h2d.
130-137: LGTM! Final struct construction is consistent.The final
KvbmMetricsconstruction properly includesoffload_blocks_h2dand excludes the removed request counters.lib/llm/src/block_manager/block/transfer/strategy.rs (1)
93-98: LGTM! Transfer strategy change aligns with offload semantics.Changing from
NixlTransfer::ReadtoNixlTransfer::Writefor DeviceStorage to DiskStorage is correct. When offloading from device to disk, the operation is initiated from the device (source) side, so usingWriteis semantically accurate.Note: The test at line 223-226 correctly asserts this new behavior.
lib/bindings/python/rust/llm/block_manager/vllm/connector/trtllm_leader.rs (1)
103-109: LGTM! Metrics wiring is correct.The KvbmMetrics is properly wired into the BlockManagerBuilder during initialization, enabling metrics collection for the KvConnectorLeader.
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (1)
127-134: LGTM! Metrics wiring is consistent.The KvbmMetrics wiring matches the pattern used in trtllm_leader.rs, ensuring consistent metrics collection across different connector implementations.
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
143-150: LGTM! Metrics wiring is properly implemented.The recorder variant also correctly wires KvbmMetrics into the BlockManagerBuilder, maintaining consistency across all connector implementations.
lib/llm/src/block_manager/config.rs (1)
198-201: LGTM! Config field is well-designed.The optional
kvbm_metricsfield allows per-block-manager metrics tracking with good backwards compatibility. The documentation clearly explains its purpose.lib/llm/src/block_manager/state.rs (2)
149-156: LGTM! Metrics propagation for logical locality.The
kvbm_metricsis correctly cloned and propagated into theOffloadManagerConfig, enabling metrics collection for logical locality operations.
265-272: LGTM! Metrics propagation for local locality.The metrics propagation pattern is consistent with the logical locality implementation, ensuring uniform metrics collection across different locality types.
docs/guides/run_kvbm_in_trtllm.md (1)
81-82: LGTM! Documentation updated with consistent model references.All model references have been consistently updated from
deepseek-ai/DeepSeek-R1-Distill-Llama-8BtoQwen/Qwen3-0.6Bacross the guide. The smaller model size is likely better suited for demonstration and testing purposes.Also applies to: 87-87, 102-102, 116-117, 136-136, 158-158
lib/bindings/python/rust/llm/block_manager.rs (1)
219-219: LGTM! Clean builder pattern implementation.The kvbm_metrics field integration follows the established builder pattern consistently. The optional field, builder method, and config propagation are all properly implemented.
Also applies to: 246-252, 312-316
lib/runtime/src/metrics/prometheus_names.rs (1)
326-327: LGTM! Metric naming follows conventions.The new
OFFLOAD_BLOCKS_H2Dconstant aligns with the shift to block-level metrics and follows the established naming pattern. The separation of H2D (host-to-disk) and D2H (device-to-host) metrics provides clearer observability of the offload pipeline.lib/llm/src/block_manager/offload.rs (1)
77-78: LGTM! Metrics properly wired through config.The kvbm_metrics field is correctly added to both
OffloadManagerConfigandOffloadManager, and properly propagated during initialization. The optional field design allows metrics to be disabled when not needed.Also applies to: 101-102, 129-129
deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (2)
231-231: LGTM! Dashboard metrics align with code changes.The Grafana dashboard correctly updates metric expressions and panel titles to reflect the shift from request-based to block-based tracking:
kvbm_offload_blocks_d2hfor Device to Host offloadskvbm_offload_blocks_h2dfor Host to Disk offloadsThe panel titles clearly communicate the transfer direction, improving observability.
Also applies to: 240-240, 327-327, 336-336
546-546: Good: Explicit refresh rate improves consistency.Setting the refresh interval to
"5s"instead of"auto"ensures consistent dashboard behavior across different Grafana deployments and user preferences.docs/guides/run_kvbm_in_vllm.md (1)
83-83: Model availability confirmed.Qwen/Qwen3-0.6Bis publicly accessible on Hugging Face; no changes needed.
|
/ok to test f7047cb |
Signed-off-by: Ziqi Fan <[email protected]>
Signed-off-by: Ziqi Fan <[email protected]>
Signed-off-by: Ziqi Fan <[email protected]>
Signed-off-by: Ziqi Fan <[email protected]>
f77d85f to
b4858c2
Compare
|
/ok to test b4858c2 |
Signed-off-by: Ziqi Fan <[email protected]>
Signed-off-by: Ziqi Fan <[email protected]>
Overview:
Summary by CodeRabbit
New Features
Refactor
Documentation