chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry #3350

grahamking · 2025-10-01T19:01:49Z

Those two always overlapped, now there's only one. Allows us to remove ModelEntry and ModelNetworkName.

We also attach the lease in etcd's KeyValueStore impl, and now the etcd is fully cleaned up when a process stops.

Summary by CodeRabbit

New Features
- Model registrations now include a model-card checksum for stronger integrity and validation.
- Persistent checksum caching and a public model name accessor for deployment cards.
Refactor
- Discovery, watchers and routers migrated to a deployment-card-centric flow with a unified root prefix.
- Model publication streamlined to lease-bound key publishing.
Tests
- Tests updated to use deployment cards and checksum-aware registration.
Chores
- Removed deprecated public types/constants and cleaned up imports.

coderabbitai · 2025-10-01T19:10:47Z

Walkthrough

ModelEntry and MODEL_ROOT_PATH were removed. Discovery/watcher switched to ModelDeploymentCard and derives EndpointId from etcd keys. ModelDeploymentCard gained name() and checksum caching; move_from_nats became public. ModelManager and model registration APIs now carry a per-card checksum. Entrypoints, routers, tests, and runtime utilities use model_card::ROOT_PATH ("mdc/").

Changes

Cohort / File(s)	Summary of Changes
Discovery API cleanup `lib/llm/src/discovery.rs`, `lib/llm/src/discovery/model_entry.rs`	Removed `model_entry` module, `ModelEntry` type, and `MODEL_ROOT_PATH` public constant.
Watcher → MDC-centric `lib/llm/src/discovery/watcher.rs`	Replaced ModelEntry usage with `ModelDeploymentCard`; extract `EndpointId` from etcd keys (`etcd_key_extract`); renamed/added methods (`all_cards`, `cards_for_model`), updated `handle_put` signature and registration flows.
Model card core `lib/llm/src/model_card.rs`	Added `name()` and cached MDC checksum (`OnceLock<String>` + `mdcsum()`), made `move_from_nats` public and async (returns `Result<()>`), initialized checksum in constructors.
Model manager / checksums `lib/llm/src/discovery/model_manager.rs`	Added checksum tracking to `ModelEngines`; `save_model_card` accepts `ModelDeploymentCard`; added `is_valid_checksum`; engine add/remove methods now accept and store per-model checksum; public add_*_model signatures updated.
Entrypoints (watcher & registrations) `lib/llm/src/entrypoint/input/common.rs`, `lib/llm/src/entrypoint/input/grpc.rs`, `lib/llm/src/entrypoint/input/http.rs`	Switched watcher prefix from `MODEL_ROOT_PATH` to `model_card::ROOT_PATH`; threaded card.mdcsum() into model registration calls (add_*_model) across static/local paths; imports updated to `model_card::{self, ModelDeploymentCard}`.
KV router & typed watcher docs `lib/llm/src/kv_router.rs`, `lib/runtime/src/utils/typed_prefix_watcher.rs`	Watch prefix changed to `model_card::ROOT_PATH`/`"mdc/"`; typed watcher example and extraction closure now reference `ModelDeploymentCard`.
Local model & network name removal `lib/llm/src/local_model.rs`, `lib/llm/src/local_model/network_name.rs`	Removed `ModelNetworkName` and its export; simplified local publish to use lease-based endpoint path and publish `ModelDeploymentCard` only (no ModelEntry).
Worker monitor runtime change `lib/runtime/src/utils/worker_monitor.rs`	Removed `MODEL_ROOT_PATH` and `ModelEntry` deserialization; switched to `"mdc/"` literal and parsing via `serde_json::Value` to extract runtime_kv_blocks.
Storage: etcd lease-aware puts `lib/runtime/src/storage/key_value_store/etcd.rs`	`create` and `update` PutOptions now attach primary lease (`with_lease(...)`) and `update` uses `with_prev_key()` to return previous key.
Tests & bindings updated `lib/llm/tests/*`, `lib/bindings/python/rust/http.rs`	Tests updated to create/use `ModelDeploymentCard` and pass `card.mdcsum()` to add_*_model calls; Python bindings for HttpService now require checksum parameter in `add_completions_model` and `add_chat_completions_model`.
Misc. internals / small changes `lib/llm/src/model_type.rs`, `lib/llm/src/preprocessor.rs`, `lib/llm/src/common/checked_file.rs`, `lib/llm/src/discovery/watcher.rs` (imports/tests)	Added `ModelType::units()`; explicit `mdcsum().to_string()` ownership in preprocessor; `Checksum` default and enum `Copy` derive; various imports adjusted to `ModelDeploymentCard`/`EndpointId` and updated tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Etcd as Etcd KV ("mdc/*")
  participant Watcher as ModelWatcher
  participant Card as ModelDeploymentCard
  participant NATS as NATS
  participant Manager as ModelManager / Engines

  Etcd->>Watcher: PUT(key="mdc/{endpoint}/...", value=card_json)
  Note right of Watcher: Extract EndpointId from key string
  Watcher->>Watcher: etcd_key_extract(key) --> (EndpointId, name)
  Watcher->>Card: Deserialize JSON -> ModelDeploymentCard
  Watcher->>Card: await card.move_from_nats(nats_client)  -- fetch artifacts (async)
  Card-->>Watcher: populate cache_dir, checksum cached
  Watcher->>Manager: save_model_card(key, card.clone())
  Manager->>Manager: store checksum and wire engines using card.name()
  Manager-->>Watcher: registration ack
  Watcher-->>Etcd: complete handling

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hop through "mdc/" lanes tonight,
Cards in my paws, checksum held tight.
Keys give endpoints, NATS brings the store,
Engines awaken, wires hum once more.
A rabbit’s nose twitches: new names take flight. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description does not follow the repository’s required template and is missing all mandatory sections, including Overview, Details, Where should the reviewer start, and Related Issues. Without these headings, reviewers cannot easily identify the purpose, scope, or starting point for the changes, nor see any linked issues.	Please update the description using the provided template by adding an Overview summarizing the change, a Details section explaining the modifications, a “Where should the reviewer start” section to highlight key files, and a Related Issues section with linked issue numbers.
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.13% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly and concisely summarizes the primary change in the discovery module by indicating that ModelDeploymentCard is now watched and published instead of ModelEntry. It uses the conventional “chore(discovery)” prefix and accurately reflects the core update without extraneous detail. The phrasing is specific enough for teammates to understand the main purpose at a glance.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

lib/llm/src/model_card.rs (1)
298-355: Compile-time bug: wrong argument type to TempDir::with_prefix.

with_prefix expects &str; passing String won’t compile. Also prefer using the Slug’s AsRef<str> impl.

Apply:
-        let bucket_name = self.slug();
-        let target_dir = tempfile::TempDir::with_prefix(bucket_name.to_string())?;
+        let bucket_name = self.slug();
+        let target_dir = tempfile::TempDir::with_prefix(bucket_name.as_ref())?;
The rest of the function looks sound; keeping the Arc<TempDir> in cache_dir ensures the directory’s lifetime. Consider a tiny accessor (e.g., cache_path()) later if callers need the path.
lib/llm/src/discovery/watcher.rs (2)
107-120: Log strings still say “model entry”.

We now deserialize a ModelDeploymentCard; update logs to avoid confusion.
- tracing::error!(%err, value, "Invalid JSON in model entry")
+ tracing::error!(%err, value, "Invalid JSON in model card")
And similarly for the UTF‑8 branch.

444-469: Bug: all_cards likely misses versioned keys.

Cards are written under keys like v1/mdc/...; querying prefix "mdc" won’t match those. This can cause false “last instance” detections and premature engine removal.

Apply:
-        let kvs = etcd_client.kv_get_prefix(model_card::ROOT_PATH).await?;
-        let mut cards = Vec::with_capacity(kvs.len());
+        let v1_prefix = format!("v1/{}/", model_card::ROOT_PATH);
+        let raw_prefix = format!("{}/", model_card::ROOT_PATH);
+        let mut kvs = etcd_client.kv_get_prefix(&v1_prefix).await?;
+        // Also accept unversioned keys for compatibility.
+        kvs.extend(etcd_client.kv_get_prefix(&raw_prefix).await?);
+        let mut cards = Vec::with_capacity(kvs.len());
Also update log wording in this block from “model entry” to “model card”.

🧹 Nitpick comments (5)

lib/runtime/src/utils/worker_monitor.rs (1)
96-100: Consider using ModelDeploymentCard for type safety.

The code uses raw JSON parsing (serde_json::Value) instead of deserializing to ModelDeploymentCard like other files in this PR (kv_router.rs line 252). This loses compile-time type safety for the runtime_config field access.

Is there a specific reason to avoid using ModelDeploymentCard here? If not, consider using the typed approach for better maintainability:
-            |card: serde_json::Value| {
-                card.get("runtime_config")
-                    .and_then(|rc| rc.get("total_kv_blocks"))
-                    .and_then(|t_kv| t_kv.as_u64())
+            |card: ModelDeploymentCard| {
+                card.runtime_config.total_kv_blocks
             },
If ModelDeploymentCard is not accessible from the runtime crate or if there are other constraints preventing this approach, please clarify so we can document the rationale.
lib/llm/src/model_card.rs (1)

414-421: Behavior change: load_from_store no longer localizes artifacts.

Previously this populated cache_dir by pulling from NATS; now it returns an in‑store card. Ensure every callsite invokes card.move_from_nats(...) before using tokenizer/config files (the watcher does).

If any non-watcher path still calls load_from_store, I can scan and patch those to fetch from NATS. Want a follow-up PR?
lib/llm/src/discovery/watcher.rs (3)
195-199: Potential namespace mixing on delete.

cards_for_model(model_name) counts across all namespaces; engines are keyed by name() only. Verify intended semantics: should removal be gated by instances in the same namespace, or truly global?

If you want namespace scoping, parse the namespace from key and filter:
- let active_instances = self.cards_for_model(&model_name).await?;
+ let ns = etcd_key_to_endpoint_id(key)?.namespace;
+ let active_instances = self.cards_for_model_in_namespace(&model_name, &ns).await?;
I can draft cards_for_model_in_namespace if desired.

471-478: Optional: filter by namespace here if delete semantics are per-namespace.

If you adopt namespace scoping, add a variant that filters on parsed EndpointId.namespace.

I can implement cards_for_model_in_namespace() and its call-site changes if you confirm desired behavior.

514-538: Tests cover both versioned/unversioned prefixes.

Consider adding a case without an instance-id segment (exact 4 parts) to assert acceptance, and a negative case with wrong root. I can add these if wanted.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f63a05 and 00dcf6f.

📒 Files selected for processing (13)

lib/llm/src/discovery.rs (0 hunks)
lib/llm/src/discovery/model_entry.rs (0 hunks)
lib/llm/src/discovery/watcher.rs (19 hunks)
lib/llm/src/entrypoint/input/common.rs (2 hunks)
lib/llm/src/entrypoint/input/grpc.rs (2 hunks)
lib/llm/src/entrypoint/input/http.rs (2 hunks)
lib/llm/src/kv_router.rs (2 hunks)
lib/llm/src/local_model.rs (1 hunks)
lib/llm/src/local_model/network_name.rs (0 hunks)
lib/llm/src/model_card.rs (4 hunks)
lib/llm/tests/http_metrics.rs (4 hunks)
lib/runtime/src/utils/typed_prefix_watcher.rs (1 hunks)
lib/runtime/src/utils/worker_monitor.rs (1 hunks)

💤 Files with no reviewable changes (3)

lib/llm/src/local_model/network_name.rs
lib/llm/src/discovery/model_entry.rs
lib/llm/src/discovery.rs

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-02T16:46:54.015Z

Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

lib/runtime/src/utils/worker_monitor.rs

🧬 Code graph analysis (4)

lib/runtime/src/utils/worker_monitor.rs (3)

lib/runtime/src/utils/typed_prefix_watcher.rs (1)

lease_id (211-213)

lib/llm/src/local_model.rs (1)

card (331-333)

lib/llm/src/discovery/watcher.rs (2)

serde_json (107-107)

serde_json (452-452)

lib/llm/src/entrypoint/input/common.rs (2)

lib/runtime/src/transports/etcd.rs (1)

etcd_client (131-133)

lib/runtime/src/distributed.rs (1)

etcd_client (269-271)

lib/llm/src/kv_router.rs (3)

lib/bindings/python/src/dynamo/_core.pyi (2)

ModelDeploymentCard (445-450)

lease_id (138-142)

lib/runtime/src/utils/typed_prefix_watcher.rs (1)

lease_id (211-213)

lib/llm/src/local_model.rs (1)

card (331-333)

lib/llm/src/local_model.rs (3)

lib/runtime/src/transports/etcd.rs (1)

lease_id (136-138)

lib/runtime/src/component.rs (1)

endpoint (236-243)

lib/runtime/src/storage/key_value_store.rs (1)

from_raw (36-38)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: clippy (.)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: tests (.)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: tests (lib/bindings/python)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (12)

lib/llm/src/entrypoint/input/http.rs (1)

7-7: LGTM! Clean migration to model_card::ROOT_PATH.

The changes correctly update the import and reference the etcd root path from the new model_card module instead of the removed discovery::MODEL_ROOT_PATH. The logic remains unchanged.

Also applies to: 13-13, 78-78

lib/llm/src/entrypoint/input/grpc.rs (1)

7-7: LGTM! Consistent migration to model_card::ROOT_PATH.

The changes mirror those in http.rs, correctly updating the import and using the new path constant from the model_card module.

Also applies to: 12-12, 50-50

lib/runtime/src/utils/typed_prefix_watcher.rs (1)

70-76: LGTM! Documentation updated to reflect ModelDeploymentCard migration.

The example code correctly demonstrates the new API using ModelDeploymentCard instead of ModelEntry and the updated etcd prefix "mdc/".

lib/llm/src/kv_router.rs (1)

47-47: LGTM! Clean migration to ModelDeploymentCard.

The changes correctly:

Import ModelDeploymentCard from the model_card module

Update the etcd prefix to model_card::ROOT_PATH

Update the value extractor closure to work with ModelDeploymentCard instead of ModelEntry

The extraction logic remains functionally identical, accessing the runtime_config field.

Also applies to: 250-253

lib/llm/src/model_card.rs (1)

192-196: Accessor name() is a clean addition.

Simple, correct alias to display_name; helps callsites drop field coupling.

lib/llm/src/discovery/watcher.rs (7)

16-16: Import changes look correct.

EndpointId and Annotated are the right additions for the refactor.

24-24: MDC-centric import is correct.

Switch to model_card::{self, ModelDeploymentCard} is aligned with the PR goal.

40-40: ModelManager path change is fine.

150-168: PUT handling flow looks correct.

Key → EndpointId → optional namespace filter → handle_put path is clean.

259-266: Good: localize artifacts before wiring pipelines.

Calling card.move_from_nats(self.drt.nats_client()).await? up front prevents downstream file I/O surprises.

274-282: Early exit on additional endpoints is sound.

Avoids rebuilding engines when a new instance appears for an existing model.

396-397: API assumption: Backend::from_mdc(card) takes &ModelDeploymentCard.

You’re passing &mut ModelDeploymentCard; coercion to &T is fine if the signature is &ModelDeploymentCard. Please confirm the signature.

If it consumes by value, clone: Backend::from_mdc(&card.clone()).

lib/llm/src/discovery/watcher.rs

lib/llm/src/local_model.rs

lib/llm/tests/http_metrics.rs

lib/runtime/src/utils/worker_monitor.rs

lib/llm/src/discovery/model_manager.rs

…ntry Those two always overlapped, now there's only one. Allows us to remove ModelEntry and ModelNetworkName. The next and hopefully final step will be to attach the card to an etcd lease for auto-cleanup. Signed-off-by: Graham King <[email protected]>

Signed-off-by: Graham King <[email protected]>

This cleans up the ModelDeploymentCard on shutdown Signed-off-by: Graham King <[email protected]>

If we are already serving a model and a new worker arrives with a different model card, claiming to be for the same model, we now don't know which is the correct model. Log an error. Ideally we would stop using that instance, but that happens in a different crate. It's not clear how we can communicate with it, see TODO. Signed-off-by: Graham King <[email protected]>

grahamking · 2025-10-06T17:56:19Z

@coderabbitai review please

coderabbitai · 2025-10-06T17:56:59Z

@grahamking: I'll review the changes in this PR for you.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

lib/llm/src/entrypoint/input/http.rs (1)

244-246: Keep the etcd watch handle alive; dropping it may cancel the watch

dissolve() returns a watch handle; binding it to _watcher and dropping it can stop events. Keep a guard in scope to ensure the watch remains active.

Apply this diff:
-    let (_prefix, _watcher, receiver) = models_watcher.dissolve();
+    let (_prefix, watcher, receiver) = models_watcher.dissolve();
+    // Keep the watch handle alive for the lifetime of this task
+    let _watcher_guard = watcher;

lib/llm/src/discovery/model_manager.rs (2)

109-112: Bug: has_model_any ignores embeddings/tensor models.

This causes handle_put to rebuild/register existing embeddings/tensor models, leading to ModelAlreadyExists errors and noisy logs.

Apply:

 pub fn has_model_any(&self, model: &str) -> bool {
     self.chat_completion_engines.read().contains(model)
-        || self.completion_engines.read().contains(model)
+        || self.completion_engines.read().contains(model)
+        || self.embeddings_engines.read().contains(model)
+        || self.tensor_engines.read().contains(model)
 }

289-304: Move/borrow errors with kv_router_config; used after move and moved twice.

unwrap_or_default() moves the Option; it’s used again later.
The Option is also passed into two constructors; needs cloning.

Apply:

-        etcd_client
-            .kv_create(
-                &router_key,
-                serde_json::to_vec_pretty(&kv_router_config.unwrap_or_default())?,
-                None, // use primary lease
-            )
-            .await?;
-
-        let selector = Box::new(DefaultWorkerSelector::new(kv_router_config));
+        let cfg = kv_router_config.clone().unwrap_or_default();
+        etcd_client
+            .kv_create(
+                &router_key,
+                serde_json::to_vec_pretty(&cfg)?,
+                None, // use primary lease
+            )
+            .await?;
+
+        let selector = Box::new(DefaultWorkerSelector::new(kv_router_config.clone()));
         let chooser = KvRouter::new(
             component.clone(),
             kv_cache_block_size,
             Some(selector),
             kv_router_config,
             router_uuid.to_string(),
         )
         .await?;

lib/llm/src/discovery/watcher.rs (1)

337-346: Cannot move self.kv_router_config out of &self.

Passing the field by value moves it; clone the Option instead.

Apply:
-                            self.kv_router_config,
+                            self.kv_router_config.clone(),

♻️ Duplicate comments (2)

lib/runtime/src/utils/worker_monitor.rs (1)

92-103: Use a shared constant for the model-card prefix.

The hardcoded "mdc/" literal is inconsistent with other files in this PR (http.rs, grpc.rs, kv_router.rs, local_model.rs) that use model_card::ROOT_PATH. The in-code comments explain that WorkerMonitor is in the wrong crate (should be in dynamo-llm, not dynamo-runtime), preventing the use of ModelDeploymentCard. However, you can still improve consistency by defining a shared constant for the prefix, either in a common location or by exporting ROOT_PATH from a module accessible to both crates.

Consider defining a shared constant for the "mdc/" prefix or exporting model_card::ROOT_PATH from a common location accessible to both dynamo-runtime and dynamo-llm. This ensures consistency across the codebase and simplifies future updates to the prefix.

lib/llm/tests/http_metrics.rs (1)

502-571: Lease-based cleanup note already tracked in prior review

This block assumes lease-based cleanup; earlier feedback flagged the lease isn’t yet attached in LocalModel::attach. Acknowledging it’s slated for the next PR; no action here.

🧹 Nitpick comments (4)

lib/llm/tests/http_metrics.rs (1)
350-357: Keep the etcd watch handle alive in the test watcher

_watcher is dropped immediately; this may stop the stream. Retain it to keep the watch active.
-            let (_prefix, _watcher, receiver) = models_watcher.dissolve();
+            let (_prefix, watcher, receiver) = models_watcher.dissolve();
+            let _watcher_guard = watcher;
lib/llm/src/model_card.rs (2)
46-53: GGUF variant returns a default checksum; consider a more discriminative fallback

Using a default value for GGUF means GGUF-only cards won’t influence mdcsum. At minimum, include stable identity (e.g., file path string) to reduce collisions until a content hash is available.

Example change:
-            ModelInfoType::GGUF(_) => Checksum::default().to_string(),
+            ModelInfoType::GGUF(p) => {
+                // Fallback: include path text to distinguish different GGUFs
+                blake3::hash(p.display().to_string().as_bytes()).to_string()
+            }
Apply similar treatment to TokenizerKind::GGUF and PromptFormatterArtifact::GGUF for consistency.

247-288: Reduce mdcsum log verbosity

The “TEMP” debug log will fire on first call per process and can be noisy. Drop to trace or remove.
-                let hash = blake3::hash(&bytes_to_hash).to_string();
-                tracing::debug!("mdcsum: {hash} of {} bytes", bytes_to_hash.len()); // TEMP
-                hash
+                let hash = blake3::hash(&bytes_to_hash).to_string();
+                tracing::trace!("mdcsum: {hash} of {} bytes", bytes_to_hash.len());
+                hash
lib/llm/src/discovery/model_manager.rs (1)
395-397: Avoid cloning in checksum() return type.

Return a borrowed &str instead of allocating a new String.

Apply:
-pub fn checksum(&self, model: &str) -> Option<String> {
-    self.checksums.get(model).map(|s| s.to_string())
-}
+pub fn checksum(&self, model: &str) -> Option<&str> {
+    self.checksums.get(model).map(|s| s.as_str())
+}
Call sites (e.g., is_valid_checksum) continue to work.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00dcf6f and 163ff3d.

📒 Files selected for processing (21)

lib/bindings/python/rust/http.rs (1 hunks)
lib/llm/src/common/checked_file.rs (2 hunks)
lib/llm/src/discovery.rs (0 hunks)
lib/llm/src/discovery/model_entry.rs (0 hunks)
lib/llm/src/discovery/model_manager.rs (8 hunks)
lib/llm/src/discovery/watcher.rs (16 hunks)
lib/llm/src/entrypoint/input/common.rs (2 hunks)
lib/llm/src/entrypoint/input/grpc.rs (6 hunks)
lib/llm/src/entrypoint/input/http.rs (7 hunks)
lib/llm/src/kv_router.rs (2 hunks)
lib/llm/src/local_model.rs (1 hunks)
lib/llm/src/local_model/network_name.rs (0 hunks)
lib/llm/src/model_card.rs (12 hunks)
lib/llm/src/model_type.rs (1 hunks)
lib/llm/src/preprocessor.rs (1 hunks)
lib/llm/tests/http-service.rs (6 hunks)
lib/llm/tests/http_metrics.rs (10 hunks)
lib/llm/tests/kserve_service.rs (4 hunks)
lib/runtime/src/storage/key_value_store/etcd.rs (2 hunks)
lib/runtime/src/utils/typed_prefix_watcher.rs (1 hunks)
lib/runtime/src/utils/worker_monitor.rs (1 hunks)

💤 Files with no reviewable changes (3)

lib/llm/src/discovery.rs
lib/llm/src/discovery/model_entry.rs
lib/llm/src/local_model/network_name.rs

✅ Files skipped from review due to trivial changes (1)

lib/runtime/src/utils/typed_prefix_watcher.rs

🚧 Files skipped from review as they are similar to previous changes (2)

lib/llm/src/kv_router.rs
lib/llm/src/entrypoint/input/common.rs

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-02T16:46:54.015Z

Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

lib/runtime/src/utils/worker_monitor.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: trtllm (amd64)
GitHub Check: trtllm (arm64)
GitHub Check: vllm (arm64)
GitHub Check: vllm (amd64)
GitHub Check: sglang
GitHub Check: Mirror Repository to GitLab
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: tests (.)
GitHub Check: tests (lib/bindings/python)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (20)

lib/llm/src/preprocessor.rs (1)

127-127: LGTM!

The explicit to_string() conversion aligns with the String type of the mdcsum field defined at line 104 and used throughout the struct. The change ensures consistent type handling for the per-model checksum.

lib/llm/src/model_type.rs (1)

77-94: LGTM!

The new units() method correctly decomposes the bitflag into its component ModelType units, mirroring the structure of the existing as_vec() method. This provides a typed alternative for consumers that need to iterate over enabled capabilities.

lib/runtime/src/storage/key_value_store/etcd.rs (2)

152-152: LGTM!

Attaching the primary lease to the PutOptions ensures that newly created keys are tied to the lease lifecycle, enabling automatic cleanup when the process stops. This aligns with the PR objective of ensuring etcd is fully cleaned up when a process terminates.

220-225: LGTM!

The update method now attaches the lease and includes with_prev_key() to enable returning the previous key from the server. This ensures consistent lease association during updates and maintains the lifecycle management introduced in the create method.

lib/llm/src/common/checked_file.rs (2)

36-36: LGTM!

Adding the Copy trait to CryptographicHashMethods is safe and enables implicit bitwise-copy semantics for the enum, which simplifies checksum propagation throughout the codebase.

262-269: LGTM!

The Default implementation provides sensible initialization (empty string for hash, BLAKE3 for algorithm), which supports the per-model checksum handling introduced across the codebase.

lib/llm/src/local_model.rs (1)

421-427: LGTM with noted follow-up.

The code derives a lease-aware key but passes None for the lease parameter in publish. While this means the MDC record is not yet TTL-scoped, grahamking has acknowledged this in a previous review and confirmed that lease attachment will be addressed in a follow-up PR. Given the substantial scope of this PR (ModelEntry → ModelDeploymentCard migration), the incremental approach is reasonable.

lib/llm/src/entrypoint/input/grpc.rs (4)

50-50: LGTM!

The switch from MODEL_ROOT_PATH to model_card::ROOT_PATH aligns with the card-centric approach introduced across the codebase in this PR.

66-124: LGTM!

The code correctly introduces per-model checksum handling in the StaticRemote branch by deriving the checksum via card.mdcsum() and propagating it through both add_chat_completions_model and add_completions_model calls. This aligns with the broader checksum propagation pattern introduced across the codebase.

132-134: LGTM!

The StaticFull branch correctly derives the checksum via model.card().mdcsum() and propagates it through both model registration calls.

144-160: LGTM!

The StaticCore branch correctly derives the checksum via model.card().mdcsum() and propagates it through both model registration calls, maintaining consistency with the other static branches.

lib/bindings/python/rust/http.rs (2)

33-44: LGTM!

The updated add_completions_model method correctly accepts the checksum parameter and propagates it to the model manager. Note that this is a public API change that will require Python callers to provide the checksum when registering models.

46-57: LGTM!

The updated add_chat_completions_model method correctly accepts the checksum parameter and propagates it to the model manager, maintaining consistency with the add_completions_model changes.

lib/llm/src/entrypoint/input/http.rs (1)

78-81: Verify ROOT_PATH for prefix watching (“mdc” vs “mdc/”)

Ensure the etcd keys are written under the same prefix form you’re watching. If keys are stored under “mdc/…”, watching “mdc” could be broader than intended; watching “mdc/” is safer to avoid collisions.

lib/llm/tests/http-service.rs (1)

588-599: LGTM: mdcsum-based registrations are consistent

Creating a card per model and passing card.mdcsum() into add_*_model aligns with the new API.

lib/llm/src/model_card.rs (2)

37-37: Confirm intended ROOT_PATH shape

Constant is mdc (no trailing slash). If keys are stored as mdc/<key>, consider whether ROOT_PATH should be mdc/ for clarity and precise prefix watching.

232-235: LGTM: Public name accessor

name() returning display_name improves API clarity.

lib/llm/src/discovery/model_manager.rs (2)

339-342: Storing per-model checksums alongside engines looks good.

Single structure for engines+checksums reduces inconsistency; removal path clears both.

365-373: add() correctly persists checksum with engine.

Error on duplicates, then insert engine and checksum; OK.

lib/llm/src/discovery/watcher.rs (1)

548-574: Key extractor looks solid; doc and errors are clear.

Docstring matches “mdc/…” and errors include offending key. Tests cover with/without “v1/”.

lib/llm/src/discovery/model_manager.rs

lib/llm/src/discovery/watcher.rs

lib/llm/tests/kserve_service.rs

Signed-off-by: Graham King <[email protected]>

lib/llm/src/discovery/watcher.rs

atchernych

LGTM with a comment, but someone else must approve.

lib/llm/src/discovery/watcher.rs

…ntry (#3350) Signed-off-by: Graham King <[email protected]> Signed-off-by: Piotr Tarasiewicz <[email protected]>

Since #3350 the MDC is attached to an etcd lease, so it cleans up on shutdown. We don't need to manually clear namespace any more. If there are any remaining non-lease keys (I'm not aware of any, and I have looked), we will treat that as a bug. Signed-off-by: Graham King <[email protected]>

…ntry (#3350) Signed-off-by: Graham King <[email protected]>

grahamking requested a review from a team as a code owner October 1, 2025 19:01

pull-request-size bot added the size/L label Oct 1, 2025

github-actions bot added the chore label Oct 1, 2025

coderabbitai bot reviewed Oct 1, 2025

View reviewed changes

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 19:43 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 19:47 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 19:55 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 19:58 Inactive

pull-request-size bot added size/XL and removed size/L labels Oct 1, 2025

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 21:43 Inactive

grahamking force-pushed the gk-per-instance-mdc-4 branch from 5e26291 to 826944f Compare October 1, 2025 21:43

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 21:43 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 1, 2025 21:44 Inactive

atchernych reviewed Oct 2, 2025

View reviewed changes

lib/llm/src/discovery/model_manager.rs Show resolved Hide resolved

atchernych reviewed Oct 2, 2025

View reviewed changes

lib/llm/src/discovery/model_manager.rs Outdated Show resolved Hide resolved

grahamking marked this pull request as draft October 2, 2025 14:59

grahamking force-pushed the gk-per-instance-mdc-4 branch from 826944f to 7449e7b Compare October 3, 2025 22:03

copy-pr-bot bot temporarily deployed to GITLAB October 3, 2025 22:03 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 3, 2025 22:04 Inactive

grahamking added 4 commits October 6, 2025 13:55

Thanks Code Rabbit

6f9e471

Signed-off-by: Graham King <[email protected]>

fix: Attach the lease to key_value_store objects

007baec

This cleans up the ModelDeploymentCard on shutdown Signed-off-by: Graham King <[email protected]>

grahamking force-pushed the gk-per-instance-mdc-4 branch from 7449e7b to 163ff3d Compare October 6, 2025 17:55

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 17:55 Inactive

grahamking marked this pull request as ready for review October 6, 2025 17:57

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 18:00 Inactive

coderabbitai bot reviewed Oct 6, 2025

View reviewed changes

Thanks Code Rabbit

b8ff254

Signed-off-by: Graham King <[email protected]>

grahamking requested review from a team as code owners October 6, 2025 19:34

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 19:34 Inactive

format

5d8e287

Signed-off-by: Graham King <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 19:39 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 6, 2025 19:49 Inactive

grahamking requested a review from atchernych October 6, 2025 20:29

grahamking mentioned this pull request Oct 6, 2025

[FEATURE]: Don't overwrite model deployment card, add checksum #1375

Closed

atchernych reviewed Oct 6, 2025

View reviewed changes

lib/llm/src/discovery/watcher.rs Show resolved Hide resolved

nnshah1 approved these changes Oct 6, 2025

View reviewed changes

atchernych approved these changes Oct 6, 2025

View reviewed changes

lib/llm/src/discovery/watcher.rs Show resolved Hide resolved

paulhendricks approved these changes Oct 7, 2025

View reviewed changes

grahamking merged commit 81162df into main Oct 7, 2025
27 of 28 checks passed

grahamking deleted the gk-per-instance-mdc-4 branch October 7, 2025 14:02

ptarasiewiczNV pushed a commit that referenced this pull request Oct 8, 2025

chore(discovery): Watch/publish ModelDeploymentCard instead of ModelE…

1edeb1d

…ntry (#3350) Signed-off-by: Graham King <[email protected]> Signed-off-by: Piotr Tarasiewicz <[email protected]>

grahamking mentioned this pull request Oct 9, 2025

chore: Remove clear_namespace script #3519

Merged

nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025

chore(discovery): Watch/publish ModelDeploymentCard instead of ModelE…

2522cab

…ntry (#3350) Signed-off-by: Graham King <[email protected]>

Uh oh!

chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry #3350

chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry #3350

Uh oh!

Conversation

grahamking commented Oct 1, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

grahamking commented Oct 6, 2025

Uh oh!

coderabbitai bot commented Oct 6, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

atchernych left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

grahamking commented Oct 1, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 1, 2025 •

edited

Loading