Skip to content

Conversation

@grahamking
Copy link
Contributor

@grahamking grahamking commented Oct 1, 2025

Those two always overlapped, now there's only one. Allows us to remove ModelEntry and ModelNetworkName.

We also attach the lease in etcd's KeyValueStore impl, and now the etcd is fully cleaned up when a process stops.

Summary by CodeRabbit

  • New Features

    • Model registrations now include a model-card checksum for stronger integrity and validation.
    • Persistent checksum caching and a public model name accessor for deployment cards.
  • Refactor

    • Discovery, watchers and routers migrated to a deployment-card-centric flow with a unified root prefix.
    • Model publication streamlined to lease-bound key publishing.
  • Tests

    • Tests updated to use deployment cards and checksum-aware registration.
  • Chores

    • Removed deprecated public types/constants and cleaned up imports.

@grahamking grahamking requested a review from a team as a code owner October 1, 2025 19:01
@github-actions github-actions bot added the chore label Oct 1, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 1, 2025

Walkthrough

ModelEntry and MODEL_ROOT_PATH were removed. Discovery/watcher switched to ModelDeploymentCard and derives EndpointId from etcd keys. ModelDeploymentCard gained name() and checksum caching; move_from_nats became public. ModelManager and model registration APIs now carry a per-card checksum. Entrypoints, routers, tests, and runtime utilities use model_card::ROOT_PATH ("mdc/").

Changes

Cohort / File(s) Summary of Changes
Discovery API cleanup
lib/llm/src/discovery.rs, lib/llm/src/discovery/model_entry.rs
Removed model_entry module, ModelEntry type, and MODEL_ROOT_PATH public constant.
Watcher → MDC-centric
lib/llm/src/discovery/watcher.rs
Replaced ModelEntry usage with ModelDeploymentCard; extract EndpointId from etcd keys (etcd_key_extract); renamed/added methods (all_cards, cards_for_model), updated handle_put signature and registration flows.
Model card core
lib/llm/src/model_card.rs
Added name() and cached MDC checksum (OnceLock<String> + mdcsum()), made move_from_nats public and async (returns Result<()>), initialized checksum in constructors.
Model manager / checksums
lib/llm/src/discovery/model_manager.rs
Added checksum tracking to ModelEngines; save_model_card accepts ModelDeploymentCard; added is_valid_checksum; engine add/remove methods now accept and store per-model checksum; public add_*_model signatures updated.
Entrypoints (watcher & registrations)
lib/llm/src/entrypoint/input/common.rs, lib/llm/src/entrypoint/input/grpc.rs, lib/llm/src/entrypoint/input/http.rs
Switched watcher prefix from MODEL_ROOT_PATH to model_card::ROOT_PATH; threaded card.mdcsum() into model registration calls (add_*_model) across static/local paths; imports updated to model_card::{self, ModelDeploymentCard}.
KV router & typed watcher docs
lib/llm/src/kv_router.rs, lib/runtime/src/utils/typed_prefix_watcher.rs
Watch prefix changed to model_card::ROOT_PATH/"mdc/"; typed watcher example and extraction closure now reference ModelDeploymentCard.
Local model & network name removal
lib/llm/src/local_model.rs, lib/llm/src/local_model/network_name.rs
Removed ModelNetworkName and its export; simplified local publish to use lease-based endpoint path and publish ModelDeploymentCard only (no ModelEntry).
Worker monitor runtime change
lib/runtime/src/utils/worker_monitor.rs
Removed MODEL_ROOT_PATH and ModelEntry deserialization; switched to "mdc/" literal and parsing via serde_json::Value to extract runtime_kv_blocks.
Storage: etcd lease-aware puts
lib/runtime/src/storage/key_value_store/etcd.rs
create and update PutOptions now attach primary lease (with_lease(...)) and update uses with_prev_key() to return previous key.
Tests & bindings updated
lib/llm/tests/*, lib/bindings/python/rust/http.rs
Tests updated to create/use ModelDeploymentCard and pass card.mdcsum() to add_*_model calls; Python bindings for HttpService now require checksum parameter in add_completions_model and add_chat_completions_model.
Misc. internals / small changes
lib/llm/src/model_type.rs, lib/llm/src/preprocessor.rs, lib/llm/src/common/checked_file.rs, lib/llm/src/discovery/watcher.rs (imports/tests)
Added ModelType::units(); explicit mdcsum().to_string() ownership in preprocessor; Checksum default and enum Copy derive; various imports adjusted to ModelDeploymentCard/EndpointId and updated tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Etcd as Etcd KV ("mdc/*")
  participant Watcher as ModelWatcher
  participant Card as ModelDeploymentCard
  participant NATS as NATS
  participant Manager as ModelManager / Engines

  Etcd->>Watcher: PUT(key="mdc/{endpoint}/...", value=card_json)
  Note right of Watcher: Extract EndpointId from key string
  Watcher->>Watcher: etcd_key_extract(key) --> (EndpointId, name)
  Watcher->>Card: Deserialize JSON -> ModelDeploymentCard
  Watcher->>Card: await card.move_from_nats(nats_client)  -- fetch artifacts (async)
  Card-->>Watcher: populate cache_dir, checksum cached
  Watcher->>Manager: save_model_card(key, card.clone())
  Manager->>Manager: store checksum and wire engines using card.name()
  Manager-->>Watcher: registration ack
  Watcher-->>Etcd: complete handling
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hop through "mdc/" lanes tonight,
Cards in my paws, checksum held tight.
Keys give endpoints, NATS brings the store,
Engines awaken, wires hum once more.
A rabbit’s nose twitches: new names take flight. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description does not follow the repository’s required template and is missing all mandatory sections, including Overview, Details, Where should the reviewer start, and Related Issues. Without these headings, reviewers cannot easily identify the purpose, scope, or starting point for the changes, nor see any linked issues. Please update the description using the provided template by adding an Overview summarizing the change, a Details section explaining the modifications, a “Where should the reviewer start” section to highlight key files, and a Related Issues section with linked issue numbers.
Docstring Coverage ⚠️ Warning Docstring coverage is 66.13% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title clearly and concisely summarizes the primary change in the discovery module by indicating that ModelDeploymentCard is now watched and published instead of ModelEntry. It uses the conventional “chore(discovery)” prefix and accurately reflects the core update without extraneous detail. The phrasing is specific enough for teammates to understand the main purpose at a glance.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lib/llm/src/model_card.rs (1)

298-355: Compile-time bug: wrong argument type to TempDir::with_prefix.

with_prefix expects &str; passing String won’t compile. Also prefer using the Slug’s AsRef<str> impl.

Apply:

-        let bucket_name = self.slug();
-        let target_dir = tempfile::TempDir::with_prefix(bucket_name.to_string())?;
+        let bucket_name = self.slug();
+        let target_dir = tempfile::TempDir::with_prefix(bucket_name.as_ref())?;

The rest of the function looks sound; keeping the Arc<TempDir> in cache_dir ensures the directory’s lifetime. Consider a tiny accessor (e.g., cache_path()) later if callers need the path.

lib/llm/src/discovery/watcher.rs (2)

107-120: Log strings still say “model entry”.

We now deserialize a ModelDeploymentCard; update logs to avoid confusion.

- tracing::error!(%err, value, "Invalid JSON in model entry")
+ tracing::error!(%err, value, "Invalid JSON in model card")

And similarly for the UTF‑8 branch.


444-469: Bug: all_cards likely misses versioned keys.

Cards are written under keys like v1/mdc/...; querying prefix "mdc" won’t match those. This can cause false “last instance” detections and premature engine removal.

Apply:

-        let kvs = etcd_client.kv_get_prefix(model_card::ROOT_PATH).await?;
-        let mut cards = Vec::with_capacity(kvs.len());
+        let v1_prefix = format!("v1/{}/", model_card::ROOT_PATH);
+        let raw_prefix = format!("{}/", model_card::ROOT_PATH);
+        let mut kvs = etcd_client.kv_get_prefix(&v1_prefix).await?;
+        // Also accept unversioned keys for compatibility.
+        kvs.extend(etcd_client.kv_get_prefix(&raw_prefix).await?);
+        let mut cards = Vec::with_capacity(kvs.len());

Also update log wording in this block from “model entry” to “model card”.

🧹 Nitpick comments (5)
lib/runtime/src/utils/worker_monitor.rs (1)

96-100: Consider using ModelDeploymentCard for type safety.

The code uses raw JSON parsing (serde_json::Value) instead of deserializing to ModelDeploymentCard like other files in this PR (kv_router.rs line 252). This loses compile-time type safety for the runtime_config field access.

Is there a specific reason to avoid using ModelDeploymentCard here? If not, consider using the typed approach for better maintainability:

-            |card: serde_json::Value| {
-                card.get("runtime_config")
-                    .and_then(|rc| rc.get("total_kv_blocks"))
-                    .and_then(|t_kv| t_kv.as_u64())
+            |card: ModelDeploymentCard| {
+                card.runtime_config.total_kv_blocks
             },

If ModelDeploymentCard is not accessible from the runtime crate or if there are other constraints preventing this approach, please clarify so we can document the rationale.

lib/llm/src/model_card.rs (1)

414-421: Behavior change: load_from_store no longer localizes artifacts.

Previously this populated cache_dir by pulling from NATS; now it returns an in‑store card. Ensure every callsite invokes card.move_from_nats(...) before using tokenizer/config files (the watcher does).

If any non-watcher path still calls load_from_store, I can scan and patch those to fetch from NATS. Want a follow-up PR?

lib/llm/src/discovery/watcher.rs (3)

195-199: Potential namespace mixing on delete.

cards_for_model(model_name) counts across all namespaces; engines are keyed by name() only. Verify intended semantics: should removal be gated by instances in the same namespace, or truly global?

If you want namespace scoping, parse the namespace from key and filter:

- let active_instances = self.cards_for_model(&model_name).await?;
+ let ns = etcd_key_to_endpoint_id(key)?.namespace;
+ let active_instances = self.cards_for_model_in_namespace(&model_name, &ns).await?;

I can draft cards_for_model_in_namespace if desired.


471-478: Optional: filter by namespace here if delete semantics are per-namespace.

If you adopt namespace scoping, add a variant that filters on parsed EndpointId.namespace.

I can implement cards_for_model_in_namespace() and its call-site changes if you confirm desired behavior.


514-538: Tests cover both versioned/unversioned prefixes.

Consider adding a case without an instance-id segment (exact 4 parts) to assert acceptance, and a negative case with wrong root. I can add these if wanted.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f63a05 and 00dcf6f.

📒 Files selected for processing (13)
  • lib/llm/src/discovery.rs (0 hunks)
  • lib/llm/src/discovery/model_entry.rs (0 hunks)
  • lib/llm/src/discovery/watcher.rs (19 hunks)
  • lib/llm/src/entrypoint/input/common.rs (2 hunks)
  • lib/llm/src/entrypoint/input/grpc.rs (2 hunks)
  • lib/llm/src/entrypoint/input/http.rs (2 hunks)
  • lib/llm/src/kv_router.rs (2 hunks)
  • lib/llm/src/local_model.rs (1 hunks)
  • lib/llm/src/local_model/network_name.rs (0 hunks)
  • lib/llm/src/model_card.rs (4 hunks)
  • lib/llm/tests/http_metrics.rs (4 hunks)
  • lib/runtime/src/utils/typed_prefix_watcher.rs (1 hunks)
  • lib/runtime/src/utils/worker_monitor.rs (1 hunks)
💤 Files with no reviewable changes (3)
  • lib/llm/src/local_model/network_name.rs
  • lib/llm/src/discovery/model_entry.rs
  • lib/llm/src/discovery.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/runtime/src/utils/worker_monitor.rs
🧬 Code graph analysis (4)
lib/runtime/src/utils/worker_monitor.rs (3)
lib/runtime/src/utils/typed_prefix_watcher.rs (1)
  • lease_id (211-213)
lib/llm/src/local_model.rs (1)
  • card (331-333)
lib/llm/src/discovery/watcher.rs (2)
  • serde_json (107-107)
  • serde_json (452-452)
lib/llm/src/entrypoint/input/common.rs (2)
lib/runtime/src/transports/etcd.rs (1)
  • etcd_client (131-133)
lib/runtime/src/distributed.rs (1)
  • etcd_client (269-271)
lib/llm/src/kv_router.rs (3)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • ModelDeploymentCard (445-450)
  • lease_id (138-142)
lib/runtime/src/utils/typed_prefix_watcher.rs (1)
  • lease_id (211-213)
lib/llm/src/local_model.rs (1)
  • card (331-333)
lib/llm/src/local_model.rs (3)
lib/runtime/src/transports/etcd.rs (1)
  • lease_id (136-138)
lib/runtime/src/component.rs (1)
  • endpoint (236-243)
lib/runtime/src/storage/key_value_store.rs (1)
  • from_raw (36-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (12)
lib/llm/src/entrypoint/input/http.rs (1)

7-7: LGTM! Clean migration to model_card::ROOT_PATH.

The changes correctly update the import and reference the etcd root path from the new model_card module instead of the removed discovery::MODEL_ROOT_PATH. The logic remains unchanged.

Also applies to: 13-13, 78-78

lib/llm/src/entrypoint/input/grpc.rs (1)

7-7: LGTM! Consistent migration to model_card::ROOT_PATH.

The changes mirror those in http.rs, correctly updating the import and using the new path constant from the model_card module.

Also applies to: 12-12, 50-50

lib/runtime/src/utils/typed_prefix_watcher.rs (1)

70-76: LGTM! Documentation updated to reflect ModelDeploymentCard migration.

The example code correctly demonstrates the new API using ModelDeploymentCard instead of ModelEntry and the updated etcd prefix "mdc/".

lib/llm/src/kv_router.rs (1)

47-47: LGTM! Clean migration to ModelDeploymentCard.

The changes correctly:

  • Import ModelDeploymentCard from the model_card module
  • Update the etcd prefix to model_card::ROOT_PATH
  • Update the value extractor closure to work with ModelDeploymentCard instead of ModelEntry

The extraction logic remains functionally identical, accessing the runtime_config field.

Also applies to: 250-253

lib/llm/src/model_card.rs (1)

192-196: Accessor name() is a clean addition.

Simple, correct alias to display_name; helps callsites drop field coupling.

lib/llm/src/discovery/watcher.rs (7)

16-16: Import changes look correct.

EndpointId and Annotated are the right additions for the refactor.


24-24: MDC-centric import is correct.

Switch to model_card::{self, ModelDeploymentCard} is aligned with the PR goal.


40-40: ModelManager path change is fine.


150-168: PUT handling flow looks correct.

Key → EndpointId → optional namespace filter → handle_put path is clean.


259-266: Good: localize artifacts before wiring pipelines.

Calling card.move_from_nats(self.drt.nats_client()).await? up front prevents downstream file I/O surprises.


274-282: Early exit on additional endpoints is sound.

Avoids rebuilding engines when a new instance appears for an existing model.


396-397: API assumption: Backend::from_mdc(card) takes &ModelDeploymentCard.

You’re passing &mut ModelDeploymentCard; coercion to &T is fine if the signature is &ModelDeploymentCard. Please confirm the signature.

If it consumes by value, clone: Backend::from_mdc(&card.clone()).

@grahamking grahamking marked this pull request as draft October 2, 2025 14:59
@grahamking grahamking force-pushed the gk-per-instance-mdc-4 branch from 826944f to 7449e7b Compare October 3, 2025 22:03
…ntry

Those two always overlapped, now there's only one. Allows us to remove
ModelEntry and ModelNetworkName.

The next and hopefully final step will be to attach the card to an etcd
lease for auto-cleanup.

Signed-off-by: Graham King <[email protected]>
Signed-off-by: Graham King <[email protected]>
This cleans up the ModelDeploymentCard on shutdown

Signed-off-by: Graham King <[email protected]>
If we are already serving a model and a new worker arrives with a
different model card, claiming to be for the same model, we now don't
know which is the correct model. Log an error.

Ideally we would stop using that instance, but that happens in a
different crate. It's not clear how we can communicate with it, see
TODO.

Signed-off-by: Graham King <[email protected]>
@grahamking
Copy link
Contributor Author

@coderabbitai review please

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 6, 2025

@grahamking: I'll review the changes in this PR for you.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@grahamking grahamking marked this pull request as ready for review October 6, 2025 17:57
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
lib/llm/src/entrypoint/input/http.rs (1)

244-246: Keep the etcd watch handle alive; dropping it may cancel the watch

dissolve() returns a watch handle; binding it to _watcher and dropping it can stop events. Keep a guard in scope to ensure the watch remains active.

Apply this diff:

-    let (_prefix, _watcher, receiver) = models_watcher.dissolve();
+    let (_prefix, watcher, receiver) = models_watcher.dissolve();
+    // Keep the watch handle alive for the lifetime of this task
+    let _watcher_guard = watcher;
lib/llm/src/discovery/model_manager.rs (2)

109-112: Bug: has_model_any ignores embeddings/tensor models.

This causes handle_put to rebuild/register existing embeddings/tensor models, leading to ModelAlreadyExists errors and noisy logs.

Apply:

 pub fn has_model_any(&self, model: &str) -> bool {
     self.chat_completion_engines.read().contains(model)
-        || self.completion_engines.read().contains(model)
+        || self.completion_engines.read().contains(model)
+        || self.embeddings_engines.read().contains(model)
+        || self.tensor_engines.read().contains(model)
 }

289-304: Move/borrow errors with kv_router_config; used after move and moved twice.

  • unwrap_or_default() moves the Option; it’s used again later.
  • The Option is also passed into two constructors; needs cloning.

Apply:

-        etcd_client
-            .kv_create(
-                &router_key,
-                serde_json::to_vec_pretty(&kv_router_config.unwrap_or_default())?,
-                None, // use primary lease
-            )
-            .await?;
-
-        let selector = Box::new(DefaultWorkerSelector::new(kv_router_config));
+        let cfg = kv_router_config.clone().unwrap_or_default();
+        etcd_client
+            .kv_create(
+                &router_key,
+                serde_json::to_vec_pretty(&cfg)?,
+                None, // use primary lease
+            )
+            .await?;
+
+        let selector = Box::new(DefaultWorkerSelector::new(kv_router_config.clone()));
         let chooser = KvRouter::new(
             component.clone(),
             kv_cache_block_size,
             Some(selector),
             kv_router_config,
             router_uuid.to_string(),
         )
         .await?;
lib/llm/src/discovery/watcher.rs (1)

337-346: Cannot move self.kv_router_config out of &self.

Passing the field by value moves it; clone the Option instead.

Apply:

-                            self.kv_router_config,
+                            self.kv_router_config.clone(),
♻️ Duplicate comments (2)
lib/runtime/src/utils/worker_monitor.rs (1)

92-103: Use a shared constant for the model-card prefix.

The hardcoded "mdc/" literal is inconsistent with other files in this PR (http.rs, grpc.rs, kv_router.rs, local_model.rs) that use model_card::ROOT_PATH. The in-code comments explain that WorkerMonitor is in the wrong crate (should be in dynamo-llm, not dynamo-runtime), preventing the use of ModelDeploymentCard. However, you can still improve consistency by defining a shared constant for the prefix, either in a common location or by exporting ROOT_PATH from a module accessible to both crates.

Consider defining a shared constant for the "mdc/" prefix or exporting model_card::ROOT_PATH from a common location accessible to both dynamo-runtime and dynamo-llm. This ensures consistency across the codebase and simplifies future updates to the prefix.

lib/llm/tests/http_metrics.rs (1)

502-571: Lease-based cleanup note already tracked in prior review

This block assumes lease-based cleanup; earlier feedback flagged the lease isn’t yet attached in LocalModel::attach. Acknowledging it’s slated for the next PR; no action here.

🧹 Nitpick comments (4)
lib/llm/tests/http_metrics.rs (1)

350-357: Keep the etcd watch handle alive in the test watcher

_watcher is dropped immediately; this may stop the stream. Retain it to keep the watch active.

-            let (_prefix, _watcher, receiver) = models_watcher.dissolve();
+            let (_prefix, watcher, receiver) = models_watcher.dissolve();
+            let _watcher_guard = watcher;
lib/llm/src/model_card.rs (2)

46-53: GGUF variant returns a default checksum; consider a more discriminative fallback

Using a default value for GGUF means GGUF-only cards won’t influence mdcsum. At minimum, include stable identity (e.g., file path string) to reduce collisions until a content hash is available.

Example change:

-            ModelInfoType::GGUF(_) => Checksum::default().to_string(),
+            ModelInfoType::GGUF(p) => {
+                // Fallback: include path text to distinguish different GGUFs
+                blake3::hash(p.display().to_string().as_bytes()).to_string()
+            }

Apply similar treatment to TokenizerKind::GGUF and PromptFormatterArtifact::GGUF for consistency.


247-288: Reduce mdcsum log verbosity

The “TEMP” debug log will fire on first call per process and can be noisy. Drop to trace or remove.

-                let hash = blake3::hash(&bytes_to_hash).to_string();
-                tracing::debug!("mdcsum: {hash} of {} bytes", bytes_to_hash.len()); // TEMP
-                hash
+                let hash = blake3::hash(&bytes_to_hash).to_string();
+                tracing::trace!("mdcsum: {hash} of {} bytes", bytes_to_hash.len());
+                hash
lib/llm/src/discovery/model_manager.rs (1)

395-397: Avoid cloning in checksum() return type.

Return a borrowed &str instead of allocating a new String.

Apply:

-pub fn checksum(&self, model: &str) -> Option<String> {
-    self.checksums.get(model).map(|s| s.to_string())
-}
+pub fn checksum(&self, model: &str) -> Option<&str> {
+    self.checksums.get(model).map(|s| s.as_str())
+}

Call sites (e.g., is_valid_checksum) continue to work.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00dcf6f and 163ff3d.

📒 Files selected for processing (21)
  • lib/bindings/python/rust/http.rs (1 hunks)
  • lib/llm/src/common/checked_file.rs (2 hunks)
  • lib/llm/src/discovery.rs (0 hunks)
  • lib/llm/src/discovery/model_entry.rs (0 hunks)
  • lib/llm/src/discovery/model_manager.rs (8 hunks)
  • lib/llm/src/discovery/watcher.rs (16 hunks)
  • lib/llm/src/entrypoint/input/common.rs (2 hunks)
  • lib/llm/src/entrypoint/input/grpc.rs (6 hunks)
  • lib/llm/src/entrypoint/input/http.rs (7 hunks)
  • lib/llm/src/kv_router.rs (2 hunks)
  • lib/llm/src/local_model.rs (1 hunks)
  • lib/llm/src/local_model/network_name.rs (0 hunks)
  • lib/llm/src/model_card.rs (12 hunks)
  • lib/llm/src/model_type.rs (1 hunks)
  • lib/llm/src/preprocessor.rs (1 hunks)
  • lib/llm/tests/http-service.rs (6 hunks)
  • lib/llm/tests/http_metrics.rs (10 hunks)
  • lib/llm/tests/kserve_service.rs (4 hunks)
  • lib/runtime/src/storage/key_value_store/etcd.rs (2 hunks)
  • lib/runtime/src/utils/typed_prefix_watcher.rs (1 hunks)
  • lib/runtime/src/utils/worker_monitor.rs (1 hunks)
💤 Files with no reviewable changes (3)
  • lib/llm/src/discovery.rs
  • lib/llm/src/discovery/model_entry.rs
  • lib/llm/src/local_model/network_name.rs
✅ Files skipped from review due to trivial changes (1)
  • lib/runtime/src/utils/typed_prefix_watcher.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/llm/src/kv_router.rs
  • lib/llm/src/entrypoint/input/common.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/runtime/src/utils/worker_monitor.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (20)
lib/llm/src/preprocessor.rs (1)

127-127: LGTM!

The explicit to_string() conversion aligns with the String type of the mdcsum field defined at line 104 and used throughout the struct. The change ensures consistent type handling for the per-model checksum.

lib/llm/src/model_type.rs (1)

77-94: LGTM!

The new units() method correctly decomposes the bitflag into its component ModelType units, mirroring the structure of the existing as_vec() method. This provides a typed alternative for consumers that need to iterate over enabled capabilities.

lib/runtime/src/storage/key_value_store/etcd.rs (2)

152-152: LGTM!

Attaching the primary lease to the PutOptions ensures that newly created keys are tied to the lease lifecycle, enabling automatic cleanup when the process stops. This aligns with the PR objective of ensuring etcd is fully cleaned up when a process terminates.


220-225: LGTM!

The update method now attaches the lease and includes with_prev_key() to enable returning the previous key from the server. This ensures consistent lease association during updates and maintains the lifecycle management introduced in the create method.

lib/llm/src/common/checked_file.rs (2)

36-36: LGTM!

Adding the Copy trait to CryptographicHashMethods is safe and enables implicit bitwise-copy semantics for the enum, which simplifies checksum propagation throughout the codebase.


262-269: LGTM!

The Default implementation provides sensible initialization (empty string for hash, BLAKE3 for algorithm), which supports the per-model checksum handling introduced across the codebase.

lib/llm/src/local_model.rs (1)

421-427: LGTM with noted follow-up.

The code derives a lease-aware key but passes None for the lease parameter in publish. While this means the MDC record is not yet TTL-scoped, grahamking has acknowledged this in a previous review and confirmed that lease attachment will be addressed in a follow-up PR. Given the substantial scope of this PR (ModelEntry → ModelDeploymentCard migration), the incremental approach is reasonable.

lib/llm/src/entrypoint/input/grpc.rs (4)

50-50: LGTM!

The switch from MODEL_ROOT_PATH to model_card::ROOT_PATH aligns with the card-centric approach introduced across the codebase in this PR.


66-124: LGTM!

The code correctly introduces per-model checksum handling in the StaticRemote branch by deriving the checksum via card.mdcsum() and propagating it through both add_chat_completions_model and add_completions_model calls. This aligns with the broader checksum propagation pattern introduced across the codebase.


132-134: LGTM!

The StaticFull branch correctly derives the checksum via model.card().mdcsum() and propagates it through both model registration calls.


144-160: LGTM!

The StaticCore branch correctly derives the checksum via model.card().mdcsum() and propagates it through both model registration calls, maintaining consistency with the other static branches.

lib/bindings/python/rust/http.rs (2)

33-44: LGTM!

The updated add_completions_model method correctly accepts the checksum parameter and propagates it to the model manager. Note that this is a public API change that will require Python callers to provide the checksum when registering models.


46-57: LGTM!

The updated add_chat_completions_model method correctly accepts the checksum parameter and propagates it to the model manager, maintaining consistency with the add_completions_model changes.

lib/llm/src/entrypoint/input/http.rs (1)

78-81: Verify ROOT_PATH for prefix watching (“mdc” vs “mdc/”)

Ensure the etcd keys are written under the same prefix form you’re watching. If keys are stored under “mdc/…”, watching “mdc” could be broader than intended; watching “mdc/” is safer to avoid collisions.

lib/llm/tests/http-service.rs (1)

588-599: LGTM: mdcsum-based registrations are consistent

Creating a card per model and passing card.mdcsum() into add_*_model aligns with the new API.

lib/llm/src/model_card.rs (2)

37-37: Confirm intended ROOT_PATH shape

Constant is mdc (no trailing slash). If keys are stored as mdc/<key>, consider whether ROOT_PATH should be mdc/ for clarity and precise prefix watching.


232-235: LGTM: Public name accessor

name() returning display_name improves API clarity.

lib/llm/src/discovery/model_manager.rs (2)

339-342: Storing per-model checksums alongside engines looks good.

Single structure for engines+checksums reduces inconsistency; removal path clears both.


365-373: add() correctly persists checksum with engine.

Error on duplicates, then insert engine and checksum; OK.

lib/llm/src/discovery/watcher.rs (1)

548-574: Key extractor looks solid; doc and errors are clear.

Docstring matches “mdc/…” and errors include offending key. Tests cover with/without “v1/”.

Signed-off-by: Graham King <[email protected]>
Signed-off-by: Graham King <[email protected]>
Copy link
Contributor

@atchernych atchernych left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a comment, but someone else must approve.

@grahamking grahamking merged commit 81162df into main Oct 7, 2025
27 of 28 checks passed
@grahamking grahamking deleted the gk-per-instance-mdc-4 branch October 7, 2025 14:02
ptarasiewiczNV pushed a commit that referenced this pull request Oct 8, 2025
…ntry (#3350)

Signed-off-by: Graham King <[email protected]>
Signed-off-by: Piotr Tarasiewicz <[email protected]>
grahamking added a commit that referenced this pull request Oct 9, 2025
Since #3350 the MDC is attached
to an etcd lease, so it cleans up on shutdown. We don't need to manually
clear namespace any more.

If there are any remaining non-lease keys (I'm not aware of any, and I
have looked), we will treat that as a bug.

Signed-off-by: Graham King <[email protected]>
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants