-
Notifications
You must be signed in to change notification settings - Fork 690
feat: Allow an endpoint to serve multiple models #2418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughImports narrowed; LocalModel.card made public; uniqueness checks removed from LocalModel::attach and LocalModelBuilder::build; ModelNetworkName creation switched to ModelNetworkName::new() with Default implemented; previous etcd-derived constructors and loaders removed; Display unchanged. Changes
Sequence Diagram(s)sequenceDiagram
participant Builder as LocalModelBuilder
participant Name as ModelNetworkName
participant Etcd as etcd::Client
participant Model as LocalModel
Builder->>Name: new()
Name-->>Builder: ModelNetworkName
Builder->>Etcd: register(model_name)
Builder-->>Model: build()
Model->>Etcd: attach(model_name)
Note over Builder,Model: No per-component uniqueness check
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/llm/src/local_model/network_name.rs (1)
10-12: UUID-based network name aligns with multi-model endpoints; consider ergonomic trait impls to avoid to_string clonesThe change meets the PR goal. To reduce allocations at call sites (e.g., kv_create), consider implementing AsRef and/or Deref<Target = str> so callers can pass &network_name without allocating a new String.
Add these impls outside the shown range:
impl AsRef<str> for ModelNetworkName { fn as_ref(&self) -> &str { &self.0 } } impl std::ops::Deref for ModelNetworkName { type Target = str; fn deref(&self) -> &Self::Target { &self.0 } }lib/llm/src/local_model.rs (1)
239-239: Avoid making LocalModel.card public; prefer accessor/mutator to preserve invariantsMaking card public widens the API and bypasses invariants (e.g., callers can mutate after registration). You already expose card() for read access. If mutation is needed, consider pub(crate) or a dedicated mutable accessor.
Apply within this range:
- pub card: ModelDeploymentCard, // TEMP pub + card: ModelDeploymentCard,Then add a constrained mutator elsewhere:
impl LocalModel { pub(crate) fn card_mut(&mut self) -> &mut ModelDeploymentCard { &mut self.card } }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/llm/src/local_model.rs(3 hunks)lib/llm/src/local_model/network_name.rs(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/llm/src/local_model.rs (2)
lib/runtime/src/component.rs (3)
component(347-349)component(490-496)new(481-487)lib/llm/src/local_model/network_name.rs (1)
new(10-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (4)
lib/llm/src/local_model/network_name.rs (2)
15-19: Default delegating to new() is appropriateUsing Default -> new() keeps construction consistent and simple. No issues.
4-4: Confirmed MODEL_ROOT_PATH has no trailing slash
MODEL_ROOT_PATH is defined as"models"in lib/llm/src/discovery.rs (line 14), so concatenations won’t produce double slashes.lib/llm/src/local_model.rs (2)
13-14: Import narrowing LGTMDropping Component and importing only Endpoint fits the current usage and reduces surface area.
320-320: Confirm lease semantics and key-shape assumptions; verify card publishing collision risk
kv_create(..., None)unwraps toself.primary_lease()(the client’s primary lease), so keys remain ephemeral as before.- No remaining uses of
ModelNetworkName::from_*or other slug-based constructors, and all consumers callkv_get_prefix(MODEL_ROOT_PATH)and parseModelEntryfrom the value—no code assumes a slug-derived key suffix.- Could not locate any
card_store.publish(..., None, slug, …)call sites in the repo; please manually confirm that writing cards under a slug-only key for multi-model endpoints is intended. If not, include the variant ID in the key (e.g."{slug}:{variant_id}") or publish variant metadata alongside each card.
For LoRA we want the worker to be able to call `register_llm` multiple times, adding models to it's own endpoint. That means allowing an endpoint to serve multiple models. Changes: - Register in etcd using a UUID rather than the slugified endpoint name. - Remove the check that prevents multiples models on an endpoint. This reverts #1103 For #2267 .
3029c5f to
1ff5096
Compare
Signed-off-by: Hannah Zhang <[email protected]>
For LoRA we want the worker to be able to call
register_llmmultiple times, adding models to it's own endpoint. That means allowing an endpoint to serve multiple models.Changes:
For #2267 .
Summary by CodeRabbit
New Features
Refactor