feat: Dynamic Endpoint Exposure Based on Model Type #1447
Conversation
👋 Hi abrarshivani! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough

The changes implement dynamic HTTP endpoint control based on model lifecycle events. Model addition or removal triggers asynchronous notifications, which update the availability of chat, completion, and embedding endpoints in the HTTP service at runtime. This is achieved through new runtime flags and middleware, replacing the previous static configuration and route inclusion.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant ModelWatcher
    participant HttpService
    participant Client
    ModelWatcher->>HttpService: Send ModelUpdate (Added/Removed)
    HttpService->>HttpService: update_http_endpoints()
    Client->>HttpService: HTTP request to endpoint
    HttpService->>HttpService: Check endpoint flag (middleware)
    alt Endpoint enabled
        HttpService->>Client: Process request normally
    else Endpoint disabled
        HttpService->>Client: Return 503/404 error
    end
```
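The flow in the diagram can be sketched with a plain channel. The following is an illustrative, std-only reduction (the actual code uses a tokio mpsc channel and async tasks, and these type and function names are simplified stand-ins, not the PR's real API):

```rust
use std::sync::mpsc;
use std::thread;

// Simplified stand-ins for the PR's types (the real ones live in lib/llm).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelType { Chat, Completion, Embedding }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelUpdate { Added(ModelType), Removed(ModelType) }

#[derive(Default)]
struct EndpointFlags { chat: bool, completions: bool, embeddings: bool }

// Fold one lifecycle event into the endpoint-availability flags.
fn apply_update(flags: &mut EndpointFlags, update: ModelUpdate) {
    let (model_type, enable) = match update {
        ModelUpdate::Added(t) => (t, true),
        ModelUpdate::Removed(t) => (t, false),
    };
    match model_type {
        ModelType::Chat => flags.chat = enable,
        ModelType::Completion => flags.completions = enable,
        ModelType::Embedding => flags.embeddings = enable,
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Watcher side: announce a model addition, then its removal.
    let watcher = thread::spawn(move || {
        tx.send(ModelUpdate::Added(ModelType::Chat)).unwrap();
        tx.send(ModelUpdate::Removed(ModelType::Chat)).unwrap();
    });

    // HTTP-service side: consume updates until the sender is dropped.
    let mut flags = EndpointFlags::default();
    for update in rx {
        apply_update(&mut flags, update);
    }
    watcher.join().unwrap();
    assert!(!flags.chat); // added, then removed
}
```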
Actionable comments posted: 4
🔭 Outside diff range comments (1)
lib/llm/src/http/service/service_v2.rs (1)
307-311: ⚠️ Potential issue

Build-breaking bug: `with_request_template` nests `Option`

`request_template` already has type `Option<RequestTemplate>`, but the setter wraps it in `Some(_)`, yielding `Option<Option<_>>` and causing a type mismatch.

```diff
 pub fn with_request_template(mut self, request_template: Option<RequestTemplate>) -> Self {
-    self.request_template = Some(request_template);
+    self.request_template = request_template;
     self
 }
```
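To see why the original fails to compile, here is a minimal, self-contained sketch of the builder pattern in question (the `Builder` type and `String` field are illustrative stand-ins, not the project's actual types):

```rust
// Illustrative sketch: a builder whose field is already Option<T>.
// Wrapping the parameter in Some(_) would produce Option<Option<T>>
// and fail to type-check, so the setter assigns the Option directly.
#[derive(Default)]
struct Builder {
    request_template: Option<String>, // stand-in for Option<RequestTemplate>
}

impl Builder {
    // Correct: pass the Option through unchanged.
    fn with_request_template(mut self, request_template: Option<String>) -> Self {
        self.request_template = request_template;
        self
    }
}

fn main() {
    let b = Builder::default().with_request_template(Some("tmpl".to_string()));
    assert_eq!(b.request_template.as_deref(), Some("tmpl"));

    // Callers can also explicitly clear the template.
    let cleared = Builder::default().with_request_template(None);
    assert!(cleared.request_template.is_none());
}
```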
🧹 Nitpick comments (7)
lib/llm/src/discovery/watcher.rs (3)
35-39: Add documentation for the public `ModelUpdate` enum.

This enum is part of the public API and should be documented to help users understand its purpose and usage.

Apply this diff to add documentation:

```diff
+/// Represents a model lifecycle event for dynamic endpoint management.
+///
+/// This enum is used to notify external components about model additions
+/// and removals, enabling dynamic control of HTTP endpoints based on
+/// available model types.
 #[derive(Debug, Clone, Copy, PartialEq)]
 pub enum ModelUpdate {
+    /// A model of the specified type has been added to the system.
     Added(ModelType),
+    /// A model of the specified type has been removed from the system.
     Removed(ModelType),
 }
```
67-69: Add documentation for the public method.

This method is part of the public API and should be documented.

Apply this diff to add documentation:

```diff
+    /// Sets the channel sender for model update notifications.
+    ///
+    /// When set, the watcher will send `ModelUpdate` events through this channel
+    /// whenever models are added or removed from the system.
     pub fn set_notify_on_model_update(&mut self, tx: Sender<ModelUpdate>) {
         self.model_update_tx = Some(tx);
     }
```
120-124: Consider logging when channel send fails.

The `.ok()` silently ignores send errors, which could occur if the receiver is dropped. Consider logging these failures for better observability.

Apply this diff to log send failures:

```diff
 if let Some(tx) = &self.model_update_tx {
-    tx.send(ModelUpdate::Added(model_entry.model_type))
-        .await
-        .ok();
+    if let Err(e) = tx.send(ModelUpdate::Added(model_entry.model_type)).await {
+        tracing::warn!("Failed to send model update notification: {}", e);
+    }
 }
```

launch/dynamo-run/src/input/http.rs (2)
125-126: Consider increasing the channel buffer size.

The channel buffer is set to 1, which might cause backpressure if multiple model updates occur in quick succession. Consider using a larger buffer or an unbounded channel.

Apply this diff to increase the buffer size:

```diff
 // Create a channel to receive model type updates
-let (tx, mut rx) = tokio::sync::mpsc::channel(1);
+let (tx, mut rx) = tokio::sync::mpsc::channel(32);
```
156-157: Consider logging when Backend model updates are received.

Since Backend models don't trigger endpoint changes, it might be helpful to log when they're received to aid in debugging.

Apply this diff to add logging:

```diff
-ModelType::Backend => {}
+ModelType::Backend => {
+    tracing::debug!("Backend model added - no endpoint changes required");
+}
```

lib/llm/src/http/service/service_v2.rs (2)
145-182: `enable_*` helpers are triplicated – extract a common setter

The three nearly identical methods differ only in the field that is toggled and the log message. Consolidating them into one generic helper (or macro) will reduce boilerplate and prevent future inconsistencies.

Example sketch:

```rust
fn set_flag<F>(&self, f: F, flag_name: &str, enable: bool)
where
    F: FnOnce(&mut StateFlags) -> &mut bool,
{
    match self.state.flags.write() {
        Ok(mut flags) => {
            *f(&mut *flags) = enable;
            tracing::info!("{flag_name} {}", if enable { "enabled" } else { "disabled" });
        }
        Err(e) => tracing::error!("Failed to acquire write lock for {flag_name}: {e}"),
    }
}
```
214-292: Different HTTP codes for the same "disabled" condition

Chat returns `503 Service Unavailable`, while completions/embeddings return `404 Not Found`. The difference is confusing for clients toggling models dynamically. Pick one status (typically `503` with a `Retry-After` header) for all disabled endpoints, or document the rationale.
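A framework-agnostic sketch of the suggested unification follows. The function names and the `Retry-After` value are illustrative assumptions, not from the PR; in the real service this policy would live in the axum middleware:

```rust
// One shared policy for every disabled endpoint, instead of per-endpoint
// ad-hoc status codes. Returns (status code, optional Retry-After seconds).
fn disabled_endpoint_response() -> (u16, Option<u32>) {
    // 503 signals a temporary condition; Retry-After hints that the
    // endpoint may come back once a matching model is registered.
    (503, Some(5))
}

// Every endpoint's middleware consults the same helper, so clients see
// a consistent response whichever route they hit while a model is absent.
fn route_status(endpoint_enabled: bool) -> (u16, Option<u32>) {
    if endpoint_enabled {
        (200, None)
    } else {
        disabled_endpoint_response()
    }
}

fn main() {
    assert_eq!(route_status(true), (200, None));
    assert_eq!(route_status(false), (503, Some(5)));
}
```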
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- launch/dynamo-run/src/input/http.rs (5 hunks)
- lib/llm/src/discovery.rs (1 hunks)
- lib/llm/src/discovery/watcher.rs (5 hunks)
- lib/llm/src/http/service/service_v2.rs (6 hunks)
🔇 Additional comments (3)
lib/llm/src/discovery.rs (1)
10-11: LGTM! Clean re-export of the new `ModelUpdate` enum.

The change appropriately expands the public API to include the `ModelUpdate` enum alongside `ModelWatcher`, enabling external modules to subscribe to model lifecycle events.

launch/dynamo-run/src/input/http.rs (1)
65-68: 💡 Verification agent

🧩 Analysis chain

Verify that chat completion endpoints include both `/v1/chat/completions` and `/chat/completions`.

The code enables chat endpoints after adding models. Ensure that both OpenAI-compatible and legacy endpoints are properly enabled.
🏁 Script executed:

```bash
#!/bin/bash
# Description: Check the implementation of enable_chat_endpoints to verify it enables all expected routes

# Search for the enable_chat_endpoints implementation
ast-grep --pattern 'fn enable_chat_endpoints($$$) { $$$ }'
```

Length of output: 62
Let’s locate the actual implementation to confirm which routes are registered:
🏁 Script executed:

```bash
#!/bin/bash
# Search for the enable_chat_endpoints method and show surrounding lines
rg -n "enable_chat_endpoints" -C5 --type rs
```

Length of output: 70
Let’s try locating the function definition in all Rust files without restricting by type:
🏁 Script executed:

```bash
#!/bin/bash
# Find the enable_chat_endpoints function in any .rs file
rg -n "fn enable_chat_endpoints" -C5 -g '*.rs'
```

Length of output: 964
Let’s inspect where the flag is used and whether both routes are registered:
🏁 Script executed:

```bash
#!/bin/bash
# Locate flag usage in service_v2.rs
rg -n 'chat_endpoints_enabled' -C10 lib/llm/src/http/service/service_v2.rs

# Search for explicit registration of each endpoint
rg -n '/v1/chat/completions' -C5 -g '*.rs'
rg -n '/chat/completions' -C5 -g '*.rs'
```

Length of output: 16955
Add legacy `/chat/completions` route alongside `/v1/chat/completions`

The current build registers only the default `/v1/chat/completions` endpoint. To maintain backward compatibility, you should also register the legacy `/chat/completions` path.

Locations to update:

- In `lib/llm/src/http/service/service_v2.rs` inside `HttpServiceConfigBuilder::build` (around line 213): after you configure `chat_route`, invoke `chat_completions_router` again with `Some("/chat/completions".into())` and merge it into your router.
- Optionally, in `lib/llm/src/http/service/openai.rs` you could update `chat_completions_router` so that when `path` is `None` it registers both paths automatically.

Example diff in `service_v2.rs`:

```diff
@@
 // Add chat completions route with conditional middleware
 let (chat_docs, chat_route) =
     super::openai::chat_completions_router(state.clone(), config.request_template, None);
 let chat_route = chat_route.route_layer(axum::middleware::from_fn(...));
+// Support legacy endpoint
+let (_legacy_docs, legacy_route) =
+    super::openai::chat_completions_router(
+        state.clone(),
+        config.request_template,
+        Some("/chat/completions".into()),
+    );
+let legacy_route = legacy_route.route_layer(axum::middleware::from_fn(...));
+
+// Merge both routers
+router = router
+    .merge(chat_route)
+    .merge(legacy_route);
```

Likely an incorrect or invalid review comment.
lib/llm/src/http/service/service_v2.rs (1)

193-197: Lock-poisoning is silently swallowed

If a previous panic poisoned the `flags` lock, the write attempt fails and the configuration silently remains stale. Consider propagating the error or at least returning a `Result` so callers can react instead of continuing in an undefined state.
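A minimal std-only sketch of the suggested change (the `Service` and `StateFlags` shapes here are illustrative stand-ins, not the PR's actual definitions): surface the poison error to the caller instead of logging and continuing.

```rust
use std::sync::RwLock;

// Illustrative stand-in for the service's endpoint flags.
#[derive(Default)]
struct StateFlags {
    chat_endpoints_enabled: bool,
}

struct Service {
    flags: RwLock<StateFlags>,
}

impl Service {
    /// Propagate lock poisoning instead of swallowing it, so callers can
    /// decide whether to recover, retry, or shut the service down.
    fn enable_chat_endpoints(&self, enable: bool) -> Result<(), String> {
        let mut flags = self
            .flags
            .write()
            .map_err(|e| format!("flags lock poisoned: {e}"))?;
        flags.chat_endpoints_enabled = enable;
        Ok(())
    }
}

fn main() {
    let svc = Service { flags: RwLock::new(StateFlags::default()) };
    assert!(svc.enable_chat_endpoints(true).is_ok());
    assert!(svc.flags.read().unwrap().chat_endpoints_enabled);
}
```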
c680058 to 11c31be (force-push)
Actionable comments posted: 3
♻️ Duplicate comments (3)
launch/dynamo-run/src/input/http.rs (2)
126-137: ⚠️ Potential issue

Critical issue: Task handle not stored for proper lifecycle management.

The spawned task handle is not stored, preventing proper cleanup and potentially causing the task to run indefinitely.

Apply this diff to store the task handle:

```diff
 // Spawn a task to watch for model type changes and update HTTP service endpoints
-let _endpoint_enabler_task = tokio::spawn(async move {
+let endpoint_enabler_task = tokio::spawn(async move {
     while let Some(model_type) = rx.recv().await {
         tracing::debug!("Received model type update: {:?}", model_type);
         update_http_endpoints(http_service.clone(), model_type);
     }
 });
```

Consider returning the task handle from this function or storing it in a struct for proper lifecycle management.
158-158: ⚠️ Potential issue

Missing functionality: Backend model type should enable both chat and completion endpoints.

The `ModelType::Backend` case is empty but should enable both chat and completion endpoints since the watcher registers both engines for Backend models.

Apply this diff to handle Backend models correctly:

```diff
-ModelType::Backend => {}
+ModelType::Backend => {
+    service.enable_chat_endpoints(true);
+    service.enable_cmpl_endpoints(true);
+}
 },
 ModelUpdate::Removed(model_type) => match model_type {
     ModelType::Chat => service.enable_chat_endpoints(false),
     ModelType::Completion => service.enable_cmpl_endpoints(false),
     ModelType::Embedding => service.enable_embeddings_endpoints(false),
-    ModelType::Backend => {}
+    ModelType::Backend => {
+        service.enable_chat_endpoints(false);
+        service.enable_cmpl_endpoints(false);
+    }
```

Also applies to: 164-164

lib/llm/src/http/service/service_v2.rs (1)
lib/llm/src/http/service/service_v2.rs (1)
4-4: 🛠️ Refactor suggestionUse async-friendly RwLock to avoid blocking the Tokio runtime.
The middleware functions acquire read locks on every request, and
std::sync::RwLockcan block the entire executor thread under contention, hurting performance.Switch to
tokio::sync::RwLock:-use std::sync::{Arc, RwLock}; +use std::sync::Arc; +use tokio::sync::RwLock;Update all lock operations to use
.await:
state.flags.read()→state.flags.read().awaitstate.flags.write()→state.flags.write().await
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- launch/dynamo-run/src/input/http.rs (5 hunks)
- lib/llm/src/discovery.rs (1 hunks)
- lib/llm/src/discovery/watcher.rs (5 hunks)
- lib/llm/src/http/service/service_v2.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/llm/src/discovery.rs
- lib/llm/src/discovery/watcher.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
launch/dynamo-run/src/input/http.rs (6)
10-14: LGTM: Clean import additions for dynamic endpoint management.

The new imports for `ModelUpdate`, `HttpService`, and `ModelType` are necessary for the dynamic endpoint functionality and properly organized.
51-51: LGTM: Proper Arc wrapping for shared HTTP service.

The `Arc::new(http_service.clone())` correctly wraps the HttpService for shared ownership across async tasks.
65-69: LGTM: Appropriate static endpoint enabling placement.

Moving the endpoint enabling after model registration ensures the endpoints are available when the models are ready to serve requests.
90-94: LGTM: Consistent endpoint enabling for StaticCore configuration.

The endpoint enabling pattern is consistently applied across both StaticFull and StaticCore configurations.
119-119: LGTM: Function signature updated for dynamic endpoint management.

The addition of the `http_service: Arc<HttpService>` parameter enables the watcher to dynamically control HTTP endpoints.
147-167: LGTM: Well-structured endpoint update function with proper logging.

The function correctly matches on `ModelUpdate` variants and toggles the appropriate endpoints. The debug logging provides good visibility into the dynamic endpoint management.

lib/llm/src/http/service/service_v2.rs (7)
25-30: LGTM: Clean state flags structure for dynamic endpoint control.

The `StateFlags` struct provides a clear and organized way to track endpoint enablement state with appropriate defaults.
37-41: LGTM: Consistent initialization of state flags.

The flags are properly initialized to disabled state, aligning with the new dynamic enablement approach.
86-89: LGTM: Appropriate default value changes for dynamic endpoint management.

Changing defaults to `false` for chat and completion endpoints aligns with the new dynamic enablement model where endpoints are enabled based on available models.
145-156: LGTM: Well-implemented endpoint toggle method with proper error handling.

The `enable_chat_endpoints` method correctly uses write locks and provides appropriate logging for both success and failure cases.
158-169: LGTM: Consistent implementation for completion endpoint control.

The `enable_cmpl_endpoints` method follows the same pattern as the chat method with proper locking and logging.
171-182: LGTM: Complete endpoint control API with consistent error handling.

The `enable_embeddings_endpoints` method completes the endpoint control API with the same reliable pattern.
192-197: LGTM: Proper initialization of runtime flags from configuration.

The initial flag setting correctly transfers configuration values to the runtime state flags during service construction.
11c31be to baa043c (force-push)
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
9955d3c to adfdb4e (force-push)
@grahamking Can you please take a look at this PR?
@oandreeva-nv I believe you have been thinking about this too. Could you take a look?
@abrarshivani This is really great, thanks! Apologies I took so long, I was on vacation. Would you be able to rebase it to resolve the conflict? It comes from this landing, so hopefully quite small: #2011 If you're busy are you OK with me resolving the conflict? Will merge as soon as we can.
@grahamking No worries, and thanks for the review. I'm occupied this week but will rebase and resolve the conflict by the end of the week. If you have the bandwidth before then, feel free to resolve the conflict.
12ac3ce to 9383500 (force-push)
a2e4c55 to 7385205 (force-push)
Thank you again @abrarshivani ! Great stuff as usual.
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
```rust
    && self.manager.list_chat_completions_models().is_empty()
{
    self.manager.remove_chat_completions_model(&model_name).ok();
```
@abrarshivani sorry for commenting on an old PR, can you help me understand the conditional here? `remove_chat_completions_model` will be run if `list_chat_completions_models().is_empty()`, but in that case, wouldn't the remove do nothing as the manager already has no chat completions models?
Overview:
This PR enables dynamic management of HTTP endpoints (chat, completion, embeddings) based on model availability.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Bug Fixes
Refactor