
Conversation

@abrarshivani
Contributor

@abrarshivani abrarshivani commented Jun 9, 2025

Overview:

This PR enables dynamic management of HTTP endpoints (chat, completion, embeddings) based on model availability.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • HTTP service endpoints for chat, completion, and embeddings can now be dynamically enabled or disabled at runtime based on model updates.
    • Endpoints automatically adjust their availability as models are added or removed, providing a more responsive and adaptive user experience.
  • Bug Fixes

    • Improved error handling for endpoint availability, returning appropriate HTTP status codes when endpoints are disabled.
  • Refactor

    • Endpoint availability is now managed dynamically through runtime flags and middleware, replacing previous static configuration.
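The runtime-flag gating described above can be sketched without the real axum middleware. This is a minimal std-only illustration, not the PR's actual code: `EndpointFlags`, `gate`, and the status codes are simplified stand-ins for the per-endpoint flags and middleware layer the PR adds.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical runtime flags mirroring the PR's per-endpoint toggles.
#[derive(Default)]
struct EndpointFlags {
    chat: bool,
    completions: bool,
    embeddings: bool,
}

// The check a middleware layer could run on every request:
// known path + enabled flag -> 200, known path + disabled -> 503,
// unknown path -> 404.
fn gate(flags: &Arc<RwLock<EndpointFlags>>, path: &str) -> u16 {
    let flags = flags.read().expect("flags lock poisoned");
    let enabled = match path {
        "/v1/chat/completions" => flags.chat,
        "/v1/completions" => flags.completions,
        "/v1/embeddings" => flags.embeddings,
        _ => return 404,
    };
    if enabled { 200 } else { 503 }
}

fn main() {
    let flags = Arc::new(RwLock::new(EndpointFlags::default()));
    // All endpoints start disabled; a model update flips the flag.
    assert_eq!(gate(&flags, "/v1/chat/completions"), 503);
    flags.write().unwrap().chat = true;
    assert_eq!(gate(&flags, "/v1/chat/completions"), 200);
}
```

Because the flags live behind a shared lock rather than in the router definition, availability can change at runtime without rebuilding routes.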

@copy-pr-bot

copy-pr-bot bot commented Jun 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions bot commented Jun 9, 2025

👋 Hi abrarshivani! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added external-contribution Pull request is from an external contributor feat labels Jun 9, 2025
@coderabbitai
Contributor

coderabbitai bot commented Jun 9, 2025

Walkthrough

The changes implement dynamic HTTP endpoint control based on model lifecycle events. Model addition or removal triggers asynchronous notifications, which update the availability of chat, completion, and embedding endpoints in the HTTP service at runtime. This is achieved through new runtime flags and middleware, replacing previous static configuration and route inclusion.
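The notification flow — model added or removed, event sent over a channel, receiver toggles a flag — can be sketched with a std channel. The real code uses `tokio::sync::mpsc` and an async task; the `ModelUpdate`/`ModelType` names come from the PR, but the fields and wiring here are simplified for illustration.

```rust
use std::sync::mpsc;

// Types named in the PR; variants simplified for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelType { Chat, Completion, Embedding }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelUpdate { Added(ModelType), Removed(ModelType) }

fn main() {
    // The watcher side sends lifecycle events...
    let (tx, rx) = mpsc::channel();
    tx.send(ModelUpdate::Added(ModelType::Chat)).unwrap();
    tx.send(ModelUpdate::Removed(ModelType::Chat)).unwrap();
    drop(tx);

    // ...and the HTTP-service side drains them, toggling flags.
    let mut chat_enabled = false;
    for update in rx {
        match update {
            ModelUpdate::Added(ModelType::Chat) => chat_enabled = true,
            ModelUpdate::Removed(ModelType::Chat) => chat_enabled = false,
            _ => {}
        }
    }
    // Add followed by remove leaves the endpoint disabled.
    assert!(!chat_enabled);
}
```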

Changes

File(s) Change Summary
launch/dynamo-run/src/input/http.rs Refactored HTTP service setup to use dynamic endpoint enabling/disabling via model watcher notifications. Added helper for endpoint toggling.
lib/llm/src/discovery/watcher.rs Added ModelUpdate enum, notification channel to ModelWatcher, and logic to send updates on model add/remove.
lib/llm/src/discovery.rs Re-exported ModelUpdate alongside ModelWatcher.
lib/llm/src/http/service/service_v2.rs Added runtime flags and methods for toggling endpoint availability. Switched to middleware-based gating for endpoint routes.

Sequence Diagram(s)

sequenceDiagram
    participant ModelWatcher
    participant HttpService
    participant Client

    ModelWatcher->>HttpService: Send ModelUpdate (Added/Removed)
    HttpService->>HttpService: update_http_endpoints()
    Client->>HttpService: HTTP request to endpoint
    HttpService->>HttpService: Check endpoint flag (middleware)
    alt Endpoint enabled
        HttpService->>Client: Process request normally
    else Endpoint disabled
        HttpService->>Client: Return 503/404 error
    end

Poem

In the meadow of code where models appear,
Endpoints now dance—enabled by cheer!
A watcher on duty, flags in its paw,
Turns routes on and off with a hop and a jaw.
Dynamic and nimble, the service now sings,
As models come and go—see what change brings!
🐇✨



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🔭 Outside diff range comments (1)
lib/llm/src/http/service/service_v2.rs (1)

307-311: ⚠️ Potential issue

Build-breaking bug: with_request_template nests Option
request_template already has type Option<RequestTemplate>, but the setter wraps it in Some(_), yielding Option<Option<_>> and causing a type mismatch.

-pub fn with_request_template(mut self, request_template: Option<RequestTemplate>) -> Self {
-    self.request_template = Some(request_template);
+pub fn with_request_template(mut self, request_template: Option<RequestTemplate>) -> Self {
+    self.request_template = request_template;
     self
 }
🧹 Nitpick comments (7)
lib/llm/src/discovery/watcher.rs (3)

35-39: Add documentation for the public ModelUpdate enum.

This enum is part of the public API and should be documented to help users understand its purpose and usage.

Apply this diff to add documentation:

+/// Represents a model lifecycle event for dynamic endpoint management.
+///
+/// This enum is used to notify external components about model additions
+/// and removals, enabling dynamic control of HTTP endpoints based on
+/// available model types.
 #[derive(Debug, Clone, Copy, PartialEq)]
 pub enum ModelUpdate {
+    /// A model of the specified type has been added to the system.
     Added(ModelType),
+    /// A model of the specified type has been removed from the system.
     Removed(ModelType),
 }

67-69: Add documentation for the public method.

This method is part of the public API and should be documented.

Apply this diff to add documentation:

+    /// Sets the channel sender for model update notifications.
+    ///
+    /// When set, the watcher will send `ModelUpdate` events through this channel
+    /// whenever models are added or removed from the system.
     pub fn set_notify_on_model_update(&mut self, tx: Sender<ModelUpdate>) {
         self.model_update_tx = Some(tx);
     }

120-124: Consider logging when channel send fails.

The .ok() silently ignores send errors, which could occur if the receiver is dropped. Consider logging these failures for better observability.

Apply this diff to log send failures:

             if let Some(tx) = &self.model_update_tx {
-                tx.send(ModelUpdate::Added(model_entry.model_type))
-                    .await
-                    .ok();
+                if let Err(e) = tx.send(ModelUpdate::Added(model_entry.model_type)).await {
+                    tracing::warn!("Failed to send model update notification: {}", e);
+                }
             }
launch/dynamo-run/src/input/http.rs (2)

125-126: Consider increasing the channel buffer size.

The channel buffer is set to 1, which might cause backpressure if multiple model updates occur in quick succession. Consider using a larger buffer or an unbounded channel.

Apply this diff to increase the buffer size:

     // Create a channel to receive model type updates
-    let (tx, mut rx) = tokio::sync::mpsc::channel(1);
+    let (tx, mut rx) = tokio::sync::mpsc::channel(32);
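The backpressure concern behind this suggestion can be shown with a std `sync_channel`, whose bounded behavior mirrors `tokio::sync::mpsc::channel(1)`: one buffered message, then senders must wait until the receiver drains the slot.

```rust
use std::sync::mpsc;

fn main() {
    // Capacity 1, like tokio::sync::mpsc::channel(1).
    let (tx, rx) = mpsc::sync_channel::<u32>(1);

    tx.try_send(1).unwrap();          // fills the single slot
    assert!(tx.try_send(2).is_err()); // a second send would block

    assert_eq!(rx.recv().unwrap(), 1); // draining frees the slot
    assert!(tx.try_send(2).is_ok());
}
```

With a buffer of 1, a burst of model updates makes the watcher's `send(...).await` park until the HTTP-service task catches up; a larger buffer absorbs bursts at the cost of slightly staler endpoint state.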

156-157: Consider logging when Backend model updates are received.

Since Backend models don't trigger endpoint changes, it might be helpful to log when they're received to aid in debugging.

Apply this diff to add logging:

-            ModelType::Backend => {}
+            ModelType::Backend => {
+                tracing::debug!("Backend model added - no endpoint changes required");
+            }
lib/llm/src/http/service/service_v2.rs (2)

145-182: enable_* helpers are triplicated – extract a common setter
The three nearly-identical methods differ only in the field that is toggled and the log message. Consolidating them into one generic helper (or macro) will reduce boilerplate and prevent future inconsistencies.

Example sketch:

fn set_flag<F>(&self, f: F, flag_name: &str, enable: bool)
where
    F: FnOnce(&mut StateFlags) -> &mut bool,
{
    match self.state.flags.write() {
        Ok(mut flags) => {
            *f(&mut *flags) = enable;
            tracing::info!("{flag_name} {}", if enable { "enabled" } else { "disabled" });
        }
        Err(e) => tracing::error!("Failed to acquire write lock for {flag_name}: {e}"),
    }
}

214-292: Different HTTP codes for the same “disabled” condition
Chat returns 503 Service Unavailable, while completions/embeddings return 404 Not Found. The difference is confusing for clients toggling models dynamically. Pick one status (typically 503 with a Retry-After header) for all disabled endpoints, or document the rationale.
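One way to enforce the consistency this comment asks for is a single helper that builds the "endpoint disabled" response, so the three endpoints cannot drift apart. This is a hedged sketch, not the PR's code: `disabled_response` and its header list are hypothetical names.

```rust
// One place to build the "disabled" response for chat, completions,
// and embeddings alike. 503 signals a temporary condition;
// Retry-After hints when clients should try again.
fn disabled_response() -> (u16, Vec<(&'static str, &'static str)>) {
    (503, vec![("Retry-After", "5")])
}

fn main() {
    let (status, headers) = disabled_response();
    assert_eq!(status, 503);
    assert_eq!(headers[0], ("Retry-After", "5"));
}
```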

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d49970 and c680058.

📒 Files selected for processing (4)
  • launch/dynamo-run/src/input/http.rs (5 hunks)
  • lib/llm/src/discovery.rs (1 hunks)
  • lib/llm/src/discovery/watcher.rs (5 hunks)
  • lib/llm/src/http/service/service_v2.rs (6 hunks)
🔇 Additional comments (3)
lib/llm/src/discovery.rs (1)

10-11: LGTM! Clean re-export of the new ModelUpdate enum.

The change appropriately expands the public API to include the ModelUpdate enum alongside ModelWatcher, enabling external modules to subscribe to model lifecycle events.

launch/dynamo-run/src/input/http.rs (1)

65-68: 💡 Verification agent

🧩 Analysis chain

Verify that chat completion endpoints include both /v1/chat/completions and /chat/completions.

The code enables chat endpoints after adding models. Ensure that both OpenAI-compatible and legacy endpoints are properly enabled.


🏁 Script executed:

#!/bin/bash
# Description: Check the implementation of enable_chat_endpoints to verify it enables all expected routes

# Search for the enable_chat_endpoints implementation
ast-grep --pattern 'fn enable_chat_endpoints($$$) {
  $$$
}'

Length of output: 62


Let’s locate the actual implementation to confirm which routes are registered:


🏁 Script executed:

#!/bin/bash
# Search for the enable_chat_endpoints method and show surrounding lines
rg -n "enable_chat_endpoints" -C5 --type rs

Length of output: 70


Let’s try locating the function definition in all Rust files without restricting by type:


🏁 Script executed:

#!/bin/bash
# Find the enable_chat_endpoints function in any .rs file
rg -n "fn enable_chat_endpoints" -C5 -g '*.rs'

Length of output: 964


Let’s inspect where the flag is used and whether both routes are registered:


🏁 Script executed:

#!/bin/bash
# Locate flag usage in service_v2.rs
rg -n 'chat_endpoints_enabled' -C10 lib/llm/src/http/service/service_v2.rs

# Search for explicit registration of each endpoint
rg -n '/v1/chat/completions' -C5 -g '*.rs'
rg -n '/chat/completions' -C5 -g '*.rs'

Length of output: 16955


Add legacy /chat/completions route alongside /v1/chat/completions

The current build registers only the default /v1/chat/completions endpoint. To maintain backward compatibility, you should also register the legacy /chat/completions path.

Locations to update:

  • In lib/llm/src/http/service/service_v2.rs inside HttpServiceConfigBuilder::build (around line 213): after you configure chat_route, invoke chat_completions_router again with Some("/chat/completions".into()) and merge it into your router.
  • Optionally, in lib/llm/src/http/service/openai.rs you could update chat_completions_router so that when path is None it registers both paths automatically.

Example diff in service_v2.rs:

@@ // Add chat completions route with conditional middleware
- let (chat_docs, chat_route) =
-     super::openai::chat_completions_router(state.clone(), config.request_template, None);
+ let (chat_docs, chat_route) =
+     super::openai::chat_completions_router(state.clone(), config.request_template, None);
  let chat_route = chat_route.route_layer(axum::middleware::from_fn(...));

+ // Support legacy endpoint
+ let (_legacy_docs, legacy_route) =
+     super::openai::chat_completions_router(
+         state.clone(),
+         config.request_template,
+         Some("/chat/completions".into()),
+     );
+ let legacy_route = legacy_route.route_layer(axum::middleware::from_fn(...));
+
+ // Merge both routers
+ router = router
+     .merge(chat_route)
+     .merge(legacy_route);

Likely an incorrect or invalid review comment.

lib/llm/src/http/service/service_v2.rs (1)

193-197: Lock-poisoning is silently swallowed
If a previous panic poisoned the flags lock, the write attempt fails and the configuration silently remains stale. Consider propagating the error or at least returning a Result so callers can react instead of continuing in an undefined state.
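Propagating the poisoned-lock error instead of swallowing it could look like the following std-only sketch; the `Flags`/`set_chat` names are illustrative, not the PR's actual API.

```rust
use std::sync::{Arc, RwLock};

#[derive(Default)]
struct Flags { chat: bool }

// Surface a poisoned lock to the caller as an Err instead of
// logging and silently keeping stale configuration.
fn set_chat(flags: &Arc<RwLock<Flags>>, enable: bool) -> Result<(), String> {
    let mut guard = flags
        .write()
        .map_err(|e| format!("flags lock poisoned: {e}"))?;
    guard.chat = enable;
    Ok(())
}

fn main() {
    let flags = Arc::new(RwLock::new(Flags::default()));
    assert!(set_chat(&flags, true).is_ok());
    assert!(flags.read().unwrap().chat);
}
```

Callers can then decide whether to retry, rebuild the state, or shut down, rather than continuing with an undefined flag configuration.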

@abrarshivani abrarshivani force-pushed the shivaniab/dyn_endpoints branch from c680058 to 11c31be Compare June 9, 2025 21:06
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (3)
launch/dynamo-run/src/input/http.rs (2)

126-137: ⚠️ Potential issue

Critical issue: Task handle not stored for proper lifecycle management.

The spawned task handle is not stored, preventing proper cleanup and potentially causing the task to run indefinitely.

Apply this diff to store the task handle:

-    // Spawn a task to watch for model type changes and update HTTP service endpoints
-    let _endpoint_enabler_task = tokio::spawn(async move {
+    // Spawn a task to watch for model type changes and update HTTP service endpoints
+    let endpoint_enabler_task = tokio::spawn(async move {
         while let Some(model_type) = rx.recv().await {
             tracing::debug!("Received model type update: {:?}", model_type);
             update_http_endpoints(http_service.clone(), model_type);
         }
     });

Consider returning the task handle from this function or storing it in a struct for proper lifecycle management.


158-158: ⚠️ Potential issue

Missing functionality: Backend model type should enable both chat and completion endpoints.

The ModelType::Backend case is empty but should enable both chat and completion endpoints since the watcher registers both engines for Backend models.

Apply this diff to handle Backend models correctly:

-           ModelType::Backend => {}
+           ModelType::Backend => {
+               service.enable_chat_endpoints(true);
+               service.enable_cmpl_endpoints(true);
+           }
        },
        ModelUpdate::Removed(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(false),
            ModelType::Completion => service.enable_cmpl_endpoints(false),
            ModelType::Embedding => service.enable_embeddings_endpoints(false),
-           ModelType::Backend => {}
+           ModelType::Backend => {
+               service.enable_chat_endpoints(false);
+               service.enable_cmpl_endpoints(false);
+           }

Also applies to: 164-164

lib/llm/src/http/service/service_v2.rs (1)

4-4: 🛠️ Refactor suggestion

Use async-friendly RwLock to avoid blocking the Tokio runtime.

The middleware functions acquire read locks on every request, and std::sync::RwLock can block the entire executor thread under contention, hurting performance.

Switch to tokio::sync::RwLock:

-use std::sync::{Arc, RwLock};
+use std::sync::Arc;
+use tokio::sync::RwLock;

Update all lock operations to use .await:

  • state.flags.read()state.flags.read().await
  • state.flags.write()state.flags.write().await
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c680058 and 11c31be.

📒 Files selected for processing (4)
  • launch/dynamo-run/src/input/http.rs (5 hunks)
  • lib/llm/src/discovery.rs (1 hunks)
  • lib/llm/src/discovery/watcher.rs (5 hunks)
  • lib/llm/src/http/service/service_v2.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/llm/src/discovery.rs
  • lib/llm/src/discovery/watcher.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
launch/dynamo-run/src/input/http.rs (6)

10-14: LGTM: Clean import additions for dynamic endpoint management.

The new imports for ModelUpdate, HttpService, and ModelType are necessary for the dynamic endpoint functionality and properly organized.


51-51: LGTM: Proper Arc wrapping for shared HTTP service.

The Arc::new(http_service.clone()) correctly wraps the HttpService for shared ownership across async tasks.


65-69: LGTM: Appropriate static endpoint enabling placement.

Moving the endpoint enabling after model registration ensures the endpoints are available when the models are ready to serve requests.


90-94: LGTM: Consistent endpoint enabling for StaticCore configuration.

The endpoint enabling pattern is consistently applied across both StaticFull and StaticCore configurations.


119-119: LGTM: Function signature updated for dynamic endpoint management.

The addition of the http_service: Arc<HttpService> parameter enables the watcher to dynamically control HTTP endpoints.


147-167: LGTM: Well-structured endpoint update function with proper logging.

The function correctly matches on ModelUpdate variants and toggles the appropriate endpoints. The debug logging provides good visibility into the dynamic endpoint management.

lib/llm/src/http/service/service_v2.rs (7)

25-30: LGTM: Clean state flags structure for dynamic endpoint control.

The StateFlags struct provides a clear and organized way to track endpoint enablement state with appropriate defaults.


37-41: LGTM: Consistent initialization of state flags.

The flags are properly initialized to disabled state, aligning with the new dynamic enablement approach.


86-89: LGTM: Appropriate default value changes for dynamic endpoint management.

Changing defaults to false for chat and completion endpoints aligns with the new dynamic enablement model where endpoints are enabled based on available models.


145-156: LGTM: Well-implemented endpoint toggle method with proper error handling.

The enable_chat_endpoints method correctly uses write locks and provides appropriate logging for both success and failure cases.


158-169: LGTM: Consistent implementation for completion endpoint control.

The enable_cmpl_endpoints method follows the same pattern as the chat method with proper locking and logging.


171-182: LGTM: Complete endpoint control API with consistent error handling.

The enable_embeddings_endpoints method completes the endpoint control API with the same reliable pattern.


192-197: LGTM: Proper initialization of runtime flags from configuration.

The initial flag setting correctly transfers configuration values to the runtime state flags during service construction.

@abrarshivani abrarshivani force-pushed the shivaniab/dyn_endpoints branch from 11c31be to baa043c Compare June 21, 2025 09:01
@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jul 22, 2025
@abrarshivani abrarshivani force-pushed the shivaniab/dyn_endpoints branch from 9955d3c to adfdb4e Compare July 26, 2025 06:28
@abrarshivani abrarshivani requested a review from a team as a code owner July 26, 2025 06:28
@github-actions github-actions bot removed the Stale label Jul 29, 2025
@abrarshivani
Contributor Author

@grahamking Can you please take a look at this PR?

@grahamking grahamking added the backlog Issues or bugs that will be tracked for long term fixes label Aug 4, 2025
@grahamking
Contributor

@oandreeva-nv I believe you have been thinking about this too. Could you take a look?

@grahamking
Contributor

@abrarshivani This is really great, thanks! Apologies I took so long, I was on vacation.

Would you be able to rebase it to resolve the conflict? It comes from this landing, so hopefully quite small: #2011

If you're busy are you OK with me resolving the conflict? Will merge as soon as we can.

@abrarshivani
Contributor Author

@grahamking No worries, and thanks for the review. I'm occupied this week but will rebase and resolve the conflict by the end of the week. If you have the bandwidth before then, feel free to resolve the conflict.

@abrarshivani abrarshivani force-pushed the shivaniab/dyn_endpoints branch from 12ac3ce to 9383500 Compare August 8, 2025 18:59
@abrarshivani abrarshivani force-pushed the shivaniab/dyn_endpoints branch from a2e4c55 to 7385205 Compare August 8, 2025 19:10
@grahamking grahamking self-requested a review August 15, 2025 16:49
@grahamking grahamking merged commit 537759f into ai-dynamo:main Aug 15, 2025
10 checks passed
@grahamking
Contributor

Thank you again @abrarshivani ! Great stuff as usual.

hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Comment on lines +179 to +181
&& self.manager.list_chat_completions_models().is_empty()
{
self.manager.remove_chat_completions_model(&model_name).ok();
Contributor


@abrarshivani sorry for commenting on an old PR, but can you help me understand the conditional here? remove_chat_completions_model will be run if list_chat_completions_models().is_empty(), but in that case, wouldn't the remove do nothing, since the manager already has no chat completions models?


Labels

backlog Issues or bugs that will be tracked for long term fixes external-contribution Pull request is from an external contributor feat size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants